# bargein-classifier
Open-source barge-in detection for voice agents. Classifies whether a user is interrupting (barge-in) or just backchanneling ("uh-huh", "yeah") during an agent's turn.
Designed as a self-hosted alternative to proprietary adaptive interruption handling.
## Overview

| Property | Value |
|---|---|
| Architecture | 2D CNN on log-mel spectrograms |
| Model size | 373 KB (ONNX) |
| Input | 2 s window, 16 kHz mono PCM |
| Inference | ~5 ms CPU (ONNX Runtime) |
| Streaming | 100 ms hop, sliding window |
| Training data | AMI + ICSI meeting corpora (CC BY 4.0) |
## How It Works

Place the classifier downstream of VAD in a voice pipeline. When the user speaks while the agent is talking:

- VAD detects speech overlap
- This model scores the user's audio
- High probability = barge-in, agent should yield
- Low probability = backchannel or noise, agent keeps speaking

The model expects the user's audio in isolation (after echo cancellation), not a mixed signal.
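The handoff logic can be sketched as a sliding-window loop over the echo-cancelled user channel. `dummy_score` below is a placeholder for the actual ONNX inference call, and the threshold value is illustrative; both are assumptions for the sketch, not the shipped implementation.

```python
import numpy as np

SR = 16000
WIN = 2 * SR            # 2 s analysis window (32000 samples)
HOP = SR // 10          # 100 ms hop (1600 samples)
THRESHOLD = 0.339       # illustrative; use the threshold shipped with the model

def dummy_score(window: np.ndarray) -> float:
    # Placeholder for the ONNX classifier: maps a 2 s window to a
    # barge-in probability. Here, a crude energy heuristic for illustration.
    return float(np.clip(np.abs(window).mean() * 10.0, 0.0, 1.0))

def handle_overlap(user_audio: np.ndarray) -> str:
    """Slide a 2 s window over echo-cancelled user audio in 100 ms hops;
    tell the agent to yield as soon as any window scores as a barge-in."""
    for start in range(0, len(user_audio) - WIN + 1, HOP):
        if dummy_score(user_audio[start : start + WIN]) >= THRESHOLD:
            return "yield"          # barge-in: stop the agent's TTS
    return "keep_speaking"          # backchannel or noise: keep talking
```

A production pipeline would run this incrementally as audio arrives rather than over a buffered array, but the windowing and thresholding are the same.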
## Performance

### Detection accuracy
| Metric | Value |
|---|---|
| PR-AUC | 0.912 |
| Precision @ 95% recall | 0.772 |
| Majority baseline | 0.709 |
### Hard negative rejection
| Sound | False positive rate |
|---|---|
| Laugh | 13% |
| Cough | 22% |
| Throat clear | 18% |
| Breath-laugh | 8% |
### Cross-corpus generalization (AMI-only variant)
| Eval set | PR-AUC | Precision @ 95% recall |
|---|---|---|
| AMI (same-corpus) | 0.972 | 0.909 |
| ICSI (cross-corpus) | 0.979 | 0.894 |
## Quickstart

### ONNX Runtime

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("bargein.onnx", providers=["CPUExecutionProvider"])
meta = np.load("bargein.onnx.meta.npz", allow_pickle=True)
threshold = float(meta["threshold"][0])

# features: (1, 1, 64, 200) float32 log-mel spectrogram from 2 s of 16 kHz audio
logits = session.run(None, {"input": features})[0]
prob = 1.0 / (1.0 + np.exp(-logits[0]))
is_bargein = prob >= threshold
```
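The quickstart leaves `features` undefined. A minimal numpy-only featurizer consistent with the documented parameters (64-band log-mel, 512 FFT, 160 hop, 2 s at 16 kHz) might look like the sketch below; the exact windowing, padding, and log floor used in training are assumptions here, so check the repo's own feature extraction code before relying on it.

```python
import numpy as np

SR, N_FFT, HOP, N_MELS, N_FRAMES = 16000, 512, 160, 64, 200

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    # Triangular filters spaced evenly on the mel scale, 0 Hz to Nyquist.
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def log_mel(audio):
    # audio: 2 s of 16 kHz mono float32 PCM (32000 samples).
    # Center-pad so framing yields exactly N_FRAMES frames.
    x = np.pad(audio, (N_FFT // 2, N_FFT // 2))
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] for i in range(N_FRAMES)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(N_FFT), axis=1)) ** 2
    mel = spec @ mel_filterbank().T
    return np.log(mel + 1e-6).T[None, None].astype(np.float32)  # (1, 1, 64, 200)
```

The output plugs directly into `session.run(None, {"input": features})` above.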
### HTTP Server

```bash
BARGEIN_MODEL_PATH=bargein.onnx uvicorn server.app:app --host 0.0.0.0 --port 8080

curl -X POST http://localhost:8080/bargein --data-binary @audio.raw
# {"is_bargein": true, "probability": 0.87, "threshold": 0.339, "prediction_duration_ms": 4.8}
```
## Deployment
| Requirement | Specification |
|---|---|
| Runtime | CPU-only (ONNX Runtime) |
| RAM | < 100 MB |
| GPU | Not required |
| Dependencies | onnxruntime, numpy |
## Training Data

~55K labeled events from two meeting corpora:

- AMI (138 meetings, ~29K events) – dialog-act annotations as weak supervision
- ICSI (75 meetings, ~26K events) – MRDA dialog-act tags
- Hard negatives (~5K) – laugh, cough, throat clear, breath sounds

Audio source: individual per-speaker headset channels, analogous to a user's microphone with echo cancellation.

Labels are weak supervision from dialog-act ontologies, not human-audited barge-in judgments.
## Limitations

- English only – trained on English meeting corpora
- Domain gap – trained on meeting audio, not voice agent audio. Deploy in shadow mode first to validate on your traffic.
- Single-speaker input – expects isolated user audio (with echo cancellation). Performance degrades on mixed/summed channels.
- Weak labels – there is a label noise ceiling from the dialog-act proxy. Human-audited fine-tuning would improve quality.
## Training Configuration
| Parameter | Value |
|---|---|
| Framework | PyTorch, exported to ONNX |
| Optimizer | Adam, lr=0.003 |
| Batch size | 32 |
| Epochs | 20 (early stopping, patience=5) |
| Features | 64-band log-mel, 512 FFT, 160 hop |
| Threshold | Recall-constrained sweep (recall >= 95%) |
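The feature parameters determine the model's (1, 1, 64, 200) input shape: a 160-sample hop at 16 kHz gives 100 frames per second, so a 2 s window spans 200 frames of 64 mel bands. A quick sanity check:

```python
sr = 16000          # sample rate (Hz)
hop = 160           # hop length in samples -> 10 ms per frame
n_mels = 64         # mel bands
window_s = 2.0      # analysis window length (s)

frames_per_second = sr // hop                 # 100
n_frames = int(window_s * frames_per_second)  # 200
input_shape = (1, 1, n_mels, n_frames)        # (1, 1, 64, 200)
```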
## Author
Borislav Novikov (bnovkov012@gmail.com)
## Citation

```bibtex
@software{novikov2026bargein,
  title={bargein-classifier: Open-source barge-in detection for voice agents},
  author={Novikov, Borislav},
  year={2026},
  url={https://huggingface.co/bnovikov/bargein-classifier}
}
```
### Training data

```bibtex
@article{carletta2005ami,
  title={The AMI meeting corpus: A pre-announcement},
  author={Carletta, Jean},
  journal={Machine Learning for Multimodal Interaction},
  year={2005}
}

@inproceedings{janin2003icsi,
  title={The ICSI meeting corpus},
  author={Janin, Adam and Baron, Don and Edwards, Jane and Ellis, Dan and Gelbart, David and Morgan, Nelson and Peskin, Barbara and Pfau, Thilo and Shriberg, Elizabeth and Stolcke, Andreas and Wooters, Chuck},
  booktitle={ICASSP},
  year={2003}
}
```
## License

Model weights: CC BY 4.0 (same as training data).