bargein-classifier

Open-source barge-in detection for voice agents. Classifies whether a user is interrupting (barge-in) or just backchanneling ("uh-huh", "yeah") during an agent's turn.

Designed as a self-hosted alternative to proprietary adaptive interruption handling.

Overview

| Property | Value |
|---|---|
| Architecture | 2D CNN on log-mel spectrograms |
| Model size | 373 KB (ONNX) |
| Input | 2 s window, 16 kHz mono PCM |
| Inference | ~5 ms CPU (ONNX Runtime) |
| Streaming | 100 ms hop, sliding window |
| Training data | AMI + ICSI meeting corpora (CC BY 4.0) |

How It Works

Place the classifier downstream of voice activity detection (VAD) in a voice-agent pipeline. When the user speaks while the agent is talking:

  1. VAD detects speech overlap
  2. This model scores the user's audio
  3. High probability = barge-in, agent should yield
  4. Low probability = backchannel or noise, agent keeps speaking

The model expects the user's audio in isolation (after echo cancellation), not a mixed signal.
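
A minimal streaming sketch of this decision loop (not part of the package): it assumes a hypothetical score_fn callable that wraps the ONNX inference from the Quickstart, plus playback and VAD state supplied by your own pipeline; window and hop sizes follow the Overview table.

import numpy as np
from collections import deque

SAMPLE_RATE = 16000
WINDOW = 2 * SAMPLE_RATE     # 2 s analysis window (32,000 samples)
HOP = SAMPLE_RATE // 10      # 100 ms hop (1,600 samples per chunk)

ring = deque(maxlen=WINDOW)  # rolling buffer of the user's echo-cancelled audio

def on_user_chunk(chunk, agent_speaking, vad_active, score_fn, threshold):
    """Call every 100 ms with the newest HOP-sized chunk; returns True on barge-in.

    score_fn, agent_speaking, and vad_active are placeholders for your own
    pipeline; threshold is the value shipped in bargein.onnx.meta.npz.
    """
    ring.extend(chunk)
    if not (agent_speaking and vad_active) or len(ring) < WINDOW:
        return False             # no speech overlap yet, or buffer not filled
    prob = score_fn(np.asarray(ring, dtype=np.float32))
    return prob >= threshold     # True -> yield the turn; False -> keep speaking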

Performance

Detection accuracy

| Metric | Value |
|---|---|
| PR-AUC | 0.912 |
| Precision @ 95% recall | 0.772 |
| Majority baseline | 0.709 |

Hard negative rejection

| Sound | False positive rate |
|---|---|
| Laugh | 13% |
| Cough | 22% |
| Throat clear | 18% |
| Breath-laugh | 8% |

Cross-corpus generalization (AMI-only variant)

| Eval set | PR-AUC | Precision @ 95% recall |
|---|---|---|
| AMI (same-corpus) | 0.972 | 0.909 |
| ICSI (cross-corpus) | 0.979 | 0.894 |

Quickstart

ONNX Runtime

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("bargein.onnx", providers=["CPUExecutionProvider"])
meta = np.load("bargein.onnx.meta.npz", allow_pickle=True)
threshold = float(meta["threshold"][0])

# features: (1, 1, 64, 200) log-mel spectrogram from 2s of 16kHz audio
logits = session.run(None, {"input": features})[0]
prob = 1.0 / (1.0 + np.exp(-float(logits.squeeze())))  # sigmoid over the single output logit
is_bargein = prob >= threshold
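
The snippet above assumes features has already been computed. A sketch of that step with librosa (not a dependency of this repo), using the spectrogram settings from the Training Configuration table; the exact log offset and normalization used in training are not documented here, so treat those details as assumptions.

import librosa  # any equivalent log-mel frontend works

def logmel_features(audio_2s, sr=16000):
    """(32000,) float PCM in [-1, 1] -> (1, 1, 64, 200) log-mel features."""
    mel = librosa.feature.melspectrogram(
        y=audio_2s, sr=sr, n_fft=512, hop_length=160, n_mels=64
    )
    logmel = np.log(mel + 1e-6)   # log compression; the offset is an assumption
    # framing conventions vary slightly; trim or zero-pad to the 200 frames the model expects
    logmel = logmel[:, :200]
    if logmel.shape[1] < 200:
        logmel = np.pad(logmel, ((0, 0), (0, 200 - logmel.shape[1])))
    return logmel[np.newaxis, np.newaxis, :, :].astype(np.float32)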

HTTP Server

BARGEIN_MODEL_PATH=bargein.onnx uvicorn server.app:app --host 0.0.0.0 --port 8080
curl -X POST http://localhost:8080/bargein --data-binary @audio.raw
# {"is_bargein": true, "probability": 0.87, "threshold": 0.339, "prediction_duration_ms": 4.8}

Deployment

| Requirement | Specification |
|---|---|
| Runtime | CPU-only (ONNX Runtime) |
| RAM | < 100 MB |
| GPU | Not required |
| Dependencies | onnxruntime, numpy |
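
A quick way to confirm the runtime matches this table (both packages install from PyPI):

import numpy
import onnxruntime as ort

print(numpy.__version__, ort.__version__)
print(ort.get_available_providers())  # CPUExecutionProvider is all this model needs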

Training Data

~55K labeled events from two meeting corpora:

  • AMI (138 meetings, ~29K events): dialog-act annotations as weak supervision
  • ICSI (75 meetings, ~26K events): MRDA dialog-act tags
  • Hard negatives (~5K): laugh, cough, throat clear, breath sounds

Audio source: individual per-speaker headset channels, analogous to a user's microphone with echo cancellation.

Labels are weak supervision from dialog-act ontologies, not human-audited barge-in judgments.

Limitations

  • English only: trained on English meeting corpora
  • Domain gap: trained on meeting audio, not voice agent audio. Deploy in shadow mode first to validate on your traffic.
  • Single-speaker input: expects isolated user audio (with echo cancellation). Performance degrades on mixed/summed channels.
  • Weak labels: there is a label noise ceiling from the dialog-act proxy. Human-audited fine-tuning would improve quality.

Training Configuration

| Parameter | Value |
|---|---|
| Framework | PyTorch, exported to ONNX |
| Optimizer | Adam, lr=0.003 |
| Batch size | 32 |
| Epochs | 20 (early stopping, patience=5) |
| Features | 64-band log-mel, 512-point FFT, 160-sample hop |
| Threshold | Recall-constrained sweep (recall >= 95%) |
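
The shipped threshold comes from that recall-constrained sweep. A sketch of the selection logic, assuming scikit-learn and a held-out validation set (the actual training code is not reproduced here):

import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, y_prob, min_recall=0.95):
    """Highest-precision threshold whose recall stays at or above min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    ok = recall[:-1] >= min_recall  # align with thresholds (one fewer entry)
    # assumes at least one threshold satisfies the recall constraint
    best = np.argmax(np.where(ok, precision[:-1], -np.inf))
    return float(thresholds[best])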

Author

Borislav Novikov (bnovkov012@gmail.com)

Citation

@software{novikov2026bargein,
  title={bargein-classifier: Open-source barge-in detection for voice agents},
  author={Novikov, Borislav},
  year={2026},
  url={https://huggingface.co/bnovikov/bargein-classifier}
}

Training data

@article{carletta2005ami,
  title={The AMI meeting corpus: A pre-announcement},
  author={Carletta, Jean},
  journal={Machine Learning for Multimodal Interaction},
  year={2005}
}

@inproceedings{janin2003icsi,
  title={The ICSI meeting corpus},
  author={Janin, Adam and Baron, Don and Edwards, Jane and Ellis, Dan and Gelbart, David and Morgan, Nelson and Peskin, Barbara and Pfau, Thilo and Shriberg, Elizabeth and Stolcke, Andreas and Wooters, Chuck},
  booktitle={ICASSP},
  year={2003}
}

License

Model weights: CC BY 4.0 (same as training data).
