VST-7B

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📄 Paper | 🌐 Project Page | 💻 Code | 🤗 Training Data

VST-7B is the 7B variant of Video Streaming Thinking (VST), a paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption. Because the model reasons while the video is still arriving, test-time compute is amortized over the stream and responses remain real-time.
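To make the idea of interleaving concrete, here is a minimal toy sketch of a watch-and-think loop. It is not the VST implementation or its API: the `StreamingThinker` class, the `think` and `answer` callables, and the string "clips" are all hypothetical stand-ins chosen for illustration. The only point it shows is that reasoning happens per clip as the stream arrives, so answering a question reads accumulated notes rather than re-processing the video.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StreamingThinker:
    """Toy stand-in for a streaming VideoLLM (hypothetical, not the VST code)."""

    # Hypothetical callables: a real system would call a model here.
    think: Callable[[str, List[str]], str]   # (clip, notes so far) -> new note
    answer: Callable[[str, List[str]], str]  # (question, notes) -> answer
    notes: List[str] = field(default_factory=list)

    def watch(self, clips: Iterable[str]) -> None:
        # Reason about each clip as it arrives, spreading the cost over the stream.
        for clip in clips:
            self.notes.append(self.think(clip, self.notes))

    def ask(self, question: str) -> str:
        # At query time only the accumulated notes are read; no clip is re-processed.
        return self.answer(question, self.notes)


def toy_think(clip: str, notes: List[str]) -> str:
    # Placeholder "thought": tag the clip with its position in the stream.
    return f"t={len(notes)}: saw {clip}"


def toy_answer(question: str, notes: List[str]) -> str:
    # Placeholder "answer": report the most recent note, if any.
    return f"{question} -> {notes[-1]}" if notes else f"{question} -> no evidence yet"


thinker = StreamingThinker(think=toy_think, answer=toy_answer)
thinker.watch(["a door opening", "a person entering"])
reply = thinker.ask("What happened last?")
print(reply)
```

In this sketch the per-clip `think` call is where the amortized work lives; `ask` stays cheap regardless of how long the stream has been running.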

Performance

Model	OVO-Bench	StreamingBench	VideoMME	LongVideoBench	VideoHolmes
VST-7B	59.3	79.5	64.9	58.0	41.9

Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}