Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper • arXiv:2603.12262
Paper | Project Page | Code | Training Data
This is the 7B variant of Video Streaming Thinking (VST), a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
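The interleaving idea can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the VST implementation: `StreamingThinker`, `watch`, `respond`, `toy_think`, and `toy_answer` are hypothetical names, and the two callables stand in for model calls. It shows reasoning being done clip by clip as the stream arrives, so that answering a question only has to consult the accumulated notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StreamingThinker:
    """Toy illustration of interleaving 'watching' and 'thinking'.

    Hypothetical class: `think` and `answer` stand in for model calls
    and are not part of any released VST API.
    """
    think: Callable[[List[str], str], str]
    answer: Callable[[List[str], str], str]
    thoughts: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        # Reason about each clip as it arrives, conditioned on earlier thoughts.
        self.thoughts.append(self.think(self.thoughts, clip))

    def respond(self, question: str) -> str:
        # At query time only the accumulated thoughts are consulted,
        # so the reasoning cost has already been spread over the stream.
        return self.answer(self.thoughts, question)


def run_stream(thinker: StreamingThinker, clips: Iterable[str]) -> StreamingThinker:
    """Feed clips to the thinker one at a time, in arrival order."""
    for clip in clips:
        thinker.watch(clip)
    return thinker


# Stand-in functions (assumptions for illustration only).
def toy_think(thoughts: List[str], clip: str) -> str:
    return f"note {len(thoughts) + 1}: saw {clip}"


def toy_answer(thoughts: List[str], question: str) -> str:
    return f"{question} -> based on {len(thoughts)} notes"


thinker = run_stream(
    StreamingThinker(toy_think, toy_answer),
    ["clip_a", "clip_b", "clip_c"],
)
print(thinker.respond("What happened?"))
```

In a real system the stand-in functions would be replaced by calls to the model; the loop structure is the only point of the sketch.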
| Model | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|
| VST-7B | 59.3 | 79.5 | 64.9 | 58.0 | 41.9 |
@article{guan2026videostreamingthinking,
title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
journal={arXiv preprint arXiv:2603.12262},
year={2026},
}
Base model: Qwen/Qwen2.5-VL-7B-Instruct