VST-7B

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📄 Paper | 🌐 Project Page | 💻 Code | 🤗 Training Data

This is the 7B variant of Video Streaming Thinking (VST), a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
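The idea of interleaving reasoning with video consumption can be illustrated with a toy loop. The sketch below is not the VST implementation and does not call the model: `StreamingThinker`, `toy_think`, and the text "chunks" are hypothetical stand-ins, assumed here only to show reasoning performed as each chunk arrives, so that answering a later question reuses notes that already exist.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StreamingThinker:
    """Toy stand-in for a streaming VideoLLM (hypothetical, not VST itself)."""

    # Hypothetical callable: produces one note for a chunk, given the notes so far.
    think: Callable[[str, List[str]], str]
    notes: List[str] = field(default_factory=list)

    def watch(self, chunks: Iterable[str]) -> None:
        # Reason right after each chunk arrives instead of waiting for a query.
        for chunk in chunks:
            self.notes.append(self.think(chunk, self.notes))

    def answer(self, question: str) -> str:
        # At query time only the accumulated notes are consulted, so the
        # reasoning cost was already spread over the watching phase.
        joined = "; ".join(self.notes)
        return f"{question} -> based on {len(self.notes)} notes: {joined}"


def toy_think(chunk: str, notes: List[str]) -> str:
    # Placeholder "reasoning": label the chunk with its position in the stream.
    return f"step {len(notes) + 1}: saw {chunk}"


thinker = StreamingThinker(think=toy_think)
thinker.watch(["a door opening", "a person entering"])
reply = thinker.answer("What happened?")
print(reply)
```

In a real system the `think` callable would be a model call and the chunks would be frame batches; the point of the sketch is only that the per-chunk work happens before the question is asked.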

Performance

| Model | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
| --- | --- | --- | --- | --- | --- |
| VST-7B | 59.3 | 79.5 | 64.9 | 58.0 | 41.9 |

Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}

Model size: 8B params (Safetensors, tensor type BF16)