NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation • Paper 2601.02204 • Published Jan 5
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding • Paper 2601.14724 • Published Jan 21
VIOLA: Towards Video In-Context Learning with Minimal Annotations • Paper 2601.15549 • Published Jan 22