VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval Paper • 2602.08099 • Published 21 days ago • 121
Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention Paper • 2602.01801 • Published 28 days ago • 28
Alterbute: Editing Intrinsic Attributes of Objects in Images Paper • 2601.10714 • Published Jan 15 • 31
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published Aug 13, 2025 • 70
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Paper • 2506.08570 • Published Jun 10, 2025 • 33
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Paper • 2505.17813 • Published May 23, 2025 • 58
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3, 2025 • 31
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19, 2025 • 69
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13, 2025 • 36
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation Paper • 2412.08645 • Published Dec 11, 2024 • 12
Hidden in the Noise: Two-Stage Robust Watermarking for Images Paper • 2412.04653 • Published Dec 5, 2024 • 30