DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Paper • 2505.04410 • Published May 7, 2025 • 44
Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception Paper • 2508.11256 • Published Aug 15, 2025
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging Paper • 2602.08024 • Published Feb 8 • 2
Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior Paper • 2512.06866 • Published Dec 7, 2025 • 5
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision Paper • 2405.17913 • Published May 28, 2024
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 53
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents Paper • 2505.23923 • Published May 29, 2025 • 8
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction Paper • 2505.20277 • Published May 26, 2025
Improving Transformer World Models for Data-Efficient RL Paper • 2502.01591 • Published Feb 3, 2025 • 10
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis Paper • 2501.04561 • Published Jan 8, 2025 • 16
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 49
Text-Video Retrieval with Global-Local Semantic Consistent Learning Paper • 2405.12710 • Published May 21, 2024