GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 5 days ago • 90
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items Paper • 2604.19748 • Published 13 days ago • 249
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 19 days ago • 117
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 19 days ago • 155
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details Paper • 2604.06870 • Published 26 days ago • 41
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 26 days ago • 187
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 28 days ago • 203
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published Mar 30 • 85
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published Mar 31 • 48
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published Mar 31 • 46
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published Mar 27 • 65
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization Paper • 2603.28342 • Published Mar 30 • 26
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 156