Communicating about Space: Language-Mediated Spatial Integration Across Partial Views • Paper • 2603.27183 • Published 21 days ago • 20
Grounding Computer Use Agents on Human Demonstrations • Paper • 2511.07332 • Published Nov 10, 2025 • 107
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation • Paper • 2508.16763 • Published Aug 22, 2025 • 2
The Promise of RL for Autoregressive Image Editing • Paper • 2508.01119 • Published Aug 1, 2025 • 11
Rendering-Aware Reinforcement Learning for Vector Graphics Generation • Paper • 2505.20793 • Published May 27, 2025 • 13
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning • Paper • 2505.20046 • Published May 26, 2025 • 18
SafeArena: Evaluating the Safety of Autonomous Web Agents • Paper • 2503.04957 • Published Mar 6, 2025 • 21
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks • Paper • 2412.04626 • Published Dec 5, 2024 • 13
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations • Paper • 2407.03471 • Published Jul 3, 2024 • 30
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue • Paper • 2402.05930 • Published Feb 8, 2024 • 39