Bugai's Collection
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
• 2508.20751
• Published
• 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published
• 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D
Space
Paper
• 2508.19247
• Published
• 43
VibeVoice Technical Report
Paper
• 2508.19205
• Published
• 143
USO: Unified Style and Subject-Driven Generation via Disentangled and
Reward Learning
Paper
• 2508.18966
• Published
• 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 231
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn
Tool-Integrated Reasoning
Paper
• 2509.02479
• Published
• 84
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper
• 2509.00676
• Published
• 85
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
• 2509.01055
• Published
• 79
Gated Associative Memory: A Parallel O(N) Architecture for Efficient
Sequence Modeling
Paper
• 2509.00605
• Published
• 43
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published
• 72
DeepResearch Arena: The First Exam of LLMs' Research Abilities via
Seminar-Grounded Tasks
Paper
• 2509.01396
• Published
• 58
Spatial Forcing: Implicit Spatial Representation Alignment for
Vision-language-action Model
Paper
• 2510.12276
• Published
• 147
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published
• 136
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction
Transformer
Paper
• 2510.25976
• Published
• 16
Don't Blind Your VLA: Aligning Visual Representations for OOD
Generalization
Paper
• 2510.25616
• Published
• 105
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual
Representation
Paper
• 2511.02778
• Published
• 102
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for
Visual Chain-of-Thought
Paper
• 2511.02779
• Published
• 59
Thinking with Video: Video Generation as a Promising Multimodal
Reasoning Paradigm
Paper
• 2511.04570
• Published
• 240
V-Thinker: Interactive Thinking with Images
Paper
• 2511.04460
• Published
• 97
Scaling Agent Learning via Experience Synthesis
Paper
• 2511.03773
• Published
• 82
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
Paper
• 2511.04217
• Published
• 17
HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Paper
• 2511.03506
• Published
• 94
IterResearch: Rethinking Long-Horizon Agents via Markovian State
Reconstruction
Paper
• 2511.07327
• Published
• 78
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via
Gumbel-Reparameterized Soft-Thinking Policy Optimization
Paper
• 2511.06411
• Published
• 18