Bugai's Collection - a BugaiL Collection

updated Nov 11, 2025

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 143
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

Paper • 2508.18966 • Published Aug 26, 2025 • 56
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 231
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published Aug 31, 2025 • 85
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1, 2025 • 79
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

Paper • 2509.00605 • Published Aug 30, 2025 • 43
Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30, 2025 • 72
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks

Paper • 2509.01396 • Published Sep 1, 2025 • 58
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published Oct 14, 2025 • 147
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 136
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Paper • 2510.25976 • Published Oct 29, 2025 • 16
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Paper • 2510.25616 • Published Oct 29, 2025 • 105
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4, 2025 • 102
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 59
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 240
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97
Scaling Agent Learning via Experience Synthesis

Paper • 2511.03773 • Published Nov 5, 2025 • 82
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6, 2025 • 17
HaluMem: Evaluating Hallucinations in Memory Systems of Agents

Paper • 2511.03506 • Published Nov 5, 2025 • 94
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Paper • 2511.07327 • Published Nov 10, 2025 • 78
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

Paper • 2511.06411 • Published Nov 9, 2025 • 18