leondawn666's Collections: Agent & RL
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Paper
• 2504.16656
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
• 2504.20571
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
• 2503.24290
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
• 2505.17612
ARM: Adaptive Reasoning Model
Paper
• 2505.20258
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper
• 2505.22617
Reinforcement Pre-Training
Paper
• 2506.08007
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Paper
• 2502.12115
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
• 2505.24726
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
• 2502.05171
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
• 2502.07316
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper
• 2507.15061
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
MemOS: A Memory OS for AI System
Paper
• 2507.03724
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper
• 2507.21046
Group Sequence Policy Optimization
Paper
• 2507.18071
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
• 2508.01191
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper
• 2508.05629
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper
• 2508.04700
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
• 2508.09736
SSRL: Self-Search Reinforcement Learning
Paper
• 2508.10874
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper
• 2508.13167
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
• 2509.25454
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
Agent Learning via Early Experience
Paper
• 2510.08558
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper
• 2510.07242
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Paper
• 2510.04618
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
It Takes Two: Your GRPO Is Secretly DPO
Paper
• 2510.00977
DCPO: Dynamic Clipping Policy Optimization
Paper
• 2509.02333
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Paper
• 2511.15593
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Paper
• 2511.11793
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
• 2511.13612
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
• 2410.05779
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
Paper
• 2511.16931
Budget-Aware Tool-Use Enables Effective Agent Scaling
Paper
• 2511.17006