Agent & RL - a leondawn666 Collection

leondawn666's Collections

Agent & RL

updated Nov 27, 2025

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 31
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 144
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88
ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16, 2025 • 49
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 120
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper • 2504.16656 • Published Apr 23, 2025 • 57
Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 98
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85
Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published Apr 5, 2025 • 80
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 303
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31, 2025 • 62
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23, 2025 • 81
ARM: Adaptive Reasoning Model

Paper • 2505.20258 • Published May 26, 2025 • 45
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 189
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 131
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Paper • 2502.12115 • Published Feb 17, 2025 • 46
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 277
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 152
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 124
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11, 2025 • 50
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20, 2025 • 60
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 261
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 159
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 68
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025 • 84
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 317
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 238
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 183
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6, 2025 • 52
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 58
SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14, 2025 • 97
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 662
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 146
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 509
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 230
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8, 2025 • 30
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 129
RLP: Reinforcement as a Pretraining Objective

Paper • 2510.01265 • Published Sep 26, 2025 • 44
It Takes Two: Your GRPO Is Secretly DPO

Paper • 2510.00977 • Published Oct 1, 2025 • 32
DCPO: Dynamic Clipping Policy Optimization

Paper • 2509.02333 • Published Sep 2, 2025 • 22
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 32
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Paper • 2511.15593 • Published Nov 19, 2025 • 58
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Paper • 2511.11793 • Published Nov 14, 2025 • 187
P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134
LightRAG: Simple and Fast Retrieval-Augmented Generation

Paper • 2410.05779 • Published Oct 8, 2024 • 28
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists

Paper • 2511.16931 • Published Nov 21, 2025 • 8
Budget-Aware Tool-Use Enables Effective Agent Scaling

Paper • 2511.17006 • Published Nov 21, 2025 • 32
