Deep Think
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published • 87
BM25S: Orders of magnitude faster lexical search via eager sparse scoring
Paper
• 2407.03618
• Published • 14
Deep Think with Confidence
Paper
• 2508.15260
• Published • 90
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper
• 2508.05004
• Published • 131
Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards
Paper
• 2507.14783
• Published • 4
GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
Paper
• 2507.10628
• Published • 2
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
Why Language Models Hallucinate
Paper
• 2509.04664
• Published • 199
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published • 151
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published • 32
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published • 33
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published • 22
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
• 2509.08721
• Published • 664
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Paper
• 2509.04292
• Published • 58
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Paper
• 2507.04009
• Published • 54
Scaling Agents via Continual Pre-training
Paper
• 2509.13310
• Published • 117
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Paper
• 2509.09995
• Published • 16
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Paper
• 2509.13313
• Published • 80
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Paper
• 2509.19894
• Published • 34
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
Paper
• 2509.22193
• Published • 38