Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published 9 days ago • 30
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 107
TaH Collection Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models • 9 items • Updated Apr 12 • 2
Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 396