Article Training-Free Reasoning at 88.89% on GPQA Diamond: How Darwin Family Hit Frontier Scores Without a Single Gradient Step FINAL-Bench • 7 days ago • 18
Article EMO: Pretraining mixture of experts for emergent modularity allenai • 14 days ago • 37
Article Unlocking asynchronicity in continuous batching +1 ror, pcuenq, ariG23498 • 9 days ago • 53
Post 4836 📣 Add architecture visualization to model card! 🌟 For all creators out there: add a model visualization to your model card to capture your audience's attention! 🖱️ When clicked, it opens an interactive view with multiple levels of granularity! 1️⃣ Paste url at https://hfviewer.com/model-card-embed 2️⃣ Paste generated code in your README.md! 3️⃣ ✨ 🚀 12 ❤️ 10 🔥 4 😎 2
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts kashif, stas • Mar 9 • 28
Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
Article DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background NormalUhr • Feb 28, 2025 • 19
Article Improving Prompt Consistency with Structured Generations +1 willkurt, remi, clefourrier • Apr 30, 2024 • 68
Music Arena Leaderboard 🎵 AI Music Arena & Leaderboard (Suno, Udio, Google, Meta, +) • Running • Featured • 93