ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published Mar 19 • 14
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 111
view article Article Fine-tuning Llama 2 70B using PyTorch FSDP +2 smangrul, sgugger, lewtun, philschmid • Sep 13, 2023 • 32