A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Paper • arXiv:2102.06356
Note: Optimizer-Google
Note: Optimizer-LAMB
Note: Optimizer-AdamW
https://arxiv.org/abs/2410.05192
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective (Stanford). Covers the WSD (warmup-stable-decay) learning-rate (LR) schedule.
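For reference, a WSD learning-rate schedule has three phases: a warmup ramp, a long stable phase at the peak LR, and a final decay phase. Below is a minimal sketch, assuming linear warmup and linear decay. The function name `wsd_lr`, its parameters, and the default phase fractions are hypothetical illustrations, not values or code from the linked paper.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.05, decay_frac: float = 0.1,
           min_lr: float = 0.0) -> float:
    """Hypothetical WSD schedule: linear warmup, constant stable phase, linear decay.

    The phase fractions and the linear decay shape are assumptions for illustration.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps

    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the learning rate at its peak value.
        return peak_lr
    # Decay: anneal linearly from peak_lr to min_lr over the final decay_steps.
    progress = min(1.0, (step - decay_start + 1) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    schedule = [wsd_lr(s, 1000, 3e-4) for s in range(1000)]
    print(schedule[0], schedule[500], schedule[-1])
```

Because the stable phase holds the LR constant, training can be continued from a stable-phase checkpoint and the decay phase applied only when a final model is needed. In PyTorch, a function like this could be wrapped with `torch.optim.lr_scheduler.LambdaLR` by dividing the result by `peak_lr` to obtain a multiplier.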