TsinghuaNLP/EVIL
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe