Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
Authored a paper (8 days ago): Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
Authored a paper (8 days ago): Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Updated a dataset (8 days ago): Keven16/G-OPD-Training-Data
Organizations
None yet