SPARK-Code · Three-Adapter Demo

Interactive demo of three LoRA adapters for Qwen2.5-Coder-3B-Instruct trained on MBPP with execution-grounded GRPO, evaluated on HumanEval and a held-out MBPP slice.

  • A (Exec-only GRPO, model card): strongest baseline; +0.85 pp HumanEval pass@1 with bounded KL.
  • C-light (Naive Co-Evolve, model card): demonstrates the policy-drift failure mode (−2.3 pp on HumanEval).
  • C-reg (Regularized Co-Evolve, model card): bounded drift; matches the baseline on HumanEval and gains 4 pp on MBPP pass@5.
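
The pass@1 and pass@5 figures above are quoted without an evaluation recipe. A common way to compute such numbers is the unbiased pass@k estimator introduced with HumanEval; the sketch below assumes that estimator, and the function name `pass_at_k` is illustrative rather than taken from the SPARK-Code repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: sample budget being evaluated (k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples, 3 of them correct.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
```

A benchmark score is the mean of this per-problem value, and a difference reported in pp is the difference between two such means multiplied by 100.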

Key finding: C-light demonstrates policy drift; C-reg recovers via lower aux_loss_scale and higher kl_coeff.
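
To make the role of the two hyperparameters concrete, here is a minimal sketch of how `aux_loss_scale` and `kl_coeff` might enter a combined training objective. The additive form, the helper names, and every numeric value are assumptions for illustration; the actual objective is defined in the linked source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    kl_coeff: float        # weight on the KL penalty toward the reference policy
    aux_loss_scale: float  # weight on the auxiliary (co-evolve) loss


def total_loss(policy_loss: float, kl_to_ref: float, aux_loss: float,
               w: LossWeights) -> float:
    """Assumed additive objective: policy term + KL penalty + scaled auxiliary term."""
    return policy_loss + w.kl_coeff * kl_to_ref + w.aux_loss_scale * aux_loss


# Hypothetical settings, chosen only to show the direction of the change.
naive = LossWeights(kl_coeff=0.01, aux_loss_scale=1.0)
regularized = LossWeights(kl_coeff=0.1, aux_loss_scale=0.1)

# Same raw loss terms under both settings.
loss_naive = total_loss(1.0, 0.5, 2.0, naive)
loss_regularized = total_loss(1.0, 0.5, 2.0, regularized)
```

Under this assumed form, lowering `aux_loss_scale` shrinks the pull of the auxiliary objective, and raising `kl_coeff` penalizes movement away from the reference policy more heavily, which is consistent with the bounded drift reported for C-reg.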

Source code: https://github.com/amarsaikhanb/spark-code

ZeroGPU cold start is ~30s on the first request after idle.

Inputs: Condition, Prompt, and Test cases (optional, Python asserts).
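
The optional test cases are plain Python asserts. How the demo executes them is not described on this page; one plausible sketch, assuming the generated solution and the asserts are concatenated and run in a separate interpreter with a timeout, is shown below. The helper name `passes_tests` and the default timeout are hypothetical.

```python
import subprocess
import sys


def passes_tests(solution_src: str, test_src: str, timeout_s: float = 5.0) -> bool:
    """Return True if the asserts in test_src succeed against solution_src.

    The code runs in a child Python process, so a crash or an infinite loop
    in the generated solution cannot take down the caller.
    """
    program = solution_src + "\n\n" + test_src + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    # A failed assert, syntax error, or any uncaught exception gives a nonzero exit code.
    return result.returncode == 0


solution = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
ok = passes_tests(solution, tests)
```

A child process with a timeout only limits crashes and hangs; it is not a security sandbox, so untrusted model output would need stronger isolation in practice.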