SPARK-Code · Three-Adapter Demo

Interactive demo of three LoRA adapters for Qwen2.5-Coder-3B-Instruct trained on MBPP with execution-grounded GRPO, evaluated on HumanEval and a held-out MBPP slice.
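As a rough illustration of what "execution-grounded GRPO" can mean in practice, the sketch below scores each sampled completion by running it against Python asserts, then normalizes the rewards within the group sampled for one prompt. The function names, the binary pass/fail reward, the timeout, and the use of a population standard deviation are all assumptions made for this example, not details taken from the SPARK-Code repository. It also runs untrusted code without sandboxing, which a real training setup would need to avoid.

```python
import statistics
import subprocess
import sys


def execution_reward(code: str, tests: str, timeout: float = 5.0) -> float:
    """Return 1.0 if `code` followed by `tests` exits cleanly, else 0.0.

    Hypothetical binary reward: any failed assert, exception, or timeout
    counts as a failure. No sandboxing is applied here.
    """
    program = code + "\n\n" + tests + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's sample group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


# Two sampled completions for the same prompt: one correct, one buggy.
candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

rewards = [execution_reward(c, tests) for c in candidates]
advantages = group_relative_advantages(rewards)
```

The passing completion ends up with a positive advantage and the failing one with a negative advantage, so the policy update favors code that actually executes correctly.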

A (Exec-only GRPO), model card: strongest baseline; +0.85 pp HumanEval pass@1 with bounded KL.
C-light (Naive Co-Evolve), model card: demonstrates the policy-drift failure mode (−2.3 pp on HumanEval).
C-reg (Regularized Co-Evolve), model card: bounded drift; matches the baseline on HumanEval and gains 4 pp on MBPP pass@5.
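The pass@1 and pass@5 figures above are usually computed with the unbiased pass@k estimator introduced alongside HumanEval: draw n samples per problem, count the c that pass, and estimate the chance that at least one of k randomly chosen samples passes. Whether this project uses exactly this estimator, and with what n, is an assumption; the sketch below only shows the standard formula.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # in which case every size-k subset contains a passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for one problem, 3 of which pass.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
```

The benchmark score is the mean of this per-problem estimate over all problems, and a difference between two models is then reported in percentage points (pp).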

Key finding: C-light demonstrates policy drift; C-reg recovers via lower aux_loss_scale and higher kl_coeff.
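One way to read this finding, as a rough sketch: suppose the training objective adds a KL penalty against the reference model and an auxiliary co-evolution loss to the policy loss. Then kl_coeff and aux_loss_scale weight those two terms. Only the two parameter names come from the text above; the additive form, the class and function names, and every numeric value below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    kl_coeff: float        # weight on KL(policy || reference)
    aux_loss_scale: float  # weight on the auxiliary loss


def total_loss(policy_loss: float, kl_to_reference: float,
               aux_loss: float, w: LossWeights) -> float:
    """Assumed additive objective; not the repository's actual loss."""
    return policy_loss + w.kl_coeff * kl_to_reference + w.aux_loss_scale * aux_loss


# Hypothetical settings: the regularized variant lowers aux_loss_scale
# and raises kl_coeff relative to the naive variant.
naive = LossWeights(kl_coeff=0.01, aux_loss_scale=1.0)
regularized = LossWeights(kl_coeff=0.1, aux_loss_scale=0.1)

loss_naive = total_loss(1.0, 2.0, 3.0, naive)
loss_regularized = total_loss(1.0, 2.0, 3.0, regularized)
```

Under this assumed form, the regularized weights make the auxiliary term contribute less to each update while penalizing movement away from the reference model more heavily, which is consistent with the bounded drift reported for C-reg.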

Source code: https://github.com/amarsaikhanb/spark-code

ZeroGPU cold start is ~30s on the first request after idle.

The demo takes a Condition (A (Exec-only GRPO), C-light (Naive Co-Evolve), C-reg (Regularized Co-Evolve), or Base (no adapter)), a Temperature from 0 to 1.5, Max new tokens from 64 to 1024, a Prompt, and optional Test cases written as Python asserts. The output is the Generated code.