What is the command used to evaluate on MMLU?

by PY007 - opened Feb 24, 2024

Feb 24, 2024

Thanks for open-sourcing the model and dataset, and congrats on the release!

May I ask which command was used to evaluate on MMLU?

I tried

accelerate launch --num_processes 8 -m lm_eval --model_args pretrained=HuggingFaceTB/cosmo-1b,dtype=bfloat16,use_flash_attention_2=True \
        --tasks mmlu --num_fewshot 5 \
        --batch_size 16

and get the following results:

Groups	Version	Filter	n-shot	Metric	Value		Stderr
mmlu	N/A	none	0	acc	0.2608	±	0.0397
- humanities	N/A	none	5	acc	0.2544	±	0.0289
- other	N/A	none	5	acc	0.2671	±	0.0414
- social_sciences	N/A	none	5	acc	0.2548	±	0.0401
- stem	N/A	none	5	acc	0.2699	±	0.0491
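
For context, MMLU questions have four answer options, so random guessing gives an accuracy of about 0.25. The snippet below is only a rough sanity check, not part of the evaluation harness: it takes the accuracies and standard errors from the table above and measures how many standard errors each one sits from chance. The names `CHANCE` and `z_vs_chance` are made up for this illustration.

```python
# Accuracies and standard errors copied from the results table above.
results = {
    "mmlu": (0.2608, 0.0397),
    "humanities": (0.2544, 0.0289),
    "other": (0.2671, 0.0414),
    "social_sciences": (0.2548, 0.0401),
    "stem": (0.2699, 0.0491),
}

# Assumption for this check: four-way multiple choice, so chance is 1/4.
CHANCE = 0.25


def z_vs_chance(acc, stderr, chance=CHANCE):
    """Distance of an accuracy from chance, in units of its standard error."""
    return (acc - chance) / stderr


z_scores = {name: z_vs_chance(acc, se) for name, (acc, se) in results.items()}

for name, z in z_scores.items():
    acc = results[name][0]
    print(f"{name:16s} acc={acc:.4f}  z={z:+.2f}")
```

Every group lands less than one standard error above 0.25, so by this rough measure the numbers are hard to distinguish from random guessing.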

PY007

Feb 24, 2024

Scores on OpenLLM leaderboard:

loubnabnl

Hugging Face Smol Models Research org Mar 4, 2024

edited Mar 4, 2024

Thanks for pointing this out. The model was evaluated before we converted it from our training framework to transformers, so something may have gone wrong in the conversion. We'll run some tests.
