Instructions for using mpasila/yi-super-9B-exl2-4bpw with libraries, inference providers, notebooks, and local apps.

Libraries

How to use mpasila/yi-super-9B-exl2-4bpw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mpasila/yi-super-9B-exl2-4bpw")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mpasila/yi-super-9B-exl2-4bpw")
model = AutoModelForCausalLM.from_pretrained("mpasila/yi-super-9B-exl2-4bpw")

Local Apps

vLLM

How to use mpasila/yi-super-9B-exl2-4bpw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mpasila/yi-super-9B-exl2-4bpw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mpasila/yi-super-9B-exl2-4bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
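
For scripted use, the curl call above can be issued from Python as well. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above is running locally.
BASE_URL = "http://localhost:8000"
MODEL = "mpasila/yi-super-9B-exl2-4bpw"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, base_url=BASE_URL):
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `complete("Once upon a time,")` should return the parsed JSON reply. Building the request in a separate function makes it possible to inspect the payload without a live server.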

Use Docker

docker model run hf.co/mpasila/yi-super-9B-exl2-4bpw

SGLang

How to use mpasila/yi-super-9B-exl2-4bpw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mpasila/yi-super-9B-exl2-4bpw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mpasila/yi-super-9B-exl2-4bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mpasila/yi-super-9B-exl2-4bpw" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mpasila/yi-super-9B-exl2-4bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
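
The curl calls above target an OpenAI-compatible completions endpoint, so the reply should follow the OpenAI completions schema, with the generated text under `choices[0].text`. Under that assumption, a small sketch for pulling the text out of a decoded reply is shown below. The helper name `extract_completion_text` is hypothetical, and `sample_response` is a hand-written placeholder, not real model output.

```python
def extract_completion_text(response):
    """Return the text of the first choice in a completions-style response dict."""
    choices = response.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"]


# Placeholder reply written by hand for this sketch (not real model output).
sample_response = {
    "object": "text_completion",
    "model": "mpasila/yi-super-9B-exl2-4bpw",
    "choices": [
        {"index": 0, "text": "<generated text>", "finish_reason": "length"},
    ],
}

print(extract_completion_text(sample_response))
```

Raising on an empty `choices` list keeps a malformed or error reply from being mistaken for an empty completion.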

This is a 4 bpw (bits per weight) ExLlamaV2 (EXL2) quantization of feeltheAGI/yi-super-9B, made with the default calibration dataset.
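
To get a feel for what 4 bits per weight means in practice, the sketch below estimates the size of the weights alone. It assumes roughly 9 billion parameters (suggested by the model name, not stated in this card) and treats the bit rate as uniform across all weights. It ignores the KV cache, activations, and any layers stored at a different precision, so the figures are only rough estimates.

```python
def approx_weight_size_gb(n_params, bits_per_weight):
    """Rough size of the weights in gigabytes (10**9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


# Assumption: about 9 billion parameters, inferred from the "9B" in the model name.
N_PARAMS = 9e9

print(f"4 bpw:  ~{approx_weight_size_gb(N_PARAMS, 4):.1f} GB")
print(f"16 bpw: ~{approx_weight_size_gb(N_PARAMS, 16):.1f} GB")
```

Under these assumptions the 4 bpw weights take about a quarter of the space of a 16-bit copy.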

Original model card:

YI-9B-Super

YI-9B-Super is a YI-9B model that has been further fine-tuned on the OpenHermes-2.5 dataset.

Results on selected benchmarks:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| truthfulqa | N/A | none | 0 | rouge1_max | 47.1011 | ± | 0.8016 |
| hellaswag | 1 | none | None | acc | 0.5758 | ± | 0.0049 |
| | | none | None | acc_norm | 0.7639 | ± | 0.0042 |
| gsm8k_cot | 3 | strict-match | 8 | exact_match | 0.5262 | ± | 0.0138 |
| | | flexible-extract | 8 | exact_match | 0.6027 | ± | 0.0135 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.6073 | ± | 0.0135 |
| | | flexible-extract | 5 | exact_match | 0.6126 | ± | 0.0134 |

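| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| truthfulqa | N/A | none | 0 | rouge1_max | 47.1011 | ± | 0.8016 |
| | | none | 0 | bleu_max | 21.9476 | ± | 0.7162 |
| | | none | 0 | rouge2_acc | 0.3293 | ± | 0.0165 |
| | | none | 0 | bleu_acc | 0.3635 | ± | 0.0168 |
| | | none | 0 | rouge1_acc | 0.3892 | ± | 0.0171 |
| | | none | 0 | rougeL_acc | 0.3782 | ± | 0.0170 |
| | | none | 0 | bleu_diff | -2.3953 | ± | 0.6292 |
| | | none | 0 | rouge2_diff | -4.6929 | ± | 0.9130 |
| | | none | 0 | rougeL_diff | -4.2677 | ± | 0.8034 |
| | | none | 0 | acc | 0.4040 | ± | 0.0113 |
| | | none | 0 | rouge1_diff | -3.8975 | ± | 0.7966 |
| | | none | 0 | rougeL_max | 43.7954 | ± | 0.8145 |
| | | none | 0 | rouge2_max | 32.3573 | ± | 0.9094 |
| mmlu | N/A | none | 0 | acc | 0.6726 | ± | 0.0037 |
| - humanities | N/A | none | None | acc | 0.6043 | ± | 0.0067 |
| - other | N/A | none | None | acc | 0.7306 | ± | 0.0077 |
| - social_sciences | N/A | none | None | acc | 0.7741 | ± | 0.0074 |
| - stem | N/A | none | None | acc | 0.6181 | ± | 0.0083 |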
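
One way to read the Stderr column is to turn each value into an approximate 95% confidence interval. The sketch below assumes a normal approximation (value ± 1.96 × stderr), which is a common convention but is not stated in the original card. The function names are hypothetical; the numbers are taken from the tables above.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation (assumed)."""
    half_width = z * stderr
    return (value - half_width, value + half_width)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the results tables above.
mmlu = confidence_interval(0.6726, 0.0037)
gsm8k_strict = confidence_interval(0.6073, 0.0135)
gsm8k_cot_strict = confidence_interval(0.5262, 0.0138)

print("mmlu acc:", tuple(round(x, 4) for x in mmlu))
print("gsm8k vs gsm8k_cot (strict-match) overlap:",
      intervals_overlap(gsm8k_strict, gsm8k_cot_strict))
```

Non-overlapping intervals suggest that the gap between two scores is larger than sampling noise alone would explain, although this is only a rough check and not a formal significance test.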
