Instructions to use MiniMaxAI/MiniMax-M2.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MiniMaxAI/MiniMax-M2.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MiniMaxAI/MiniMax-M2.1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MiniMaxAI/MiniMax-M2.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MiniMaxAI/MiniMax-M2.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
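
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the command above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="MiniMaxAI/MiniMax-M2.1",
                       base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl call above.

    base_url is an assumption: it must match wherever the server listens.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response with choices[0].message.content.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply as a string. Passing `base_url="http://localhost:30000"` would point the same helper at a server on another port, such as the SGLang one shown further down.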

Use Docker

docker model run hf.co/MiniMaxAI/MiniMax-M2.1

SGLang

How to use MiniMaxAI/MiniMax-M2.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MiniMaxAI/MiniMax-M2.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MiniMaxAI/MiniMax-M2.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2.1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MiniMaxAI/MiniMax-M2.1 with Docker Model Runner:
```
docker model run hf.co/MiniMaxAI/MiniMax-M2.1
```

NVFP4?

by ktsaou - opened Dec 26, 2025

Discussion

ktsaou

Dec 26, 2025 • edited Dec 27, 2025

@Firworks it would be amazing if you could convert this model to NVFP4!

Btw, I run MiniMax-M2 on 2x NVIDIA RTX 6000 Pro Blackwell, and it is extremely reliable and performant. MiniMax-M2 is gold for RTX Blackwell. I hope this one will be too.

Firworks

Dec 28, 2025

I've spent some time today attempting it, but I think I'll have to apply a few monkey patches to llm-compressor to get it to run to completion. Hopefully I can get it to run. I spent a while trying to get the original M2 to run as well, but stopped when someone else successfully published an NVFP4 quant of it. The M2 NVFP4 quant was done with ModelOpt, so it used a different process than the one I normally run.

xhybr1d

Dec 28, 2025

Also interested in an NVFP4 version. I tried building one on the DGX Spark but couldn't manage it due to some issues.

ktsaou

Dec 28, 2025

@Firworks the one I am currently using is this https://huggingface.co/lukealonso/MiniMax-M2-NVFP4

It works amazingly well on 2x RTX 6000 Pro Blackwell. This is by far the biggest and most reliable model I can run on this hardware for heavy agentic work and long context windows. And surprisingly, it is quite fast.

I don't know if @lukealonso's output can help you determine what needs to be done for 2.1.

lukealonso

Dec 28, 2025

@ktsaou I'll work on this today, hopefully it won't be too difficult.

xhybr1d

Dec 28, 2025

Awesome @lukealonso, I'm also trying to work on it, but I'm limited by my hardware, so I have to offload some weights to disk :(

lukealonso

Dec 29, 2025

Here it is: https://huggingface.co/lukealonso/MiniMax-M2.1-NVFP4/
