Model Overview
Description:
MiniMax M2.7 is a large language model for complex software engineering, agentic tool use, and office productivity workflows. It is presented as a model deeply participating in its own evolution, with support for complex agent harnesses, dynamic tool search, Agent Teams, and high-fidelity coding and document-editing tasks.
This model is for research and development only.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It was developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA MiniMax M2.7 Model Card.
License/Terms of Use:
GOVERNING TERMS: Use of this model is governed by the NVIDIA Software and Model Evaluation license.
ADDITIONAL INFORMATION: Non-Commercial MiniMax License. Copyright (c) 2026 MiniMax.
Deployment Geography:
Global
Use Case:
Designed for advanced coding assistance, agentic workflows, long-horizon software engineering, live production troubleshooting, office document generation and editing, and other complex multi-step productivity tasks.
Examples
- Coding assistants and software engineering copilots
- Agent harnesses with complex skill libraries and multi-tool search
- Bug localization and production troubleshooting
- Office document generation and editing workflows
- Research, analysis, and productivity automation
Release Date:
Hugging Face 04/24/2026 via https://huggingface.co/nvidia/MiniMax-M2.7-NVFP4
Model Architecture:
Architecture Type: Transformer
Network Architecture: Sparse Mixture-of-Experts (MoE)
Total Parameters: 230B
Active Parameters: 10B
Layers: 62
Hidden Size: 3072
Experts: 256 local experts, with 8 experts activated per token
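The routing arithmetic above means only 8 of 256 experts run per token, which is how a 230B-parameter model activates roughly 10B parameters per forward pass. The sketch below illustrates generic top-k expert gating; the normalization scheme (softmax over the selected top-k only) is one common MoE convention and an assumption here, not a disclosed detail of this model.

```python
import math
import random

def topk_route(logits, k=8):
    """Select the k highest-scoring experts and softmax-normalize their
    scores into routing weights (softmax restricted to the top-k -- an
    illustrative assumption, not this model's documented gating)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in range(256)]  # one token, 256 experts
routing = topk_route(gate_logits, k=8)
# 8 (expert_index, weight) pairs; weights sum to 1
```

Each token's hidden state would then be sent only to the 8 selected experts and combined with these weights, which keeps per-token compute near the 10B active-parameter budget.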
Input:
Input Types: Text
Input Formats: String
Input Parameters: One-Dimensional (1D)
Other Input Properties: Supports long system prompts.
Input Context Length (ISL): 204,800
Output:
Output Types: Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Output Properties: None
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- SGLang
- vLLM
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Blackwell
Preferred Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s):
This is model v1, quantized to NVFP4 with nvidia-modelopt v0.43.0.
Training and Evaluation Datasets:
Calibration Dataset:
Link: cnn_dailymail, Nemotron-Post-Training-Dataset-v2
Data Collection Method by dataset: Automated.
Labeling Method by dataset: Automated.
Properties: The cnn_dailymail dataset contains English-language news articles and summaries. Nemotron-Post-Training-Dataset-v2 is a post-training dataset curated by NVIDIA containing multi-turn conversations across diverse topics.
Training Dataset:
Data Modality: Text
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Evaluation Dataset:
Datasets: MMLU-Pro, LiveCodeBench, IFEval, GPQA Diamond, SciCode, AIME 2025, IFBench, and AA-LCR
Data Collection Method by dataset: Hybrid, Automated, Human
Labeling Method by dataset: Hybrid, Automated, Human
Properties: We evaluated the model on text-based reasoning and coding benchmarks:
- MMLU Pro: a multi-task language understanding benchmark with challenging multiple-choice questions across diverse academic domains
- LiveCodeBench V6: competitive programming problems
- SciCode: scientific coding capabilities
- IFEval: tests whether language models can follow explicit, verifiable formatting and structural constraints layered on top of content generation prompts
- GPQA Diamond: 448 graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry
- AIME 2025: problems from the American Invitational Mathematics Examination
- IFBench: instruction-following across diverse and structured task constraints
- AA-LCR (Artificial Analysis Long Context Reasoning): a long-context benchmark of 100 questions over documents of 10k to 100k tokens, requiring multi-step reasoning and synthesis across dispersed sections rather than simple retrieval
Inference:
Engine: vLLM
Test Hardware: B200
Post Training Quantization
This model was obtained by quantizing the weights and activations of MiniMax M2.7 to the NVFP4 data type, ready for inference with SGLang or vLLM. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 1.65x.
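A back-of-envelope check of that footprint reduction: public descriptions of NVFP4 use 4-bit values in blocks of 16 elements sharing an FP8 scale, giving an effective 4.5 bits per parameter for quantized tensors. That block layout is an assumption here, and the resulting ideal weight-only ratio (~1.78x) is an upper bound; keeping some tensors (e.g. embeddings or norms) in higher precision would pull the realized ratio down toward the reported ~1.65x.

```python
# Back-of-envelope NVFP4 footprint estimate (sketch, not disclosed numbers).
# Assumes the common NVFP4 layout: 4-bit values + one FP8 scale per block of 16.
TOTAL_PARAMS = 230e9
FP8_BITS = 8.0
NVFP4_BITS = 4.0 + 8.0 / 16.0  # 4-bit value + amortized FP8 block scale = 4.5

fp8_gb = TOTAL_PARAMS * FP8_BITS / 8 / 1e9    # ~230 GB
nvfp4_gb = TOTAL_PARAMS * NVFP4_BITS / 8 / 1e9
ratio = fp8_gb / nvfp4_gb                      # ~1.78x ideal, weight-only
print(f"FP8: ~{fp8_gb:.0f} GB, NVFP4: ~{nvfp4_gb:.0f} GB, ratio ~{ratio:.2f}x")
```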
Usage
To serve this checkpoint with SGLang, launch the Docker image lmsysorg/sglang:latest and run the sample command below:
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--quantization modelopt_fp4 \
--trust-remote-code \
--reasoning-parser minimax-append-think \
--tool-call-parser minimax-m2 \
--moe-runner-backend flashinfer_cutlass \
--attention-backend flashinfer
To serve this checkpoint with vLLM, launch the Docker image vllm/vllm-openai:latest and run the sample command below:
vllm serve nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-auto-tool-choice \
--trust-remote-code
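Both servers expose an OpenAI-compatible chat completions API, so a request with tool definitions can exercise the tool-call parsers enabled above. The sketch below only builds the JSON request body; the endpoint URL and the get_weather tool are hypothetical examples, not part of the model or its served API.

```python
import json

# Sketch of an OpenAI-compatible chat request for the server started above.
# The weather tool is a hypothetical illustration of tool calling.
payload = {
    "model": "nvidia/MiniMax-M2.7-NVFP4",
    "messages": [
        {"role": "user", "content": "What's the weather in Santa Clara?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}
body = json.dumps(payload)
# POST `body` to the server's /v1/chat/completions endpoint
# (e.g. http://localhost:8000/v1/chat/completions by default in vLLM)
# using an HTTP client of your choice or the openai SDK.
```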
Evaluation
The accuracy benchmark results are presented in the table below:
| Precision | IFEval | MMLU Pro | GPQA Diamond | LiveCodeBench | SciCode | AIME 2025 | IFBench | AA-LCR |
|---|---|---|---|---|---|---|---|---|
| FP8 (baseline) | 0.909 | 0.824 | 0.860 | 0.573 | 0.498 | 0.892 | 0.733 | 0.718 |
| NVFP4 | 0.904 | 0.817 | 0.857 | 0.582 | 0.487 | 0.888 | 0.728 | 0.728 |
Baseline and evaluation settings are not fully disclosed on the referenced MiniMax M2.7 page.
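One way to read the table is per-benchmark accuracy retention (NVFP4 divided by the FP8 baseline), computed below directly from the reported scores; nothing here goes beyond arithmetic on the table.

```python
# Per-benchmark accuracy retention of NVFP4 relative to FP8, from the table above.
fp8 = {"IFEval": 0.909, "MMLU Pro": 0.824, "GPQA Diamond": 0.860,
       "LiveCodeBench": 0.573, "SciCode": 0.498, "AIME 2025": 0.892,
       "IFBench": 0.733, "AA-LCR": 0.718}
nvfp4 = {"IFEval": 0.904, "MMLU Pro": 0.817, "GPQA Diamond": 0.857,
         "LiveCodeBench": 0.582, "SciCode": 0.487, "AIME 2025": 0.888,
         "IFBench": 0.728, "AA-LCR": 0.728}
retention = {name: nvfp4[name] / fp8[name] for name in fp8}
for name, r in sorted(retention.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}: {100 * r:.1f}%")
```

Every benchmark retains over 97% of baseline accuracy, and LiveCodeBench and AA-LCR score slightly above FP8, differences small enough to sit within typical evaluation noise.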
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
Base Model:
MiniMaxAI/MiniMax-M2.7