Model Overview
Description:
MiniMax M2.7 is a large language model for complex software engineering, agentic tool use, and office productivity workflows. It is presented as a model deeply participating in its own evolution, with support for complex agent harnesses, dynamic tool search, Agent Teams, and high-fidelity coding and document-editing tasks.
This model is for research and development only.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It was developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA MiniMax M2.7 Model Card.
License/Terms of Use:
GOVERNING TERMS: Use of this model is governed by the NVIDIA Software and Model Evaluation license.
ADDITIONAL INFORMATION: Non-Commercial MiniMax License. Copyright (c) 2026 MiniMax.
Deployment Geography:
Global
Use Case:
Designed for advanced coding assistance, agentic workflows, long-horizon software engineering, live production troubleshooting, office document generation and editing, and other complex multi-step productivity tasks.
Examples
- Coding assistants and software engineering copilots
- Agent harnesses with complex skill libraries and multi-tool search
- Bug localization and production troubleshooting
- Office document generation and editing workflows
- Research, analysis, and productivity automation
Release Date:
Hugging Face 04/24/2026 via https://huggingface.co/nvidia/MiniMax-M2.7-NVFP4
Model Architecture:
Architecture Type: Transformer
Network Architecture: Sparse Mixture-of-Experts (MoE)
Total Parameters: 230B
Active Parameters: 10B
Layers: 62
Hidden Size: 3072
Experts: 256 local experts, with 8 experts activated per token
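The routing arithmetic above means only 8 of 256 experts run per token, which is how a 230B-parameter model activates roughly 10B parameters per forward pass. The sketch below illustrates generic top-k expert gating; the normalization scheme (softmax over the selected top-k only) is one common MoE convention and an assumption here, not a disclosed detail of this model.

```python
import math
import random

def topk_route(logits, k=8):
    """Select the k highest-scoring experts and softmax-normalize their
    scores into routing weights (softmax restricted to the top-k -- an
    illustrative assumption, not this model's documented gating)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in range(256)]  # one token, 256 experts
routing = topk_route(gate_logits, k=8)
# 8 (expert_index, weight) pairs; weights sum to 1
```

Each token's hidden state would then be sent only to the 8 selected experts and combined with these weights, which keeps per-token compute near the 10B active-parameter budget.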
Input:
Input Types: Text
Input Formats: String
Input Parameters: One-Dimensional (1D)
Other Input Properties: Supports long system prompts.
Input Context Length (ISL): 204,800
Output:
Output Types: Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Output Properties: None
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- SGLang
- vLLM
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Blackwell
Preferred Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s):
This is model v1, quantized to NVFP4 with nvidia-modelopt v0.43.0.
Training and Evaluation Datasets:
Calibration Dataset:
Link: cnn_dailymail, Nemotron-Post-Training-Dataset-v2
Data Collection Method by dataset: Automated.
Labeling Method by dataset: Automated.
Properties: The cnn_dailymail dataset contains English-language news articles and summaries. Nemotron-Post-Training-Dataset-v2 is a post-training dataset curated by NVIDIA containing multi-turn conversations across diverse topics.
Training Dataset:
Data Modality: Text
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Evaluation Dataset:
Datasets: MMLU-Pro, LiveCodeBench, IFEval, GPQA Diamond, SciCode, AIME 2025, IFBench, and AA-LCR
Data Collection Method by dataset: Hybrid, Automated, Human
Labeling Method by dataset: Hybrid, Automated, Human
Properties: We evaluated the model on text-based reasoning and coding benchmarks:
- MMLU Pro: a multi-task language understanding benchmark with challenging multiple-choice questions across diverse academic domains
- LiveCodeBench V6: competitive programming problems
- SciCode: scientific coding capabilities
- IFEval: tests whether language models can follow explicit, verifiable formatting and structural constraints layered on top of content generation prompts
- GPQA Diamond: 448 graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry
- AIME 2025: problems from the American Invitational Mathematics Examination
- IFBench: instruction-following across diverse and structured task constraints
- AA-LCR (Artificial Analysis Long Context Reasoning): a long-context benchmark of 100 questions over documents of 10k to 100k tokens, requiring multi-step reasoning and synthesis across dispersed sections rather than simple retrieval
Inference:
Engine: vLLM
Test Hardware: B200
Post Training Quantization
This model was obtained by quantizing the weights and activations of MiniMax M2.7 to the NVFP4 data type, ready for inference with SGLang or vLLM. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 1.65x.
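A back-of-envelope check of that footprint reduction: public descriptions of NVFP4 use 4-bit values in blocks of 16 elements sharing an FP8 scale, giving an effective 4.5 bits per parameter for quantized tensors. That block layout is an assumption here, and the resulting ideal weight-only ratio (~1.78x) is an upper bound; keeping some tensors (e.g. embeddings or norms) in higher precision would pull the realized ratio down toward the reported ~1.65x.

```python
# Back-of-envelope NVFP4 footprint estimate (sketch, not disclosed numbers).
# Assumes the common NVFP4 layout: 4-bit values + one FP8 scale per block of 16.
TOTAL_PARAMS = 230e9
FP8_BITS = 8.0
NVFP4_BITS = 4.0 + 8.0 / 16.0  # 4-bit value + amortized FP8 block scale = 4.5

fp8_gb = TOTAL_PARAMS * FP8_BITS / 8 / 1e9    # ~230 GB
nvfp4_gb = TOTAL_PARAMS * NVFP4_BITS / 8 / 1e9
ratio = fp8_gb / nvfp4_gb                      # ~1.78x ideal, weight-only
print(f"FP8: ~{fp8_gb:.0f} GB, NVFP4: ~{nvfp4_gb:.0f} GB, ratio ~{ratio:.2f}x")
```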
Usage
To serve this checkpoint with SGLang, launch the Docker image lmsysorg/sglang:latest and run the sample command below:
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--quantization modelopt_fp4 \
--trust-remote-code \
--reasoning-parser minimax-append-think \
--tool-call-parser minimax-m2 \
--moe-runner-backend flashinfer_cutlass \
--attention-backend flashinfer
To serve this checkpoint with vLLM, launch the Docker image vllm/vllm-openai:latest and run the sample command below:
vllm serve nvidia/MiniMax-M2.7-NVFP4 \
--tensor-parallel-size 8 \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-auto-tool-choice \
--trust-remote-code
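Both servers expose an OpenAI-compatible chat completions API, so a request with tool definitions can exercise the tool-call parsers enabled above. The sketch below only builds the JSON request body; the endpoint URL and the get_weather tool are hypothetical examples, not part of the model or its served API.

```python
import json

# Sketch of an OpenAI-compatible chat request for the server started above.
# The weather tool is a hypothetical illustration of tool calling.
payload = {
    "model": "nvidia/MiniMax-M2.7-NVFP4",
    "messages": [
        {"role": "user", "content": "What's the weather in Santa Clara?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}
body = json.dumps(payload)
# POST `body` to the server's /v1/chat/completions endpoint
# (e.g. http://localhost:8000/v1/chat/completions by default in vLLM)
# using an HTTP client of your choice or the openai SDK.
```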
Evaluation
The accuracy benchmark results are presented in the table below:
| Precision | IFEval | MMLU Pro | GPQA Diamond | LiveCodeBench | SciCode | AIME 2025 | IFBench | AA-LCR |
|---|---|---|---|---|---|---|---|---|
| FP8 (baseline) | 0.909 | 0.824 | 0.860 | 0.573 | 0.498 | 0.892 | 0.733 | 0.718 |
| NVFP4 | 0.904 | 0.817 | 0.857 | 0.582 | 0.487 | 0.888 | 0.728 | 0.728 |
Baseline and evaluation settings are not fully disclosed on the referenced MiniMax M2.7 page.
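One way to read the table is per-benchmark accuracy retention (NVFP4 divided by the FP8 baseline), computed below directly from the reported scores; nothing here goes beyond arithmetic on the table.

```python
# Per-benchmark accuracy retention of NVFP4 relative to FP8, from the table above.
fp8 = {"IFEval": 0.909, "MMLU Pro": 0.824, "GPQA Diamond": 0.860,
       "LiveCodeBench": 0.573, "SciCode": 0.498, "AIME 2025": 0.892,
       "IFBench": 0.733, "AA-LCR": 0.718}
nvfp4 = {"IFEval": 0.904, "MMLU Pro": 0.817, "GPQA Diamond": 0.857,
         "LiveCodeBench": 0.582, "SciCode": 0.487, "AIME 2025": 0.888,
         "IFBench": 0.728, "AA-LCR": 0.728}
retention = {name: nvfp4[name] / fp8[name] for name in fp8}
for name, r in sorted(retention.items(), key=lambda kv: kv[1]):
    print(f"{name:>14}: {100 * r:.1f}%")
```

Every benchmark retains over 97% of baseline accuracy, and LiveCodeBench and AA-LCR score slightly above FP8, differences small enough to sit within typical evaluation noise.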
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
Base Model:
MiniMaxAI/MiniMax-M2.7