MIST-1-70B

MIST-1-70B is the mid-size model in the MIST model family by olaverse. It was built by merging four Llama 3.1 70B-based models with DARE+TIES, and it targets structured, detailed, production-ready output.

MIST Model Family

Model          Params    Speed        Status
MIST-1-8B      8B        ~63 tok/s    ✅ Available
MIST-1-70B     70B       ~23 tok/s    ✅ Available
MIST-1-140B    140B      ~8 tok/s     ✅ Available

Key Strengths

  • Strong Reasoning: DeepSeek R1 distillation at 70B scale
  • Highly Helpful: built on a Nemotron model ranked #1 on helpfulness benchmarks
  • Coding: clean, documented, production-ready code
  • Math: step-by-step, structured problem solving
  • Multilingual: supports 8+ languages
  • Long Context: 128K-token context window
  • Unrestricted: follows instructions without excessive refusals

Merge Method

MIST-1-70B uses DARE+TIES:

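  • DARE randomly drops redundant delta weights (each model's differences from the shared base) and rescales the ones that remain
  • TIES resolves conflicts between the remaining deltas using sign consensus
  • Result: a single model intended to combine the best capabilities of all 4 source models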
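
The two steps above can be illustrated on toy weight vectors. This is a minimal sketch of the general DARE+TIES idea, not the pipeline used to build MIST-1-70B: the function names, the drop rate, and the list-of-floats "models" are all assumptions made for illustration, and real merges operate on full tensors with per-model weights and densities.

```python
import random

def dare(delta, drop_rate, rng):
    """DARE step: randomly drop delta weights, rescale survivors by 1/(1 - drop_rate)."""
    keep = 1.0 - drop_rate
    return [d / keep if rng.random() < keep else 0.0 for d in delta]

def ties_merge(deltas):
    """TIES-style sign consensus: per parameter, elect the sign with the larger
    total magnitude, then average only the non-zero deltas that agree with it."""
    merged = []
    for values in zip(*deltas):
        elected_positive = sum(values) >= 0
        agreeing = [v for v in values if v != 0.0 and (v > 0) == elected_positive]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

def dare_ties(base, finetuned, drop_rate=0.5, seed=0):
    """Merge several fine-tuned weight vectors that share one base (hypothetical helper)."""
    rng = random.Random(seed)
    deltas = [
        dare([w - b for w, b in zip(model, base)], drop_rate, rng)
        for model in finetuned
    ]
    return [b + d for b, d in zip(base, ties_merge(deltas))]

# Toy example: three "models" with three parameters each, all derived from one base.
base = [0.0, 0.0, 0.0]
models = [[1.0, -1.0, 2.0], [3.0, 1.0, -1.0], [2.0, -3.0, 0.0]]
print(dare_ties(base, models, drop_rate=0.5))
```

With `drop_rate=0.0` the DARE step keeps every delta, so the result depends only on the sign-consensus averaging, which makes the behaviour easy to check by hand.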

Benchmark Results

Task           Time      Quality
Reasoning      10.5s     ✅ Correct step-by-step
Coding         11.3s     ✅ Clean with type hints
Math           11.3s     ✅ Structured with verification
General        11.3s     ✅ Accurate and detailed
Instruction    8.1s      ✅ Precise and formatted

Average throughput: ~23 tok/s
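
The per-task times and the average throughput are linked by a simple ratio: tokens generated divided by seconds elapsed. The model card does not describe how these numbers were collected, so the sketch below only shows one way such a measurement could be taken. `measure_throughput` and `fake_generate` are hypothetical names, and the stand-in generator exists only so the example runs without a GPU.

```python
import time

def measure_throughput(generate, prompt, timer=time.perf_counter):
    """Time one generation call.

    `generate` is any callable that takes a prompt and returns the list of
    newly generated token ids. Returns (new_tokens, seconds, tokens_per_second).
    """
    start = timer()
    new_tokens = generate(prompt)
    elapsed = timer() - start
    tok_per_s = len(new_tokens) / elapsed if elapsed > 0 else float("inf")
    return len(new_tokens), elapsed, tok_per_s

# Stand-in for a real model call (assumption: returns 50 new token ids).
def fake_generate(prompt):
    time.sleep(0.01)
    return list(range(50))

n, seconds, rate = measure_throughput(fake_generate, "Your question here")
print(f"{n} tokens in {seconds:.2f}s = {rate:.0f} tok/s")
```

With a real model, `generate` would wrap `model.generate(...)` and return only the tokens after the prompt, so that prompt length does not inflate the throughput figure.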

How to Use

bfloat16: unquantized weights (140GB VRAM)

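from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "olaverse/MIST-1-70B"

# torch_dtype="auto" uses the checkpoint's stored dtype (bfloat16);
# device_map="auto" spreads layers across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Llama 3.1 chat templates already insert the BOS token, so skip adding it again.
# Use model.device rather than a hard-coded "cuda" so inputs follow the device map.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))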

4-bit Quantized (40GB VRAM)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 quantization requires the bitsandbytes package; matmuls run in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "olaverse/MIST-1-70B",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("olaverse/MIST-1-70B")
# Generation then works the same way as in the bfloat16 example above.

Hardware Requirements

Precision      VRAM                           Size
bfloat16       140GB (1x H200 or 2x H100)     132GB
4-bit (NF4)    40GB (1x A100/H100)            ~35GB
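
The sizes in the table follow roughly from parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the 70.6B parameter count is an assumption based on the usual size of Llama 3.1 70B models, and `weight_memory_gb` is a hypothetical helper. It covers the weights alone, so actual VRAM use is higher once the KV cache, activations, and quantization metadata are included.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: approximate parameter count of a Llama 3.1 70B model.
PARAMS = 70.6e9

bf16_gb = weight_memory_gb(PARAMS, 16)   # 2 bytes per parameter
nf4_gb = weight_memory_gb(PARAMS, 4)     # 0.5 bytes per parameter
bf16_gib = bf16_gb * 1e9 / 2**30         # same quantity in binary GiB

print(f"bfloat16 weights: {bf16_gb:.1f} GB ({bf16_gib:.1f} GiB)")
print(f"4-bit weights:    {nf4_gb:.1f} GB")
```

Under these assumptions the bfloat16 weights come to about 141 GB (about 131.5 GiB), which is consistent with the 132GB and 140GB figures in the table, and the 4-bit weights come to about 35 GB before overhead.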

License

Llama 3.1 Community License
