hy-mt2-1.8b-4bit-mlx

Quantized version of tencent/Hy-MT2-1.8B for Apple Silicon using MLX.

Hy-MT2-1.8B is Tencent's multilingual translation model covering 40+ languages.

Quantization: Affine integer quantization
Precision: 4-bit (~4.5 bits/weight avg)
Group size: 64
Disk size: 970 MB
Quantized by: sahilchachra

About this variant

Standard affine (integer) quantization at 4-bit with group size 64. This is the most compressed affine variant in the collection, about 4x smaller on disk than FP16 (970 MB vs 3897 MB). It is recommended when memory is tight or when decode throughput matters most.
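
The scheme above can be illustrated with a minimal pure-Python sketch. It is not MLX's actual kernel: the helper names (`quantize_group`, `dequantize_group`, `bits_per_weight`) are hypothetical, the weights are toy data, and the 16-bit scale and bias per group are an assumption. Under that assumption the storage cost works out to 4 + 32/64 = 4.5 bits per weight, which matches the figure quoted in this card.

```python
import math

BITS = 4          # bit width from this card
GROUP_SIZE = 64   # group size from this card

def quantize_group(weights, bits=BITS):
    """Quantize one group to integers in [0, 2**bits - 1] with a shared scale and bias."""
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # A constant group has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (w_max - w_min) / levels or 1.0
    q = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return q, scale, w_min

def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats as q * scale + bias."""
    return [qi * scale + bias for qi in q]

def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, meta_bits=16):
    """Average storage per weight, assuming one scale and one bias of meta_bits each per group."""
    return bits + 2 * meta_bits / group_size

# Toy data for illustration only, not real model weights.
weights = [0.1 * math.sin(i) for i in range(GROUP_SIZE)]
q, scale, bias = quantize_group(weights)
restored = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"bits/weight: {bits_per_weight()}")
print(f"max round-trip error: {max_err:.6f} (bound: {scale / 2:.6f})")
```

The round-trip error of each weight is bounded by half the group's scale, which is why smaller groups (at the cost of more metadata) generally preserve quality better.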

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

Metric	This model	FP16 baseline
Prefill (tok/s)	1486.53	1269.81
Decode (tok/s)	220.63	77.12
Peak memory (GB)	1.28	3.72
Disk size (MB)	970	3897
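
To make the comparison easier to read, the ratios implied by the Performance table can be computed directly. This is a small sketch using only the figures reported above; the dictionary keys and the `ratios` helper are hypothetical names chosen for illustration.

```python
# Figures copied from the Performance table in this card.
quantized = {"prefill_tok_s": 1486.53, "decode_tok_s": 220.63, "peak_mem_gb": 1.28, "disk_mb": 970}
fp16 = {"prefill_tok_s": 1269.81, "decode_tok_s": 77.12, "peak_mem_gb": 3.72, "disk_mb": 3897}

def ratios(q, base):
    """Return speedups (higher throughput) and reductions (lower footprint) versus the baseline."""
    return {
        "prefill_speedup": q["prefill_tok_s"] / base["prefill_tok_s"],
        "decode_speedup": q["decode_tok_s"] / base["decode_tok_s"],
        "memory_reduction": base["peak_mem_gb"] / q["peak_mem_gb"],
        "disk_reduction": base["disk_mb"] / q["disk_mb"],
    }

result = ratios(quantized, fp16)
for name, value in result.items():
    print(f"{name}: {value:.2f}x")
```

Decode benefits far more than prefill in these numbers, which is consistent with token-by-token decoding being limited mainly by how fast weights can be read from memory.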

Translation quality (FLORES-200 devtest)

Scores are chrF++ (higher is better). The sample size n is listed per direction.

Direction	This model	FP16 baseline	n
eng_Latn→fra_Latn	65.07	63.81	20
eng_Latn→deu_Latn	58.02	57.66	20
eng_Latn→zho_Hans	27.74	29.09	20
eng_Latn→jpn_Jpan	31.90	34.19	20
eng_Latn→spa_Latn	56.19	56.50	20
fra_Latn→eng_Latn	65.10	64.58	20
zho_Hans→eng_Latn	55.34	55.17	20
jpn_Jpan→eng_Latn	54.30	55.29	20

Average chrF++: 56.90 (this model) vs 56.95 (FP16)
Average BLEU: 30.98 (this model) vs 30.71 (FP16)
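
The per-direction differences are easier to scan as deltas. The sketch below uses only the chrF++ values listed in the table above; the variable names are illustrative. With n = 20 per direction, differences of this size may well be within sampling noise.

```python
# (source, target, this model chrF++, FP16 chrF++), copied from the table in this card.
rows = [
    ("eng_Latn", "fra_Latn", 65.07, 63.81),
    ("eng_Latn", "deu_Latn", 58.02, 57.66),
    ("eng_Latn", "zho_Hans", 27.74, 29.09),
    ("eng_Latn", "jpn_Jpan", 31.90, 34.19),
    ("eng_Latn", "spa_Latn", 56.19, 56.50),
    ("fra_Latn", "eng_Latn", 65.10, 64.58),
    ("zho_Hans", "eng_Latn", 55.34, 55.17),
    ("jpn_Jpan", "eng_Latn", 54.30, 55.29),
]

# Positive delta: the 4-bit model scored higher than FP16 on that direction.
deltas = {(src, tgt): round(quant - base, 2) for src, tgt, quant, base in rows}
regressions = [pair for pair, delta in deltas.items() if delta < 0]
largest_drop = min(deltas, key=deltas.get)

for (src, tgt), delta in deltas.items():
    print(f"{src} -> {tgt}: {delta:+.2f}")
print("largest drop:", largest_drop, deltas[largest_drop])
```

On the listed directions, the largest drop is English to Japanese, and the largest gain is English to French.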

Context scaling (decode tok/s)

Context length	Decode tok/s
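~128 tokens	97163.0
~256 tokens	214.0
~512 tokens	213.7
~1024 tokens	119402.9

The ~128 and ~1024 readings are several hundred times higher than the ~214 tok/s at the other lengths and the 220.63 tok/s in the Performance table, so they are likely measurement artifacts rather than real throughput.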

Usage

Install

pip install mlx-lm

Translate

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-4bit-mlx")

prompt = (
    "Translate the following text from English to French.\n"
    "English: The early bird catches the worm.\n"
    "French:"
)
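# verbose=True already prints the generated text and timing stats,
# so the returned string is kept for later use rather than printed again.
translation = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)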
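
When translating several sentences or language pairs, it can help to build the prompt with a small helper. The sketch below simply reproduces the prompt layout from the example above; `build_prompt` is a hypothetical helper, the second sentence is made-up sample input, and whether this layout is the best one for the model is an assumption (see the original model card for its recommended prompting).

```python
def build_prompt(text, source_lang, target_lang):
    """Build a translation prompt in the same layout as the example in this card."""
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"{source_lang}: {text}\n"
        f"{target_lang}:"
    )

# Sample inputs for illustration.
sentences = ["The early bird catches the worm.", "Good morning."]
prompts = [build_prompt(s, "English", "French") for s in sentences]

print(prompts[0])
```

Each resulting string can be passed as the `prompt` argument to `generate` or `stream_generate`.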

Stream

from mlx_lm import load, stream_generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-4bit-mlx")
for chunk in stream_generate(model, tokenizer, prompt="Translate \"Hello world\" to Japanese:", max_tokens=64):
    print(chunk.text, end="", flush=True)

All variants in this collection

Model	Method
sahilchachra/hy-mt2-1.8b-4bit-mlx	Affine int4 (group 64) ← this model
sahilchachra/hy-mt2-1.8b-8bit-mlx	Affine int8 (group 64)
sahilchachra/hy-mt2-1.8b-mxfp4-mlx	Block float MX FP4
sahilchachra/hy-mt2-1.8b-mxfp8-mlx	Block float MX FP8

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
FLORES-200 sample sizes are small (n = 20 per direction), so treat chrF++/BLEU figures as indicative, not definitive
License: see tencent/Hy-MT2-1.8B for the original model's license terms

Original model

See tencent/Hy-MT2-1.8B for full model details, supported languages, and intended use.
