How to use sahilchachra/hy-mt2-1.8b-4bit-mlx with MLX: download the model from the Hub.

```shell
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir hy-mt2-1.8b-4bit-mlx sahilchachra/hy-mt2-1.8b-4bit-mlx
```
hy-mt2-1.8b-4bit-mlx
Quantized version of tencent/Hy-MT2-1.8B for Apple Silicon using MLX.
Hy-MT2-1.8B is Tencent's multilingual translation model covering 40+ languages.
- Quantization: Affine integer quantization
- Precision: 4-bit (~4.5 bits/weight on average, counting the per-group scales and biases)
- Group size: 64
- Disk size: 970 MB
- Quantized by: sahilchachra
About this variant
Standard affine (integer) quantization at 4 bits with group size 64. Largest compression ratio; recommended when memory is tight or when you want the fastest decode throughput.
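To make "affine quantization with group size 64" concrete, here is a pure-Python sketch of min/max affine quantization for a single group: the weights are mapped to 4-bit integer codes 0..15 plus one scale and one bias, and reconstructed as `code * scale + bias`. This illustrates the general technique only; it is not MLX's kernel, and MLX's exact choice of scale and bias may differ. The function names are hypothetical.

```python
import random


def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Min/max affine quantization of one group so that w ≈ code * scale + bias."""
    levels = (1 << bits) - 1  # 15 for 4 bits: codes run from 0 to 15
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # fall back to 1.0 for a constant group
    bias = w_min
    codes = [max(0, min(levels, round((w - bias) / scale))) for w in weights]
    return codes, scale, bias


def dequantize_group(codes: list[int], scale: float, bias: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


# One hypothetical group of 64 weights drawn from a small Gaussian
random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(64)]

codes, scale, bias = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(w - r) for w, r in zip(group, restored))

# Rounding to the nearest code bounds the error by half a quantization step
print(f"codes span {min(codes)}..{max(codes)}, max error {max_err:.5f}, scale/2 {scale / 2:.5f}")
```

Because each weight is rounded to the nearest of 16 evenly spaced levels between the group's minimum and maximum, the reconstruction error per weight is at most half a step (`scale / 2`). Smaller groups tighten that range at the cost of more scales and biases to store.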
Benchmark results
Evaluated with MLX on an Apple M5 Pro. The model was loaded once, and performance and quality were measured in a single pass.
Performance
| Metric | This model | FP16 baseline |
|---|---|---|
| Prefill (tok/s) | 1486.53 | 1269.81 |
| Decode (tok/s) | 220.63 | 77.12 |
| Peak memory (GB) | 1.28 | 3.72 |
| Disk size (MB) | 970 | 3897 |
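The headline ratios follow directly from the table. A quick check using only the reported figures:

```python
# Figures copied from the Performance table: (this model, FP16 baseline)
results = {
    "Prefill (tok/s)": (1486.53, 1269.81),
    "Decode (tok/s)": (220.63, 77.12),
    "Peak memory (GB)": (1.28, 3.72),
    "Disk size (MB)": (970, 3897),
}

# Ratio of the 4-bit model to FP16, rounded to two decimals
ratios = {metric: round(quant / fp16, 2) for metric, (quant, fp16) in results.items()}

for metric, ratio in ratios.items():
    print(f"{metric:<18} 4-bit / FP16 = {ratio:.2f}x")
```

This gives about 2.86x the FP16 decode speed and 1.17x the prefill speed, with roughly a third of the peak memory (0.34x) and a quarter of the disk footprint (0.25x).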
Translation quality (FLORES-200 devtest)
Scores are chrF++ (higher is better); n is the sample size for each language pair.
| Direction | This model | FP16 baseline | n |
|---|---|---|---|
| eng_Latn→fra_Latn | 65.07 | 63.81 | 20 |
| eng_Latn→deu_Latn | 58.02 | 57.66 | 20 |
| eng_Latn→zho_Hans | 27.74 | 29.09 | 20 |
| eng_Latn→jpn_Jpan | 31.9 | 34.19 | 20 |
| eng_Latn→spa_Latn | 56.19 | 56.5 | 20 |
| fra_Latn→eng_Latn | 65.1 | 64.58 | 20 |
| zho_Hans→eng_Latn | 55.34 | 55.17 | 20 |
| jpn_Jpan→eng_Latn | 54.3 | 55.29 | 20 |
Average chrF++: 56.90 (FP16 baseline: 56.95)
Average BLEU: 30.98 (FP16 baseline: 30.71)
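The per-pair table is easier to read as differences from FP16. The sketch below uses only the scores listed above: the largest drops are in the English→Japanese (-2.29) and English→Chinese (-1.35) directions, while most other pairs move by less than one point. With n = 20 per pair, these differences are indicative only.

```python
# chrF++ scores from the FLORES-200 table: (this model, FP16 baseline)
chrf = {
    "eng_Latn→fra_Latn": (65.07, 63.81),
    "eng_Latn→deu_Latn": (58.02, 57.66),
    "eng_Latn→zho_Hans": (27.74, 29.09),
    "eng_Latn→jpn_Jpan": (31.90, 34.19),
    "eng_Latn→spa_Latn": (56.19, 56.50),
    "fra_Latn→eng_Latn": (65.10, 64.58),
    "zho_Hans→eng_Latn": (55.34, 55.17),
    "jpn_Jpan→eng_Latn": (54.30, 55.29),
}

# Positive delta = the 4-bit model scored higher than FP16 on that pair
deltas = {pair: round(quant - fp16, 2) for pair, (quant, fp16) in chrf.items()}

# Print from the largest drop to the largest gain
for pair, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{pair:<20}{delta:+.2f}")
```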
Context scaling (decode tok/s)
| Context length | Decode tok/s |
|---|---|
| ~128 tokens | 97163.0 |
| ~256 tokens | 214.0 |
| ~512 tokens | 213.7 |
| ~1024 tokens | 119402.9 |
Usage
Install
```shell
pip install mlx-lm
```
Translate
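```python
from mlx_lm import load, generate

# Downloads the weights from the Hub on first use, then loads them
model, tokenizer = load("sahilchachra/hy-mt2-1.8b-4bit-mlx")

prompt = (
    "Translate the following text from English to French.\n"
    "English: The early bird catches the worm.\n"
    "French:"
)

# verbose=True already prints the generated text plus speed and memory stats,
# so the returned string is kept instead of being printed a second time
translation = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
```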
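To translate other language pairs with the same prompt shape, the string can be built by a small helper. `build_translation_prompt` is a hypothetical name; it simply reproduces the format of the example above and is not an official Hy-MT2 prompt template.

```python
def build_translation_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Build a completion-style prompt in the same format as the Translate example.

    Hypothetical helper: mirrors the example prompt, not an official template.
    """
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"{source_lang}: {text}\n"
        f"{target_lang}:"
    )


print(build_translation_prompt("Good morning", "English", "German"))
```

The returned string can be passed as `prompt=` to `generate` or `stream_generate` in place of the hand-written one.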
Stream
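```python
from mlx_lm import load, stream_generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-4bit-mlx")

# Each streamed response carries the newly decoded text in .text
for chunk in stream_generate(model, tokenizer, prompt="Translate \"Hello world\" to Japanese:", max_tokens=64):
    print(chunk.text, end="", flush=True)
print()  # finish with a newline once the stream ends
```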
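With completion-style prompts like these, the model may keep generating after the translation, for example by starting a new "English:" line. A hypothetical helper that keeps only the first non-empty line is sketched below. It is plain Python and relies on nothing from mlx_lm beyond the `.text` field used in the loop above; with the model you would pass `(c.text for c in stream_generate(...))`.

```python
from typing import Iterable


def first_line(chunks: Iterable[str]) -> str:
    """Join streamed text chunks and return the first non-empty line.

    Hypothetical helper: stops pulling from the stream as soon as a complete
    line has arrived, so no further chunks are requested after that point.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        text = buffer.lstrip()  # ignore leading spaces/newlines before the answer
        if "\n" in text:
            return text.split("\n", 1)[0].strip()
    return buffer.strip()  # the stream ended without a newline


# Simulated chunks standing in for streamed model output
print(first_line([" Bonjour", " le monde", "\nEnglish: ..."]))
```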
All variants in this collection
| Model | Method |
|---|---|
| sahilchachra/hy-mt2-1.8b-4bit-mlx | Affine int4 (group 64) ← this model |
| sahilchachra/hy-mt2-1.8b-8bit-mlx | Affine int8 (group 64) |
| sahilchachra/hy-mt2-1.8b-mxfp4-mlx | Block float MX FP4 |
| sahilchachra/hy-mt2-1.8b-mxfp8-mlx | Block float MX FP8 |
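The variants differ mainly in element width and in how much per-block metadata they store. A back-of-the-envelope sketch of average bits per weight follows, assuming affine groups carry a 16-bit scale and a 16-bit bias per 64 weights, and that the MX formats follow the OCP Microscaling layout (one shared 8-bit E8M0 scale per 32-element block). Both layouts are assumptions about the stored format, and the totals ignore any tensors left unquantized.

```python
def bits_per_weight(element_bits: int, block_size: int, metadata_bits_per_block: int) -> float:
    """Average storage per weight: element bits plus per-block metadata spread over the block."""
    return element_bits + metadata_bits_per_block / block_size


# Assumed layouts: affine = 16-bit scale + 16-bit bias per group of 64;
# MX = one shared 8-bit (E8M0) scale per block of 32 elements.
variants = {
    "Affine int4 (group 64)": bits_per_weight(4, 64, 16 + 16),
    "Affine int8 (group 64)": bits_per_weight(8, 64, 16 + 16),
    "MXFP4": bits_per_weight(4, 32, 8),
    "MXFP8": bits_per_weight(8, 32, 8),
}

for name, bpw in variants.items():
    print(f"{name:<24}{bpw:.2f} bits/weight")
```

Under these assumptions the affine int4 figure works out to 4 + 32/64 = 4.5, matching the ~4.5 bits/weight quoted for this model.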
Notes
- Requires Apple Silicon (M1 or later) with MLX
- Benchmarks run on Apple M5 Pro, 24 GB unified memory
- FLORES-200 sample sizes are small (n = 20 per pair), so treat the chrF++/BLEU figures as indicative, not definitive
- License: see tencent/Hy-MT2-1.8B for the original model's license terms
Original model
See tencent/Hy-MT2-1.8B for full model details, supported languages, and intended use.