Lemer-Lite — text-only, 2.5 GB, fits iPhone base

Stripped-down sibling of lthn/lemer for devices that can't load the full multimodal build (a memory ceiling of about 3 GB).

Variant	Size	Towers
lthn/lemer	4.06 GB	text + vision + audio
lthn/lemer-lite (you are here)	2.47 GB	text only

What it is

Same LEK-aligned Gemma 4 E2B base as lemer, with vision and audio towers stripped and the text path quantised flat 4-bit (4.501 bits/weight) instead of mixed-precision.
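The 4.501 bits/weight figure is higher than a plain 4 because group-wise quantisation stores extra parameters alongside the 4-bit values. A minimal sketch of that arithmetic, assuming groups of 64 weights with one 16-bit scale and one 16-bit bias per group (common MLX settings, not confirmed for this build); the function name is hypothetical:

```python
def effective_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average storage cost per weight under group-wise affine quantisation.

    Each group of `group_size` weights shares one scale and one bias,
    so their cost is spread evenly across the group.
    """
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


# Assumed settings: 4-bit weights, groups of 64.
print(effective_bits_per_weight(bits=4, group_size=64))  # 4.5
```

Under these assumptions the result is 4.5 bits/weight. The small remainder up to the quoted 4.501 would plausibly come from a few tensors kept at higher precision, but that is a guess rather than something stated here.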

The Lethean Ethical Kernel (LEK) is fully present in the weights — the consent-based reasoning behaviour is identical to the full lemer.

Trade-offs (the honest version)

This is a best-effort tier for users on smaller devices. The -lite suffix is a promise: we are packing this tight and results will vary, but you get to load and run the model.

Text only — no image input, no audio input. If your use case needs eyes, run the full lemer on a Pro-class device.
Flat Q4 instead of mixed-precision Q4 — fluency is solid, rare-token recall slightly worse than the full lemer.
Same LEK alignment — the ethical reasoning is in the text path, which is preserved.

Targets

iPhone base (≥3 GB free), iPad, base-spec Apple Silicon laptops.
Anywhere the full 4 GB lemer would refuse to load.
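The targets above amount to a simple rule: pick the largest variant that fits in free memory. A rough sketch of that rule, using the sizes from the table; the pick_variant helper and the 0.5 GB headroom for activations and cache are illustrative assumptions, not measured values:

```python
from typing import Optional

# (repo id, on-disk size in GB), largest first. Sizes come from the variant table.
VARIANTS = [
    ("lthn/lemer", 4.06),
    ("lthn/lemer-lite", 2.47),
]

# Assumed working headroom on top of the weights; tune for your device.
HEADROOM_GB = 0.5


def pick_variant(free_gb: float) -> Optional[str]:
    """Return the largest variant that fits in `free_gb`, or None if neither does."""
    for repo, size_gb in VARIANTS:
        if free_gb >= size_gb + HEADROOM_GB:
            return repo
    return None


print(pick_variant(3.0))  # lthn/lemer-lite
print(pick_variant(8.0))  # lthn/lemer
```

With this headroom, lemer-lite needs just under 3 GB free, which lines up with the iPhone base target, while the full lemer needs a little over 4.5 GB.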

Loading

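from mlx_lm import load, generate

# Fetch the weights and tokenizer from the Hugging Face Hub (cached after the first call).
model, tokenizer = load("lthn/lemer-lite")

# Wrap the user message in the model's chat template, returned as a plain string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False, add_generation_prompt=True,
)

# Generate up to 200 new tokens and print the reply.
print(generate(model, tokenizer, prompt=prompt, max_tokens=200))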