
John Leimgruber III PRO

ubergarm


https://blog.aifoundry.org/p/adventures-in-model-quantization

AI & ML interests

Open LLMs and Astrophotography image processing.

Recent Activity

new activity 1 day ago

unsloth/Qwen3.5-35B-A3B-Experiments-GGUF: Missing KLD logs for AesSedai quants?

new activity 1 day ago

noctrex/Qwen3.5-35B-A3B-MXFP4_MOE-GGUF: It's really good.

new activity 1 day ago

ubergarm/Qwen3.5-27B-GGUF: Insight into the "weird" data.



upvoted an article 9 days ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

9 days ago • 469

upvoted a collection 3 months ago

Magic Quant

Hybrid GGUF quants created via an evolutionary quant algorithm. Want the best TPS? Lowest precision loss? Smallest file size? Welcome to MagicQuant! • 8 items • Updated Dec 16, 2025 • 27

upvoted a collection 4 months ago

Draft Models

Tiny "draft" models for speculative decoding. • 36 items • Updated Oct 29, 2025 • 6

upvoted 2 collections 9 months ago

YAQA

YAQA hessians (Sketch B) and models with the QTIP quantizer. See https://github.com/Cornell-RelaxML/yaqa/tree/main for more details. • 9 items • Updated Jun 6, 2025 • 3

EXL3 models

46 items • Updated Jan 10 • 39

upvoted 2 collections 10 months ago

Qwen3

84 items • Updated Dec 31, 2025 • 1.69k

SkyReels-V2

Infinite-length Film Generative Model • 17 items • Updated Jun 14, 2025 • 74

upvoted 2 collections 11 months ago

Gemma 3 QAT

Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Jul 10, 2025 • 217

GLM-4-0414

GLM-4-0414 series model • 8 items • Updated Jun 30, 2025 • 134

upvoted 2 articles 11 months ago

Article

Introduction to ggml

Aug 13, 2024 • 270

Article

Comparing sub 50GB Llama 4 Scout quants (KLD/Top P)

Apr 9, 2025 • 45

upvoted a collection 12 months ago

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17, 2024 • 76

upvoted 2 articles about 1 year ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28, 2025 • 888

Article

The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...

Jan 20, 2025 • 76

upvoted 2 collections about 1 year ago

Qwen2.5-1M

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 126

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Dec 31, 2025 • 557

upvoted 3 collections over 1 year ago

Llama 3.2 3B & 1B GGUF Quants

Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 46

Llama 3.1 GPTQ, AWQ, and BNB Quants

Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & VLLM 🤗 • 9 items • Updated Sep 26, 2024 • 57

Qwen2-VL

Vision-language model series based on Qwen2 • 16 items • Updated Dec 31, 2025 • 227

upvoted a collection almost 2 years ago

abliterated-v3

Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3, 2024 • 137