
Distilling intelligence
into models you can hold.
A knowledgebase and research blog for turning vast, frontier models into small, fast, open ones that run on your own hardware — the craft of teaching a smaller mind to think like a larger one, charted as the field comes of age.
teacher → reasoning trace → student → quantize → run it local
What is model distillation?
A large teacher model knows far more than its size lets most people use. Knowledge distillation transfers that understanding into a smaller student — not by copying weights, but by learning from the teacher’s soft predictions, its reasoning traces, and the synthetic data it generates.
The result is a model a fraction of the size that keeps much of the capability — small enough to run on a laptop, a phone, or a single GPU in your closet. Distillation is how frontier intelligence becomes something you own.
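The "soft predictions" idea above can be made concrete with a small sketch. It is not taken from any particular framework: the function names, the example logits, and the temperature of 2.0 are illustrative assumptions, and the T² scaling follows a common convention rather than a requirement. The student is scored on how closely its softened output distribution matches the teacher's.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is a common convention that keeps the
    gradient scale comparable across temperatures (assumed here).
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft predictions
    q = softmax(student_logits, temperature)  # student's soft predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three classes (illustrative values only).
teacher = [4.0, 1.0, 0.2]
student_far = [0.5, 2.0, 1.0]    # disagrees with the teacher
student_near = [3.8, 1.1, 0.3]   # close to the teacher

print(distillation_loss(teacher, student_far))
print(distillation_loss(teacher, student_near))
```

A student that matches the teacher's distribution drives this loss toward zero. In practice the term is usually combined with an ordinary cross-entropy loss on ground-truth labels, and the logits come from real models rather than hand-written lists.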
Intelligence that fits where you need it.
Faster
A distilled student answers in a fraction of its teacher's time and at a fraction of its cost, in real time on modest hardware.
Smaller
From hundreds of billions of parameters down to a few billion, small enough for a laptop, edge device, or phone.
Yours
Run it offline, on-prem, private. No tokens metered, no data leaving the building, no rate limits.
Specialized
Distill only the capability you need. A focused student can rival a giant on its narrow domain.
Learn the craft, end to end.
A structured path from first principles to the techniques at the edge of the research.
What is model distillation?
A plain-language primer on knowledge distillation — how a small student model learns to think like a giant teacher, and why it's the key to running AI on your own hardware.
How distillation works: the three kinds of knowledge
Response, feature, and relation-based distillation — plus self, online, and offline variants. The conceptual map of how knowledge actually moves from teacher to student.
The distiller's toolkit
The frameworks people actually use to distill models in 2026 — from Hugging Face TRL and Arcee DistillKit to synthetic-data pipelines and managed cloud services.
Reasoning distillation: teaching small models to think
How chain-of-thought traces turned distillation from a compression trick into a way to transfer reasoning itself — the DeepSeek-R1 recipe and why it changed the field.
Notes from the still.
Welcome to The AI Distillery
Why we're building a knowledgebase at the frontier of model distillation — and what it means to make frontier intelligence small enough to own.
2 min read →
How 800,000 traces gave small models o1-class reasoning
The DeepSeek-R1 distillation recipe was almost embarrassingly simple — and it rewrote what we thought small open models could do. A look at why it worked.
3 min read →
Is distilling from GPT-4 legal? The OpenAI–DeepSeek question
The contractual, not copyright, fight over model distillation — what the terms of service actually say, what's alleged, and why the law here is genuinely untested.
3 min read →
Today a knowledgebase. Tomorrow, the place you distill your own.
Distillation is in its infancy. We’re documenting the craft as it’s invented — and building toward open tooling and a home for the models the community distills. Come grow with us.