A field guide to model distillation

Distilling intelligence
into models you can hold.

A knowledge base and research blog for turning vast frontier models into small, fast, open ones that run on your own hardware: the craft of teaching a smaller mind to think like a larger one, charted as the field comes of age.

teacher → reasoning trace → student → quantize → run it local

Start with the primer · Read the blog

The idea

What is model distillation?

A large teacher model holds far more capability than most people can use, because its size puts it beyond the hardware they own. Knowledge distillation transfers that understanding into a smaller student, not by copying weights, but by training the student on the teacher’s soft predictions, its reasoning traces, and the synthetic data it generates.

The result is a model a fraction of the size that keeps much of the capability — small enough to run on a laptop, a phone, or a single GPU in your closet. Distillation is how frontier intelligence becomes something you own.
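To make "learning from soft predictions" concrete, here is a minimal sketch of a soft-label distillation loss in plain Python. It assumes the common formulation in which both models' logits are softened with a temperature and the student is penalized by the KL divergence from the teacher's distribution. The function names, the example logits, and the temperature of 2.0 are illustrative assumptions, not something this site prescribes.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is a commonly used scaling that keeps the
    loss magnitude comparable across temperatures (an assumption here).
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft labels
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three classes.
teacher_logits = [4.0, 1.5, 0.2]
matching_student = [4.0, 1.5, 0.2]   # agrees with the teacher
poor_student = [0.2, 1.5, 4.0]       # ranks the classes the wrong way round

print(distillation_loss(teacher_logits, matching_student))
print(distillation_loss(teacher_logits, poor_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In a real training loop this term would be computed on tensors and usually mixed with an ordinary loss on ground-truth labels.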

Read the full primer →

[Diagram: teacher (~671B params) → distill (soft labels · traces) → student (~7B params)]

Why distill

Intelligence that fits where you need it.

◷

Faster

A distilled student answers in a fraction of the time and cost of its teacher — real-time on modest hardware.

⬡

Smaller

From hundreds of billions of parameters to a few billion, small enough for a laptop, edge device, or phone.

⚇

Yours

Run it offline, on-prem, private. No tokens metered, no data leaving the building, no rate limits.

✦

Specialized

Distill only the capability you need. A focused student can rival a giant on its narrow domain.
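The size gap behind these claims can be put in rough numbers. The sketch below estimates memory for the weights alone, using the ~671B teacher and ~7B student from the diagram on this page. The 16-bit and 4-bit precisions are assumptions chosen for illustration, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TEACHER_PARAMS = 671e9  # ~671B, from the diagram
STUDENT_PARAMS = 7e9    # ~7B, from the diagram

for name, params in [("teacher", TEACHER_PARAMS), ("student", STUDENT_PARAMS)]:
    for bits in (16, 4):  # assumed precisions: 16-bit floats, 4-bit quantized
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the student's weights need about 14 GB at 16-bit and about 3.5 GB at 4-bit, which is why quantization is the last step before running a distilled model locally.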

The knowledge base

Learn the craft, end to end.

A structured path from first principles to the techniques at the edge of the research.

Browse all guides →

Primer · 4 min

What is model distillation?

A plain-language primer on knowledge distillation — how a small student model learns to think like a giant teacher, and why it's the key to running AI on your own hardware.

Foundations · 4 min

How distillation works: the three kinds of knowledge

Response, feature, and relation-based distillation — plus self, online, and offline variants. The conceptual map of how knowledge actually moves from teacher to student.

Practitioner · 3 min

The distiller's toolkit

The frameworks people actually use to distill models in 2026 — from Hugging Face TRL and Arcee DistillKit to synthetic-data pipelines and managed cloud services.

Frontier · 3 min

Reasoning distillation: teaching small models to think

How chain-of-thought traces turned distillation from a compression trick into a way to transfer reasoning itself — the DeepSeek-R1 recipe and why it changed the field.

From the blog

Notes from the still.

All posts →

June 18, 2026

Welcome to The AI Distillery

Why we're building a knowledge base at the frontier of model distillation, and what it means to make frontier intelligence small enough to own.

2 min read →

June 16, 2026

How 800,000 traces gave small models o1-class reasoning

The DeepSeek-R1 distillation recipe was almost embarrassingly simple — and it rewrote what we thought small open models could do. A look at why it worked.

3 min read →

June 12, 2026

Is distilling from GPT-4 legal? The OpenAI–DeepSeek question

The contractual, not copyright, fight over model distillation — what the terms of service actually say, what's alleged, and why the law here is genuinely untested.

3 min read →

Where this is going

Today a knowledgebase. Tomorrow, the place you distill your own.

Distillation is in its infancy. We’re documenting the craft as it’s invented — and building toward open tooling and a home for the models the community distills. Come grow with us.

Read our mission · Explore the knowledge base
