The Stillhouse

The best open models you can run yourself.

A curated directory of top open-source and distilled models, built to answer one question: what’s the best model I can run on my hardware for my task? Filter by the RAM you have and the job you need done.
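
The filter described above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the `Model` class, the `pick` helper, and the 2GB default headroom are hypothetical, and "largest model that fits" is used only as a rough stand-in for "best". The sizes and tags are copied from a handful of entries in the directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    size_gb: float    # approximate Q4_K_M footprint, as listed in the directory
    tags: frozenset   # tags from the card, e.g. "dense", "MoE", "reasoning"


# A few entries copied from the directory; not the full list.
CATALOG = [
    Model("Qwen3-4B", 2.5, frozenset({"dense", "hybrid-reason"})),
    Model("Qwen3-8B", 5.0, frozenset({"dense", "hybrid-reason"})),
    Model("R1-Distill-Qwen-14B", 9.0, frozenset({"dense", "reasoning"})),
    Model("gpt-oss-20b", 12.1, frozenset({"MoE", "reasoning"})),
    Model("R1-Distill-Qwen-32B", 20.0, frozenset({"dense", "reasoning"})),
]


def pick(catalog, ram_gb, tag=None, headroom_gb=2.0):
    """Return the models that fit in ram_gb, largest first.

    headroom_gb is an assumed allowance for KV cache and runtime overhead.
    """
    fits = [
        m for m in catalog
        if m.size_gb + headroom_gb <= ram_gb and (tag is None or tag in m.tags)
    ]
    return sorted(fits, key=lambda m: m.size_gb, reverse=True)


# Reasoning models for a 16GB machine.
print([m.name for m in pick(CATALOG, ram_gb=16, tag="reasoning")])
```

Sorting by size is a deliberate simplification; a real ranking would weigh task fit and quality, which the hand-reviewed list does.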

Sizes are approximate Q4_K_M GGUF footprints, so budget extra memory for the KV cache on top. The list is curated and hand-reviewed; the trending strip auto-refreshes from Hugging Face.
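
To make "budget KV cache on top" concrete, here is a rough estimate of how much memory the cache adds. It is a sketch under stated assumptions: the function names are hypothetical, the formula is the standard one for an unquantized cache in a grouped-query-attention transformer, and the example shape (32 layers, 8 KV heads, head dimension 128) is an illustrative 8B-class configuration rather than a figure taken from the directory.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Bytes needed to cache keys and values for n_tokens of context.

    The leading 2 accounts for storing both K and V in every layer.
    bytes_per_elem=2 assumes an fp16 cache; a quantized cache would be smaller.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


def total_gib(weights_gib, n_layers, n_kv_heads, head_dim, n_tokens):
    """Weights footprint plus fp16 KV cache, in GiB (runtime overhead ignored)."""
    return weights_gib + kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens) / 2**30


# Assumed 8B-class shape at an 8k context window.
cache_gib = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"KV cache: {cache_gib:.2f} GiB")

# Added to a ~4.9GB Q4_K_M file (treating the listed GB as GiB for simplicity).
print(f"Total:    {total_gib(4.9, 32, 8, 128, 8192):.2f} GiB")
```

The cache grows linearly with context length, so a model that fits comfortably at 8k tokens may not fit at its full advertised window.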


31 models

Qwen3-Embedding-0.6B

0.6B·dense
Apache-2.0

Best small multilingual embedding model for RAG — tiny, fast, with an official GGUF.

~0.6GB · Q4_K_M · 32k ctx
🤗 model

Qwen3-4B

4B·dense·hybrid-reason
Apache-2.0

The best small brain that fits in 4GB — toggles a thinking mode on for hard problems, off for fast triage.

~2.5GB · Q4_K_M · 32k ctx
🤗 model · GGUF

Qwen3-8B

8.2B·dense·hybrid-reason
Apache-2.0

The best balanced default at this size — strong tool-use and a toggleable thinking mode, all under Apache-2.0.

~5GB · Q4_K_M · 32k ctx
🤗 model · GGUF

Qwen3-30B-A3B-2507

30B·MoE · 3B active·hybrid-reason
Apache-2.0

Reasons like a 30B, decodes like a 3B — the best local agent brain. Fits 16GB at Q3, 24GB comfortably at Q4.

~18.6GB · Q4_K_M · 256k ctx
🤗 model · GGUF

Qwen2.5-Coder-32B

32B·dense
Apache-2.0

The mature dense code workhorse for a single 24GB GPU — fill-in-the-middle, long context, rock-solid.

~19.9GB · Q4_K_M · 128k ctx
🤗 model · GGUF

R1-Distill-Qwen-32B

32B·dense·reasoning
MIT
distilled from DeepSeek-R1

The top open dense reasoner you can run on one 24GB card — the model that beat o1-mini on several benchmarks.

~20GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Llama 3.3 70B

70B·dense
Llama

The stable general-purpose 70B — runs on 2×24GB or a 48GB Mac, with broad ecosystem support.

~42.5GB · Q4_K_M · 128k ctx
🤗 model · GGUF

gpt-oss-120b

117B·MoE · 5.1B active·reasoning
Apache-2.0

o4-mini-class reasoning you can self-host — Apache-2.0, MoE-fast, ~63GB at every quant.

~63GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Nomic Embed v1.5

0.1B·dense
Apache-2.0

Tiny, fast English embeddings with 8k context — the lightweight default for local RAG.

~0.3GB · Q4_K_M · 8k ctx
🤗 model

BGE-M3

0.6B·dense
MIT

Hybrid dense+sparse+ColBERT retrieval in one model, great on long documents. MIT-licensed.

~1.2GB · Q4_K_M · 8k ctx
🤗 model

moondream2

~2B·dense
Apache-2.0

A tiny, fast VLM for edge devices — image Q&A and captioning in under 2GB.

~1.8GB · Q4_K_M · 2k ctx
🤗 model

Llama 3.2 3B

3B·dense
Llama

Tight, fast, and maximally compatible — a dependable default for classification and lightweight chat.

~2GB · Q4_K_M · 128k ctx
🤗 model · GGUF

SmolLM3-3B

3B·dense·hybrid-reason
Apache-2.0

The strongest fully-open 3B — open data, open recipe, dual think/no-think. A distillation-community favorite.

~2GB · Q4_K_M · 128k ctx
🤗 model

Gemma 3 4B

4B·dense
Gemma

Vision-capable, 128k context, 140+ languages: a versatile small multimodal model for chat and summarization.

~2.5GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Phi-4-mini

3.8B·dense
MIT

Microsoft's synthetic-data thesis in miniature — punches well above its size on math and structured tasks.

~2.5GB · Q4_K_M · 128k ctx

R1-Distill-Qwen-7B

7B·dense·reasoning
MIT
distilled from DeepSeek-R1

A math-leaning reasoner — R1 distilled onto a Qwen math base. Strong AIME-style performance for 7B.

~4.7GB · Q4_K_M · 128k ctx
🤗 model · GGUF

R1-Distill-Llama-8B

8B·dense·reasoning
MIT
distilled from DeepSeek-R1

Chain-of-thought reasoning in an 8B — DeepSeek-R1's traces distilled into a Llama base. The viral 2025 recipe.

~4.9GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Llama 3.1 8B

8B·dense
Llama

The most battle-tested 8B — maximum ecosystem and tooling compatibility when you want a known-good base.

~4.9GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Qwen2.5-VL-7B

7B·dense
Apache-2.0

The go-to small vision-language model — OCR, documents, video, and GUI grounding, Apache-licensed.

~6GB · Q4_K_M · 128k ctx
🤗 model

Gemma 3 12B

12B·dense
Gemma

Multimodal, 128k context, 140+ languages — and itself trained with knowledge distillation.

~7.3GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Qwen3-14B

14B·dense·hybrid-reason
Apache-2.0

A fully-on-GPU dense option with big KV-cache headroom — the safe 16GB workhorse.

~9GB · Q4_K_M · 32k ctx
🤗 model · GGUF

Phi-4-reasoning-plus

14B·dense·reasoning
MIT
distilled from o3-mini (traces)

A 14B reasoner that rivals much larger distills on AIME — CoT supervised fine-tuning plus RL.

~9GB · Q4_K_M · 32k ctx

R1-Distill-Qwen-14B

14B·dense·reasoning
MIT
distilled from DeepSeek-R1

The mid-size sweet spot for distilled reasoning: most of R1-Distill-Qwen-32B's ability in a 16GB-friendly model.

~9GB · Q4_K_M · 128k ctx
🤗 model · GGUF

gpt-oss-20b

21B·MoE · 3.6B active·reasoning
Apache-2.0

OpenAI's open MoE: o3-mini-class reasoning, Apache-2.0, and a fixed ~12GB footprint at every quant.

~12.1GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Mistral Small 3.2 24B

24B·dense
Apache-2.0

The best large Apache-licensed chat model: multimodal, strong tool-use, comfortable on a 24GB card.

~14GB · Q4_K_M · 128k ctx

Gemma 3 27B

27B·dense
Gemma

Gemini-1.5-Pro-class on benchmarks, fully local — ships official distilled QAT int4 checkpoints.

~16.5GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Qwen3-Coder-30B-A3B

30B·MoE · 3B active
Apache-2.0

The best local agentic coder for 16–24GB — native tool-call format for Cline/Qwen Code, MoE-fast.

~18.6GB · Q4_K_M · 256k ctx

Qwen3-32B

32B·dense·hybrid-reason
Apache-2.0

The strongest dense Qwen3 generalist — a do-everything model for a 24GB card.

~19.8GB · Q4_K_M · 32k ctx
🤗 model · GGUF

QwQ-32B

32B·dense·reasoning
Apache-2.0

Qwen's dedicated reasoning model with the cleanest license in its class — Apache-2.0 throughout.

~19.8GB · Q4_K_M · 128k ctx
🤗 model

R1-Distill-Llama-70B

70B·dense·reasoning
MIT
distilled from DeepSeek-R1

Heavy local chain-of-thought — R1's reasoning distilled into a 70B Llama for serious work.

~42.5GB · Q4_K_M · 128k ctx
🤗 model · GGUF

Qwen3-235B-A22B-2507

235B·MoE · 22B active·hybrid-reason
Apache-2.0

A frontier open MoE — runs at low quant on 128GB+ unified memory or a multi-GPU rig.

~142GB · Q4_K_M · 256k ctx

Distillation lineage

Who learned from whom.

Many of these models are this capable at a small size because they were taught by something larger. Here’s the teacher→student map behind the directory.

DeepSeek-R1 · teacher
💧 R1-Distill-Llama-8B
💧 R1-Distill-Qwen-7B
💧 R1-Distill-Qwen-14B
💧 R1-Distill-Qwen-32B
💧 R1-Distill-Llama-70B

o3-mini (traces) · teacher
💧 Phi-4-reasoning-plus

Claude Opus · teacher
💧 Jackrong's “Qwopus” series (Opus reasoning → Qwen)

Llama-3.1-405B + Qwen-72B · teacher
💧 Arcee SuperNova (cross-architecture logit distill)
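
For readers unfamiliar with the term, "logit distill" in the map above generally means training the student to match the teacher's output distribution rather than only its final text. The sketch below shows the classic temperature-scaled soft-target loss for a single token position. It is an illustration of the general technique, not the recipe used by any model listed here; the function names are hypothetical, and it assumes teacher and student share a vocabulary, which cross-architecture setups have to arrange separately.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions for one position.

    Scaling by temperature**2 keeps gradient magnitudes comparable across
    temperatures, following the usual soft-target formulation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature**2 * kl


teacher = [2.0, 1.0, 0.0]
print(distill_loss(teacher, teacher))          # identical distributions give zero loss
print(distill_loss(teacher, [0.0, 1.0, 2.0]))  # a disagreeing student is penalized
```

The trace-based entries in the map (for example, fine-tuning on a teacher's written reasoning) work differently: the student is trained on the teacher's generated text with ordinary next-token loss, and no logits are needed.
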

The distillers

People worth following.

The quantizers, fine-tuners, and distillers turning frontier models into things you can actually run.