Model Selection Strategy: Picking the Right Brain for the Job
Six months ago I used one model for everything. Now I route between three based on the task. That routing is the single biggest quality and cost improvement I have made to my AI stack all year. Not fancier prompts. Not better RAG. Just picking the right brain for the job.
The Three Tiers
My routing comes down to three tiers:
- Small fast models (Haiku-class): pattern matching, classification, short summaries, lookups
- Mid-tier models (Sonnet-class): most implementation work, refactors, typical code edits
- Flagship models (Opus-class): architecture, complex debugging, cross-file reasoning, security review
Each step down a tier cuts cost by roughly 10x. A job that costs $0.10 on a flagship model might cost $0.01 on mid-tier and $0.001 on small. For any task that does not need the biggest brain, using it is throwing money at a problem that does not exist.
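To make that arithmetic concrete, here is a minimal sketch that compares sending everything to the flagship tier against routing by tier. The per-job costs are the illustrative figures from the paragraph above, not published prices, and the 600/300/100 job mix is a made-up workload chosen only for the example.

```python
# Per-job cost by tier, copied from the $0.10 / $0.01 / $0.001 example.
# Illustrative ratios only, not real pricing.
COST_PER_JOB = {"flagship": 0.10, "mid": 0.01, "small": 0.001}


def total_cost(job_counts: dict[str, int]) -> float:
    """Sum the cost of a workload given how many jobs go to each tier."""
    return sum(COST_PER_JOB[tier] * count for tier, count in job_counts.items())


# Hypothetical workload: 1,000 jobs, either all on flagship or split by tier.
all_flagship = total_cost({"flagship": 1000})
routed = total_cost({"small": 600, "mid": 300, "flagship": 100})

print(f"all flagship: ${all_flagship:.2f}")
print(f"routed:       ${routed:.2f}")
```

With this assumed mix, the routed workload comes to about $13.60 against $100 for the all-flagship one. Most of what remains is the 100 jobs that still go to the flagship tier.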
A Routing Heuristic
Here is the rule I follow when deciding:
- Does this task fit in one file and have an obvious pattern? Use small.
- Does this task need context from 3-10 files and produce code? Use mid-tier.
- Does this task require reasoning about system design, security, or correctness? Use flagship.
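The three questions above can also be written as a function of the task's properties rather than its name. This is a rough sketch under stated assumptions: the `Task` fields and the `route_by_scope` name are hypothetical, and how to handle tasks above 10 files is my guess, since the list does not cover that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    files_touched: int          # hypothetical: files needed as context
    obvious_pattern: bool       # hypothetical: is there a known pattern to apply?
    needs_deep_reasoning: bool  # hypothetical: design, security, or correctness work


def route_by_scope(task: Task) -> str:
    """Return 'small', 'mid', or 'flagship' for a task."""
    # Checked first, so a risky one-file change still goes to the flagship tier.
    if task.needs_deep_reasoning:
        return "flagship"
    if task.files_touched <= 1 and task.obvious_pattern:
        return "small"
    if task.files_touched > 10:
        # Assumption: the heuristic stops at 10 files; treat more as cross-file reasoning.
        return "flagship"
    return "mid"


print(route_by_scope(Task(files_touched=1, obvious_pattern=True, needs_deep_reasoning=False)))
print(route_by_scope(Task(files_touched=5, obvious_pattern=False, needs_deep_reasoning=False)))
```

Checking the reasoning question first is a design choice: it keeps a small-looking security fix from being routed to the cheapest tier. If tasks already arrive labeled by type, a plain lookup on that label does the same job with less ceremony.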
    def pick_model(task_type: str) -> str:
        """Map a task type to a model tier (tier labels, not exact model IDs)."""
        if task_type in {'classify', 'extract', 'summarize'}:
            return 'haiku'
        if task_type in {'implement', 'refactor', 'test'}:
            return 'sonnet'
        return 'opus'  # design, security, cross-cutting

The Quality Effect
Using a flagship model for everything is not just expensive. On small tasks it can also be worse. Flagship models love to over-engineer: they suggest patterns that are correct but unnecessary. A mid-tier model applying a simpler, known-good pattern often ships cleaner code for the everyday work.
Reserve the big brain for the moments that actually need one. Use the small brain when a small one will do. That is how you keep your AI bill sane and your code quality high at the same time.
The Anthropic models overview lists each model's capabilities.