AI Engineer Melbourne
Knowledge Base
AI Engineering · Intermediate · 10 min

Matching Models to Tasks: Routing for Cost and Quality

Don't use Opus for a Bash script. Build a routing layer.

Introduction

Most developers pick their AI model the same way: use the biggest, smartest one available for everything. Bash script? Opus. Dockerfile? Whatever's top of the dropdown. Then they hit usage limits halfway through the day and lose the productivity gains they were chasing. A routing layer matches model to task, and the eval setup behind it tells you when "cheap and fast" beats "smart and slow".

Why this matters

  • Frontier models are 10–100x more expensive than mid-tier; for many tasks the quality delta is < 5%.
  • Latency matters: a model that's 2x slower silently halves your iteration speed.
  • Vendor diversification reduces blast radius when a provider has an outage.
  • Routing is also rate-limit insurance: fall back to a different provider when you're throttled.

Core concepts

1. Task taxonomy

Categorise the work: trivial transforms, structural code edits, multi-file refactors, deep reasoning, creative writing. Each tier has a target model class.
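The taxonomy above can be written down as a plain mapping. A minimal sketch; the tier names and model classes are illustrative, not a standard:

```python
# One possible task taxonomy, mapping each tier to a target model class.
# All names here are placeholders for whatever models you actually run.
TASK_TIERS = {
    "trivial_transform":   "small-fast",  # renames, formatting, one-liners
    "structural_edit":     "mid-tier",    # single-file code changes
    "multi_file_refactor": "frontier",    # cross-cutting changes
    "deep_reasoning":      "frontier",    # architecture, gnarly debugging
    "creative_writing":    "mid-tier",    # docs, copy
}
```

Writing the taxonomy as data rather than code makes it trivial to re-tune later without touching router logic.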

2. Routing signals

Static (tool name, file type, prompt length), runtime (current load, recent failure rate), and cost (current spend vs. budget). Combine to pick a route.
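Here is a sketch of how the three signal families might combine, with cost overriding runtime overriding static. The `Signals` shape, thresholds, and route names are all assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Signals:
    task_type: str        # static: which tool or task class made the request
    prompt_tokens: int    # static: rough prompt size
    recent_failures: int  # runtime: failures on the primary route recently
    spend_ratio: float    # cost: current spend / budget

def pick_route(s: Signals) -> str:
    # Cost signal dominates: near the cap, only the cheap model.
    if s.spend_ratio > 0.9:
        return "small-fast"
    # Runtime signal: primary is flaky, shift traffic away.
    if s.recent_failures >= 3:
        return "mid-tier-backup"
    # Static signals decide the default tier.
    if s.task_type in ("deep_reasoning", "multi_file_refactor"):
        return "frontier"
    if s.prompt_tokens > 4000:
        return "mid-tier"
    return "small-fast"
```

The ordering is the design decision: budget caps are hard constraints, health is a soft constraint, and static signals only apply when neither is firing.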

3. Quality vs. cost frontier

Plot model + task pairs on cost-vs-quality axes; pick the cheapest model that passes your quality bar for that task class.
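"Cheapest model that passes the bar" is a one-liner once you have eval scores. The costs and pass rates below are made up for illustration; in practice they come from your eval suite:

```python
# Hypothetical eval results: cost per 1M tokens (dollars) and pass rate
# per task class. Replace with real numbers from your own evals.
CANDIDATES = {
    "small-fast": {"cost": 0.25, "quality": {"bash_script": 0.93, "refactor": 0.61}},
    "mid-tier":   {"cost": 3.00, "quality": {"bash_script": 0.95, "refactor": 0.88}},
    "frontier":   {"cost": 15.0, "quality": {"bash_script": 0.97, "refactor": 0.94}},
}

def cheapest_passing(task_class: str, quality_bar: float):
    """Return the cheapest model clearing the bar, or None if nothing does."""
    passing = [
        (m["cost"], name)
        for name, m in CANDIDATES.items()
        if m["quality"].get(task_class, 0.0) >= quality_bar
    ]
    return min(passing)[1] if passing else None
```

Note the `None` case: if no model clears the bar, that is a signal to change the task or the bar, not to silently ship the best loser.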

4. Fallback chains

Primary route fails or times out → cheaper backup → eventually a static heuristic. Never let a single provider failure take down the feature.
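The chain itself is a loop over provider callables, ending at a heuristic that cannot fail. A minimal sketch; the function and argument names are illustrative:

```python
def call_with_fallback(prompt, chain, static_heuristic):
    """Try each provider in order; fall through to a static heuristic.

    `chain` is an ordered list of callables, most-preferred first.
    `static_heuristic` must never raise: it is the floor of the chain.
    """
    for provider in chain:
        try:
            return provider(prompt)
        except Exception:
            continue  # timeout, rate limit, outage: try the next route
    return static_heuristic(prompt)
```

In production you would catch specific exceptions (timeouts, 429s, 5xx) rather than bare `Exception`, and record which hop served each request so you can see when the fallback has quietly become the primary.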

Practical patterns

Decision-table router

A simple lookup: task type × prompt length → model. Boring, fast, easy to debug.
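A sketch of that lookup, with a size bucket standing in for raw prompt length. Task types, buckets, and model names are placeholders:

```python
# (task type, size bucket) -> model name. The whole router is data.
ROUTES = {
    ("transform", "short"): "small-fast",
    ("transform", "long"):  "mid-tier",
    ("refactor",  "short"): "mid-tier",
    ("refactor",  "long"):  "frontier",
}

def bucket(prompt_tokens: int) -> str:
    # Threshold is arbitrary here; tune it against your eval suite.
    return "short" if prompt_tokens <= 2000 else "long"

def route(task_type: str, prompt_tokens: int) -> str:
    # Unknown combinations fall through to a safe middle default.
    return ROUTES.get((task_type, bucket(prompt_tokens)), "mid-tier")
```

The win is debuggability: "why did this go to frontier?" is answered by printing one dict entry.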

Classifier router

A small model classifies the request and picks the route. More flexible; adds latency.
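A sketch of the shape, with a trivial keyword heuristic standing in for the small model (in production that function would be an actual model call). Labels and routes are illustrative:

```python
def classify(prompt: str) -> str:
    """Stand-in classifier. Replace with a small-model call in production."""
    text = prompt.lower()
    if "refactor" in text or "across files" in text:
        return "multi_file_refactor"
    if "why" in text or "design" in text:
        return "deep_reasoning"
    return "trivial_transform"

CLASS_TO_MODEL = {
    "trivial_transform":   "small-fast",
    "deep_reasoning":      "frontier",
    "multi_file_refactor": "frontier",
}

def classifier_route(prompt: str) -> str:
    return CLASS_TO_MODEL[classify(prompt)]
```

The added latency is the classifier call itself, which is why the decision-table router often wins unless your traffic is genuinely hard to bucket statically.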

Budget-aware routing

Routes degrade as you approach a budget cap; premium models reserved for premium tasks.
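Degradation can be a small policy function layered on top of any router. The soft/hard thresholds here are assumptions:

```python
def allowed_tier(requested: str, spend_ratio: float) -> str:
    """Downgrade the requested model tier as spend approaches the cap."""
    if spend_ratio >= 1.0:
        return "small-fast"   # hard cap: cheapest model only
    if spend_ratio >= 0.8 and requested == "frontier":
        return "mid-tier"     # soft cap: premium requests get downgraded first
    return requested
```

Applied after the main router, this keeps "premium models for premium tasks" true early in the billing period and enforces the cap without an outage at the end of it.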

Provider hedging

For latency-critical paths, fire two providers in parallel and take the first; useful for the long tail.
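A hedged call is a race between two in-flight requests, with the loser cancelled. A minimal asyncio sketch; the provider coroutines are stand-ins for real API calls:

```python
import asyncio

async def hedged_call(prompt, provider_a, provider_b):
    """Fire both providers, return the first result, cancel the other."""
    tasks = [
        asyncio.create_task(provider_a(prompt)),
        asyncio.create_task(provider_b(prompt)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    # Let cancellations settle before returning the winner.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()
```

Hedging doubles cost per hedged request, so it is usually gated to the latency-critical paths where tail latency actually hurts.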

Pitfalls to avoid

  • Routing without an eval suite: you don't know if cheaper is actually worse.
  • Static routes that never get re-tuned as model prices change.
  • No observability per route: you can't see where the savings are coming from.
  • Falling back so often that the fallback model is your real model.

Key takeaways

  1. Pick models per task, not per project.
  2. Build the eval suite that proves each route is worth it.
  3. Re-tune routes monthly; the price/quality landscape moves fast.
  4. Always have a fallback path.

Go deeper · external resources

Curated reading list to take you from primer to practitioner. All links are external and free to read.
