AI Engineer Melbourne · Knowledge Base · Keynote Insights · Intermediate · 12 min

Token Town: Why Compute Strategy Is Product Strategy

In production, what matters is cost per task, not the price-per-million-tokens headline.

Introduction

Token pricing is a noisy headline. What actually matters in production is what you pay per task once outputs get longer, retries creep in, and "good enough" models get deprecated. Treating compute as a strategic asset, rather than a commodity invoice, is what separates AI products that compound from AI products that get squeezed every time a frontier lab updates its pricing page.

Why this matters

  • Per-million-token quotes hide the real cost driver: tokens-per-task, which scales with reasoning depth, retry rate, and tool-call fan-out.
  • Frontier model deprecation is now a quarterly event. If your product is wired to one provider, every deprecation becomes a migration.
  • The flywheel for AI products is built on product value that is hard to commoditise (UX, data, distribution), not on a model you don't own.
  • Multi-provider isn't a procurement strategy; it's an architectural one. The cost of being able to swap models has to be paid up front in your abstraction layer.

Core concepts

1. Cost per task vs. cost per token

A task is the unit your customer pays for: "summarise this doc," "draft this PR review," "answer this support ticket." Until you can attribute spend to a task, you can't optimise. Build cost telemetry that aggregates input + output + tool-call + retry tokens against a single task ID.
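A minimal sketch of that telemetry, assuming illustrative prices and a hypothetical `Call` record (no real provider SDK is referenced here):

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pricing in $ per million tokens, for illustration only.
PRICE_PER_M = {"model-a": {"in": 3.00, "out": 15.00}}

@dataclass
class Call:
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int  # tool-call round-trips count here too

def cost_per_task(calls):
    """Aggregate every call, including retries, under its task_id."""
    totals = defaultdict(float)
    for c in calls:
        p = PRICE_PER_M[c.model]
        totals[c.task_id] += (c.input_tokens * p["in"]
                              + c.output_tokens * p["out"]) / 1_000_000
    return dict(totals)

calls = [
    Call("ticket-42", "model-a", 1200, 400),  # first attempt
    Call("ticket-42", "model-a", 1300, 450),  # retry, same task ID
]
print(cost_per_task(calls))  # the retry roughly doubles the task's cost
```

The point of the single `task_id` key is that the retry is not a separate line item; it is part of what that ticket cost you.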

2. Genuine multi-provider, not LCD multi-provider

A "lowest-common-denominator" abstraction reduces every provider to text-in/text-out and throws away tool-use, structured outputs, caching, and reasoning controls. Genuine multi-provider means an abstraction that lets each provider's best features shine, with capability flags and graceful fallback paths.
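One way to sketch capability flags with a graceful fallback path; the provider names and flag set below are hypothetical:

```python
# Hypothetical capability registry; a real one would be generated from tests.
PROVIDERS = {
    "alpha": {"json_mode": True,  "tool_use": True,  "caching": False},
    "beta":  {"json_mode": False, "tool_use": True,  "caching": True},
}

def pick_provider(required: set, preference: list) -> str:
    """Return the first preferred provider supporting every required capability."""
    for name in preference:
        caps = PROVIDERS[name]
        if all(caps.get(flag, False) for flag in required):
            return name
    raise RuntimeError(f"No provider supports {required}")

# "beta" is preferred, but it lacks JSON mode, so the call falls through to "alpha".
print(pick_provider({"json_mode", "tool_use"}, ["beta", "alpha"]))  # -> alpha
```

Because requirements are declared per request, a workload that only needs tool use can still land on the cheaper preferred provider instead of being flattened to text-in/text-out.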

3. The hard-to-commoditise moat

Models commoditise; product surfaces don't. The durable AI products (Notion, Linear, Cursor) invest in workflow, latency, evals, and integrations. The model is plumbing.

4. Designing for swap

Treat the model as a versioned dependency. Pin versions, run shadow evals, keep prompt templates portable, and route at request time using a routing layer rather than at deployment time.
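A request-time routing table might look like the following sketch; the model identifiers are made up:

```python
# Pinned, versioned model identifiers (hypothetical). Swapping a model is an
# edit to this table plus a shadow-eval sign-off, not a redeploy of the app.
ROUTES = {
    "summarise": "provider-a/model-x-2024-06-01",
    "pr-review": "provider-b/model-y-1.5",
    "default":   "provider-a/model-x-2024-06-01",
}

def route(task_class: str) -> str:
    """Resolve a task class to a pinned model version at request time."""
    return ROUTES.get(task_class, ROUTES["default"])

print(route("pr-review"))
```

Pinning the full version string in one place is what makes "run shadow evals, then flip the route" a one-line change instead of a migration.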

Practical patterns

Task-cost dashboards

Tag every model call with a task_id and surface p50/p95 cost per task class. Alert when a class drifts > 30% week-on-week.
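The week-on-week check reduces to a one-line comparison; the figures below are illustrative:

```python
def drifted(last_week: float, this_week: float, threshold: float = 0.30) -> bool:
    """Flag a task class whose cost moved more than `threshold` week-on-week."""
    return abs(this_week - last_week) / last_week > threshold

# A task class that went from $0.020 to $0.027 per task is up 35%: alert.
print(drifted(0.020, 0.027))  # -> True
print(drifted(0.020, 0.022))  # -> False (10% move, under threshold)
```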

Provider capability matrix

Maintain a living matrix: provider × (tool use, JSON mode, vision, caching, reasoning, latency, $/M in, $/M out). Re-test monthly.
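Keeping the matrix in code makes "re-test monthly" checkable; the row below is illustrative, not real pricing:

```python
from datetime import date, timedelta

# One illustrative row; every field here is hypothetical.
MATRIX = {
    "provider-a": {"tool_use": True, "json_mode": True,
                   "usd_per_m_in": 3.0, "usd_per_m_out": 15.0,
                   "last_tested": date(2025, 1, 10)},
}

def stale_rows(matrix, today, max_age_days=30):
    """Rows not re-tested within a month can't be trusted for routing."""
    return [name for name, row in matrix.items()
            if today - row["last_tested"] > timedelta(days=max_age_days)]

print(stale_rows(MATRIX, date(2025, 3, 1)))  # -> ['provider-a']
```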

Shadow evaluation

Mirror 1–5% of production traffic to a candidate model and grade outputs offline before any traffic switch.
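Sampling deterministically by hashing the task ID keeps the shadow set stable across retries and replays; a sketch:

```python
import hashlib

def should_shadow(task_id: str, rate: float = 0.02) -> bool:
    """Deterministically mirror ~`rate` of traffic, keyed on the task ID."""
    bucket = int(hashlib.sha256(task_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rate * 10_000

# The shadowed request is sent to the candidate model and graded offline;
# the user still receives the primary model's response.
sampled = sum(should_shadow(f"task-{i}") for i in range(10_000))
print(sampled)  # roughly 200 of 10,000 at a 2% rate
```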

Prompt portability tests

Run the same prompt across 3+ providers in CI; flag prompts that only work on one model as technical debt.
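A CI check for that rule might look like this sketch, with a stand-in grader in place of real model calls; all provider names are hypothetical:

```python
def passes(provider: str, prompt: str) -> bool:
    # Stand-in for "call the model, grade the output"; wire in your own harness.
    return provider in {"alpha", "beta"}

def portability_debt(prompt: str, providers=("alpha", "beta", "gamma")) -> bool:
    """A prompt that passes on at most one provider is flagged as debt."""
    passing = [p for p in providers if passes(p, prompt)]
    return len(passing) <= 1

print(portability_debt("Summarise this doc"))  # -> False (passes on two)
```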

Pitfalls to avoid

  • Treating every workload as if it needed the most capable model.
  • Letting prompt templates absorb provider-specific quirks until you can't move.
  • Optimising token cost without measuring quality drift: a 30% cheaper model that fails 5% more often is usually a loss.
  • Ignoring caching: prompt caching can be the single biggest lever on cost-per-task.
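The quality-drift pitfall above is easy to verify with arithmetic: once a failure carries a downstream cost (say, human escalation), the cheaper-per-call model can be the expensive one. All figures are illustrative:

```python
def expected_cost(per_call: float, failure_rate: float, failure_cost: float) -> float:
    """Spend per task when a failed output incurs a downstream cost
    (human escalation, a churned ticket, a pipeline re-run)."""
    return per_call + failure_rate * failure_cost

# Assumed $0.50 cost per failure that reaches a human (hypothetical).
baseline = expected_cost(0.010, 0.02, 0.50)  # 0.010 + 0.010 = 0.020
cheaper  = expected_cost(0.007, 0.07, 0.50)  # 0.007 + 0.035 = 0.042
print(baseline, cheaper)  # the "30% cheaper" model costs twice as much per task
```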

Key takeaways

  1. Measure cost per task, not per token.
  2. Treat multi-provider as an architectural decision, made once and amortised.
  3. Invest the saved compute in product surface; that's where the moat is.
  4. Run shadow evals continuously so you can swap models without holding your breath.

