AI Engineer Melbourne · Knowledge Base · Keynote Insights · Intermediate · 12 min

Token Town: Why Compute Strategy Is Product Strategy

In production, what matters is cost per task, not the price-per-million-tokens headline.

Introduction

Token pricing is a noisy headline. What actually matters in production is what you pay per task once outputs get longer, retries creep in, and "good enough" models get deprecated. Treating compute as a strategic asset, rather than a commodity invoice, is what separates AI products that compound from AI products that get squeezed every time a frontier lab updates its pricing page.

Why this matters

  • Per-million-token quotes hide the real cost driver: tokens-per-task, which scales with reasoning depth, retry rate, and tool-call fan-out.
  • Frontier model deprecation is now a quarterly event. If your product is wired to one provider, every deprecation becomes a migration.
  • The flywheel for AI products is built on product value that is hard to commoditise (UX, data, distribution), not on a model you don't own.
  • Multi-provider isn't a procurement strategy; it's an architectural one. The cost of being able to swap models has to be paid up front in your abstraction layer.

Core concepts

1. Cost per task vs. cost per token

A task is the unit your customer pays for: "summarise this doc," "draft this PR review," "answer this support ticket." Until you can attribute spend to a task, you can't optimise. Build cost telemetry that aggregates input + output + tool-call + retry tokens against a single task ID.
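A minimal sketch of that telemetry, assuming illustrative prices and a hypothetical `Call` record (no real provider SDK is referenced here):

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pricing in $ per million tokens, for illustration only.
PRICE_PER_M = {"model-a": {"in": 3.00, "out": 15.00}}

@dataclass
class Call:
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int  # tool-call round-trips count here too

def cost_per_task(calls):
    """Aggregate every call, including retries, under its task_id."""
    totals = defaultdict(float)
    for c in calls:
        p = PRICE_PER_M[c.model]
        totals[c.task_id] += (c.input_tokens * p["in"]
                              + c.output_tokens * p["out"]) / 1_000_000
    return dict(totals)

calls = [
    Call("ticket-42", "model-a", 1200, 400),  # first attempt
    Call("ticket-42", "model-a", 1300, 450),  # retry, same task ID
]
print(cost_per_task(calls))  # the retry roughly doubles the task's cost
```

The point of the single `task_id` key is that the retry is not a separate line item; it is part of what that ticket cost you.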

2. Genuine multi-provider, not LCD multi-provider

A "lowest-common-denominator" abstraction reduces every provider to text-in/text-out and throws away tool-use, structured outputs, caching, and reasoning controls. Genuine multi-provider means an abstraction that lets each provider's best features shine, with capability flags and graceful fallback paths.
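One way to sketch capability flags with a graceful fallback path; the provider names and flag set below are hypothetical:

```python
# Hypothetical capability registry; a real one would be generated from tests.
PROVIDERS = {
    "alpha": {"json_mode": True,  "tool_use": True,  "caching": False},
    "beta":  {"json_mode": False, "tool_use": True,  "caching": True},
}

def pick_provider(required: set, preference: list) -> str:
    """Return the first preferred provider supporting every required capability."""
    for name in preference:
        caps = PROVIDERS[name]
        if all(caps.get(flag, False) for flag in required):
            return name
    raise RuntimeError(f"No provider supports {required}")

# "beta" is preferred, but it lacks JSON mode, so the call falls through to "alpha".
print(pick_provider({"json_mode", "tool_use"}, ["beta", "alpha"]))  # -> alpha
```

Because requirements are declared per request, a workload that only needs tool use can still land on the cheaper preferred provider instead of being flattened to text-in/text-out.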

3. The hard-to-commoditise moat

Models commoditise; product surfaces don't. The durable AI products (Notion, Linear, Cursor) invest in workflow, latency, evals, and integrations. The model is plumbing.

4. Designing for swap

Treat the model as a versioned dependency. Pin versions, run shadow evals, keep prompt templates portable, and route at request time using a routing layer rather than at deployment time.
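A request-time routing table might look like the following sketch; the model identifiers are made up:

```python
# Pinned, versioned model identifiers (hypothetical). Swapping a model is an
# edit to this table plus a shadow-eval sign-off, not a redeploy of the app.
ROUTES = {
    "summarise": "provider-a/model-x-2024-06-01",
    "pr-review": "provider-b/model-y-1.5",
    "default":   "provider-a/model-x-2024-06-01",
}

def route(task_class: str) -> str:
    """Resolve a task class to a pinned model version at request time."""
    return ROUTES.get(task_class, ROUTES["default"])

print(route("pr-review"))
```

Pinning the full version string in one place is what makes "run shadow evals, then flip the route" a one-line change instead of a migration.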

Practical patterns

Task-cost dashboards

Tag every model call with a task_id and surface p50/p95 cost per task class. Alert when a class drifts > 30% week-on-week.
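The week-on-week check reduces to a one-line comparison; the figures below are illustrative:

```python
def drifted(last_week: float, this_week: float, threshold: float = 0.30) -> bool:
    """Flag a task class whose cost moved more than `threshold` week-on-week."""
    return abs(this_week - last_week) / last_week > threshold

# A task class that went from $0.020 to $0.027 per task is up 35%: alert.
print(drifted(0.020, 0.027))  # -> True
print(drifted(0.020, 0.022))  # -> False (10% move, under threshold)
```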

Provider capability matrix

Maintain a living matrix: provider × (tool use, JSON mode, vision, caching, reasoning, latency, $/M in, $/M out). Re-test monthly.
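Keeping the matrix in code makes "re-test monthly" checkable; the row below is illustrative, not real pricing:

```python
from datetime import date, timedelta

# One illustrative row; every field here is hypothetical.
MATRIX = {
    "provider-a": {"tool_use": True, "json_mode": True,
                   "usd_per_m_in": 3.0, "usd_per_m_out": 15.0,
                   "last_tested": date(2025, 1, 10)},
}

def stale_rows(matrix, today, max_age_days=30):
    """Rows not re-tested within a month can't be trusted for routing."""
    return [name for name, row in matrix.items()
            if today - row["last_tested"] > timedelta(days=max_age_days)]

print(stale_rows(MATRIX, date(2025, 3, 1)))  # -> ['provider-a']
```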

Shadow evaluation

Mirror 1–5% of production traffic to a candidate model and grade outputs offline before any traffic switch.
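Sampling deterministically by hashing the task ID keeps the shadow set stable across retries and replays; a sketch:

```python
import hashlib

def should_shadow(task_id: str, rate: float = 0.02) -> bool:
    """Deterministically mirror ~`rate` of traffic, keyed on the task ID."""
    bucket = int(hashlib.sha256(task_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rate * 10_000

# The shadowed request is sent to the candidate model and graded offline;
# the user still receives the primary model's response.
sampled = sum(should_shadow(f"task-{i}") for i in range(10_000))
print(sampled)  # roughly 200 of 10,000 at a 2% rate
```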

Prompt portability tests

Run the same prompt across 3+ providers in CI; flag prompts that only work on one model as technical debt.
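A CI check for that rule might look like this sketch, with a stand-in grader in place of real model calls; all provider names are hypothetical:

```python
def passes(provider: str, prompt: str) -> bool:
    # Stand-in for "call the model, grade the output"; wire in your own harness.
    return provider in {"alpha", "beta"}

def portability_debt(prompt: str, providers=("alpha", "beta", "gamma")) -> bool:
    """A prompt that passes on at most one provider is flagged as debt."""
    passing = [p for p in providers if passes(p, prompt)]
    return len(passing) <= 1

print(portability_debt("Summarise this doc"))  # -> False (passes on two)
```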

Pitfalls to avoid

  • Treating every workload as if it needed the most capable model.
  • Letting prompt templates absorb provider-specific quirks until you can't move.
  • Optimising token cost without measuring quality drift: a 30% cheaper model that fails 5% more often is usually a loss.
  • Ignoring caching: prompt caching can be the single biggest lever on cost-per-task.
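The quality-drift pitfall above is easy to verify with arithmetic: once a failure carries a downstream cost (say, human escalation), the cheaper-per-call model can be the expensive one. All figures are illustrative:

```python
def expected_cost(per_call: float, failure_rate: float, failure_cost: float) -> float:
    """Spend per task when a failed output incurs a downstream cost
    (human escalation, a churned ticket, a pipeline re-run)."""
    return per_call + failure_rate * failure_cost

# Assumed $0.50 cost per failure that reaches a human (hypothetical).
baseline = expected_cost(0.010, 0.02, 0.50)  # 0.010 + 0.010 = 0.020
cheaper  = expected_cost(0.007, 0.07, 0.50)  # 0.007 + 0.035 = 0.042
print(baseline, cheaper)  # the "30% cheaper" model costs twice as much per task
```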

Key takeaways

  1. Measure cost per task, not per token.
  2. Treat multi-provider as an architectural decision, made once and amortised.
  3. Invest the saved compute in product surface; that's where the moat is.
  4. Run shadow evals continuously so you can swap models without holding your breath.

