Introduction
The smartest model doesn't always win. In agentic coding loops, a model that's 10x faster but only marginally competent can often fail its way to success before a frontier model finishes reasoning. The math is counter-intuitive: if each attempt improves a solution by 20%, dozens of iterations per minute compound into faster convergence than slow, high-quality reasoning. Loop velocity is becoming the new performance frontier.
Why this matters
- Iteration speed is now a first-class engineering metric.
- Diffusion-based language models (e.g. Mercury 2) generate tokens in parallel rather than one at a time, removing the serial decoding bottleneck.
- Total time to a working solution = iterations × time per iteration; speeding up either factor compounds.
- Slow models burn user attention and willingness to wait; fast wrong-then-right often beats slow right.
Core concepts
Convergence dynamics
If each attempt improves output by a fraction p, the expected number of attempts to converge is roughly log(target/start)/log(1 + p). Faster iteration trims the per-attempt cost; as long as p stays positive, fast wins.
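The formula above can be checked numerically. The quality levels, improvement rates, and per-attempt latencies below are illustrative assumptions, not benchmarks:

```python
import math

def expected_attempts(start: float, target: float, p: float) -> float:
    """Attempts needed for quality to grow from `start` to `target`
    when each attempt multiplies quality by (1 + p)."""
    return math.log(target / start) / math.log(1 + p)

# A fast model improving 20% per attempt at 5 s/attempt...
fast = expected_attempts(0.5, 0.95, 0.20)   # ~3.5 attempts
# ...vs a slow model improving 60% per attempt at 60 s/attempt.
slow = expected_attempts(0.5, 0.95, 0.60)   # ~1.4 attempts

print(fast * 5, slow * 60)  # fast finishes in far less wall-clock time
```

Even though the slow model needs fewer attempts, the fast model's lower per-attempt cost dominates total wall-clock time under these assumptions.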
Autoregressive vs. diffusion
Autoregressive models generate one token at a time. Diffusion language models generate the whole sequence in parallel and refine it iteratively, which is fundamentally faster for many workloads.
When fast loses
When per-attempt p is near zero or negative (i.e. the model can't actually make progress), speed is irrelevant. You need a baseline of competence.
Loop design matters
Fast iteration only helps if you can verify success cheaply. Compile/test/lint/eval gates are what convert speed into convergence.
Practical patterns
Cheap verifier in the loop
A fast deterministic check (compile, lint, schema) runs after every attempt; skip semantic eval until the cheap check passes.
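The pattern can be sketched as a loop with a two-stage gate. `generate` and `semantic_eval` are hypothetical callables standing in for a model call and an expensive evaluation; the cheap check here is a Python parse, as one example of a deterministic gate:

```python
import ast

def cheap_check(code: str) -> bool:
    """Fast deterministic gate: does the candidate even parse?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def iterate(generate, semantic_eval, max_attempts=20):
    """Run the cheap gate after every attempt; only pay for the
    expensive semantic evaluation once the gate passes."""
    for _ in range(max_attempts):
        candidate = generate()
        if not cheap_check(candidate):
            continue                      # fail fast, try again
        if semantic_eval(candidate):
            return candidate
    return None
```

The cheap gate runs on every attempt; the semantic evaluation only runs on candidates that survive it, so iteration stays fast.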
Parallel candidates
Spawn N candidate solutions simultaneously; pick the first that passes verification. Diffusion models do this naturally.
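For autoregressive models, the same effect can be approximated with concurrent requests. A minimal sketch, where `candidates` is a list of zero-argument callables (hypothetical model-call wrappers):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_passing(candidates, verify, workers=4):
    """Run all candidate generators concurrently and return the
    first result that passes verification, or None if none do."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(c) for c in candidates]
        for fut in as_completed(futures):
            result = fut.result()
            if verify(result):
                return result
    return None
```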
Adaptive model tier
Start with the fastest model; escalate to a smarter one only after K failed attempts.
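A sketch of the escalation logic, assuming `tiers` is a list of zero-argument callables ordered fastest-to-smartest (hypothetical model wrappers):

```python
def solve_with_escalation(tiers, verify, k=3):
    """Give each model tier `k` attempts; escalate to the next
    (smarter, slower) tier only after the budget is exhausted."""
    for model in tiers:
        for _ in range(k):
            candidate = model()
            if verify(candidate):
                return candidate
    return None   # every tier exhausted its attempt budget
```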
Latency budgets
Per-task wall-clock budget. If the slow model can't meet it, the fast model owns the task.
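One way to encode that routing rule; the latency figures and the `fast_model`/`slow_model` callables are illustrative assumptions:

```python
import time

def solve_within_budget(fast_model, slow_model, verify,
                        budget_s=30.0, slow_latency_s=60.0):
    """If the slow model's typical latency fits the per-task
    budget, use it once; otherwise the fast model owns the task
    and loops until the deadline."""
    if slow_latency_s <= budget_s:
        return slow_model()
    deadline = time.monotonic() + budget_s
    best = None
    while time.monotonic() < deadline:
        candidate = fast_model()
        if verify(candidate):
            return candidate
        best = candidate
    return best   # budget exhausted; return the last attempt
```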
Pitfalls to avoid
- Optimising tokens-per-second without measuring tasks-per-minute.
- Looping with no verifier: you just produce wrong answers faster.
- Ignoring the long tail of tasks where a smart model is the only one that ever succeeds.
- Cost runaway: many cheap-fast attempts can add up to more than one expensive-smart attempt.
Key takeaways
1. Speed compounds; quality plateaus. Both matter, but speed is under-invested.
2. Diffusion models are a real architectural shift, not just a benchmark result.
3. Build cheap verifiers; speed without verification is just noise.
4. Always escalate when fast iteration plateaus.
Go deeper · external resources
Curated reading list to take you from primer to practitioner. All links are external and free to read.