Introduction
The smartest model doesn't always win. In agentic coding loops, a model that's 10x faster but only marginally competent can often fail its way to success before a frontier model finishes reasoning. The math is counter-intuitive: if each attempt improves a solution by 20%, dozens of iterations per minute compound into faster convergence than slow, high-quality reasoning. Loop velocity is becoming the new performance frontier.
Why this matters
- Iteration speed is now a first-class engineering metric.
- Diffusion-based language models (e.g. Mercury 2) generate tokens in parallel rather than one at a time, removing the serial decoding bottleneck.
- Total time to a working solution = iterations × time per iteration; speeding up either factor compounds.
- Slow models burn user attention and willingness to wait; fast wrong-then-right often beats slow right.
Core concepts
Convergence dynamics
If each attempt improves output by a fraction p, the expected number of attempts to converge is roughly log(target/start)/log(1 + p). Faster iteration trims the per-attempt cost; as long as p stays positive, fast wins.
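The formula above can be checked numerically. The quality levels, improvement rates, and per-attempt latencies below are illustrative assumptions, not benchmarks:

```python
import math

def expected_attempts(start: float, target: float, p: float) -> float:
    """Attempts needed for quality to grow from `start` to `target`
    when each attempt multiplies quality by (1 + p)."""
    return math.log(target / start) / math.log(1 + p)

# A fast model improving 20% per attempt at 5 s/attempt...
fast = expected_attempts(0.5, 0.95, 0.20)   # ~3.5 attempts
# ...vs a slow model improving 60% per attempt at 60 s/attempt.
slow = expected_attempts(0.5, 0.95, 0.60)   # ~1.4 attempts

print(fast * 5, slow * 60)  # fast finishes in far less wall-clock time
```

Even though the slow model needs fewer attempts, the fast model's lower per-attempt cost dominates total wall-clock time under these assumptions.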
Autoregressive vs. diffusion
Autoregressive models generate one token at a time. Diffusion language models generate the whole sequence in parallel and refine it iteratively, which is fundamentally faster for many workloads.
When fast loses
When per-attempt p is near zero or negative (i.e. the model can't actually make progress), speed is irrelevant. You need a baseline of competence.
Loop design matters
Fast iteration only helps if you can verify success cheaply. Compile/test/lint/eval gates are what convert speed into convergence.
Practical patterns
Cheap verifier in the loop
A fast deterministic check (compile, lint, schema) runs after every attempt; skip semantic eval until the cheap check passes.
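The pattern can be sketched as a loop with a two-stage gate. `generate` and `semantic_eval` are hypothetical callables standing in for a model call and an expensive evaluation; the cheap check here is a Python parse, as one example of a deterministic gate:

```python
import ast

def cheap_check(code: str) -> bool:
    """Fast deterministic gate: does the candidate even parse?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def iterate(generate, semantic_eval, max_attempts=20):
    """Run the cheap gate after every attempt; only pay for the
    expensive semantic evaluation once the gate passes."""
    for _ in range(max_attempts):
        candidate = generate()
        if not cheap_check(candidate):
            continue                      # fail fast, try again
        if semantic_eval(candidate):
            return candidate
    return None
```

The cheap gate runs on every attempt; the semantic evaluation only runs on candidates that survive it, so iteration stays fast.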
Parallel candidates
Spawn N candidate solutions simultaneously; pick the first that passes verification. Diffusion models do this naturally.
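For autoregressive models, the same effect can be approximated with concurrent requests. A minimal sketch, where `candidates` is a list of zero-argument callables (hypothetical model-call wrappers):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_passing(candidates, verify, workers=4):
    """Run all candidate generators concurrently and return the
    first result that passes verification, or None if none do."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(c) for c in candidates]
        for fut in as_completed(futures):
            result = fut.result()
            if verify(result):
                return result
    return None
```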
Adaptive model tier
Start with the fastest model; escalate to a smarter one only after K failed attempts.
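A sketch of the escalation logic, assuming `tiers` is a list of zero-argument callables ordered fastest-to-smartest (hypothetical model wrappers):

```python
def solve_with_escalation(tiers, verify, k=3):
    """Give each model tier `k` attempts; escalate to the next
    (smarter, slower) tier only after the budget is exhausted."""
    for model in tiers:
        for _ in range(k):
            candidate = model()
            if verify(candidate):
                return candidate
    return None   # every tier exhausted its attempt budget
```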
Latency budgets
Per-task wall-clock budget. If the slow model can't meet it, the fast model owns the task.
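One way to encode that routing rule; the latency figures and the `fast_model`/`slow_model` callables are illustrative assumptions:

```python
import time

def solve_within_budget(fast_model, slow_model, verify,
                        budget_s=30.0, slow_latency_s=60.0):
    """If the slow model's typical latency fits the per-task
    budget, use it once; otherwise the fast model owns the task
    and loops until the deadline."""
    if slow_latency_s <= budget_s:
        return slow_model()
    deadline = time.monotonic() + budget_s
    best = None
    while time.monotonic() < deadline:
        candidate = fast_model()
        if verify(candidate):
            return candidate
        best = candidate
    return best   # budget exhausted; return the last attempt
```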
Pitfalls to avoid
- Optimising tokens-per-second without measuring tasks-per-minute.
- Looping with no verifier: you just produce wrong answers faster.
- Ignoring the long tail of tasks where a smart model is the only one that ever succeeds.
- Cost runaway: many cheap-fast attempts can add up to more than one expensive-smart attempt.
Key takeaways
1. Speed compounds; quality plateaus. Both matter, but speed is under-invested.
2. Diffusion models are a real architectural shift, not just a benchmark result.
3. Build cheap verifiers; speed without verification is just noise.
4. Always escalate when fast iteration plateaus.
Go deeper · external resources
Curated reading list to take you from primer to practitioner. All links are external and free to read.