AI Engineer Melbourne
Knowledge Base
Software Engineering · Intermediate · 10 min

Fail Fast, Fix Faster: Why Speed Beats Smarts in Agent Loops

A 10x faster, marginally competent model can iterate to success before a frontier model finishes thinking.

Introduction

The smartest model doesn't always win. In agentic coding loops, a model that's 10x faster but only marginally competent can often fail its way to success before a frontier model finishes reasoning. The maths is counter-intuitive: if each attempt improves a solution by 20%, dozens of iterations per minute compound into faster convergence than slow, high-quality reasoning. Loop velocity is becoming the new performance frontier.

Why this matters

  • Iteration speed is now a first-class engineering metric.
  • Diffusion-based language models (e.g. Mercury 2) generate in parallel rather than serially, removing a serial bottleneck.
  • Total time to a working solution = iterations × time per iteration — speeding up either factor compounds.
  • Slow models burn user attention and willingness to wait; fast wrong-then-right often beats slow right.

Core concepts

1. Convergence dynamics

If each attempt improves the output by a fraction p, the expected number of attempts to converge is ≈ log(target/start)/log(1+p). Faster iteration trims the per-attempt cost; as long as p stays positive, fast wins.
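The convergence formula above can be checked with a few lines of arithmetic. This is a minimal sketch; the 20% improvement rate and 10x quality gap are illustrative numbers, not measurements.

```python
import math

def expected_attempts(start: float, target: float, p: float) -> float:
    """Attempts needed if each one multiplies solution quality by (1 + p)."""
    return math.log(target / start) / math.log(1 + p)

# 20% improvement per attempt, quality must grow 10x:
n = expected_attempts(start=1.0, target=10.0, p=0.20)  # ~12.6 attempts
```

At roughly 13 attempts to close a 10x quality gap, a model that iterates 10x faster finishes the whole loop in about the time a slow model spends on a single high-quality pass.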

2. Autoregressive vs. diffusion

Autoregressive models generate one token at a time. Diffusion language models generate the whole sequence in parallel and refine it — fundamentally faster for many workloads.

3. When fast loses

When per-attempt improvement p is near zero or negative — i.e. the model can't actually make progress — speed is irrelevant. You need a baseline of competence.

4. Loop design matters

Fast iteration only helps if you can verify success cheaply. Compile/test/lint/eval gates are what convert speed into convergence.

Practical patterns

Cheap verifier in the loop

A fast deterministic check (compile, lint, schema) runs after every attempt; skip semantic eval until the cheap check passes.
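The gate order described above can be sketched as a small driver loop. Everything here is hypothetical scaffolding: `generate`, `compiles`, and `passes_eval` stand in for your model call, a cheap deterministic check, and an expensive semantic eval.

```python
def solve(task, generate, compiles, passes_eval, max_attempts=20):
    """Run the cheap check after every attempt; defer the costly eval."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        ok, feedback = compiles(candidate)   # cheap deterministic gate first
        if not ok:
            continue                         # skip the expensive semantic eval
        if passes_eval(candidate):           # only now pay for semantics
            return candidate
    return None
```

The key property is that the expensive verifier runs only on candidates that already cleared the cheap gate, so its call count stays well below the attempt count.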

Parallel candidates

Spawn N candidate solutions simultaneously; pick the first that passes verification. Diffusion models do this naturally.
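A minimal sketch of the pattern, assuming `generate(task, seed)` and `verify(candidate)` are your own functions; with an autoregressive model you would fan out API calls, while a diffusion model gets this parallelism for free.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_passing(task, generate, verify, n=4):
    """Spawn n candidates concurrently; return the first that verifies."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(generate, task, seed) for seed in range(n)]
        for fut in as_completed(futures):
            candidate = fut.result()
            if verify(candidate):
                return candidate
    return None  # no candidate passed verification
```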

Adaptive model tier

Start with the fastest model; escalate to a smarter one only after K failed attempts.
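The escalation rule can be expressed as a nested loop over a fastest-first list of models. A sketch under assumed interfaces: each `model` is a callable returning a candidate, and `verify` is the success check.

```python
def tiered_solve(task, models, verify, k=3):
    """Try each model (fastest first) up to k times before escalating."""
    for model in models:
        for _ in range(k):
            candidate = model(task)
            if verify(candidate):
                return candidate
        # k failures: fall through to the next (smarter, slower) tier
    return None
```

This keeps the expensive tier idle on the easy tasks that dominate most workloads, while still covering the long tail where only the smart model succeeds.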

Latency budgets

Per-task wall-clock budget. If the slow model can't meet it, the fast model owns the task.
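A budgeted loop is a few lines around a monotonic clock. This sketch assumes `fast_model` and `verify` are your own callables; the 5-second default is arbitrary.

```python
import time

def within_budget(task, fast_model, verify, budget_s=5.0):
    """Iterate with the fast model until the wall-clock budget expires."""
    deadline = time.monotonic() + budget_s
    best = None
    while time.monotonic() < deadline:
        candidate = fast_model(task)
        if verify(candidate):
            return candidate
        best = candidate  # keep the latest attempt as a fallback
    return best  # best-effort result if nothing verified in time
```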

Pitfalls to avoid

  • Optimising tokens-per-second without measuring tasks-per-minute.
  • Looping with no verifier — you just produce wrong answers faster.
  • Ignoring the long tail of tasks where a smart model is the only one that ever succeeds.
  • Cost runaway: many cheap-fast attempts can add up to more than one expensive-smart attempt.
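The cost-runaway pitfall is easy to quantify. The prices below are invented for illustration only; the ~13-attempt figure comes from the convergence arithmetic earlier in the article.

```python
# Back-of-envelope: many cheap attempts vs. one expensive attempt.
cheap_cost = 0.002             # assumed $ per fast-model attempt
smart_cost = 0.020             # assumed $ per frontier-model attempt
expected_cheap_attempts = 13   # ~13 attempts to close a 10x quality gap

fast_total = cheap_cost * expected_cheap_attempts  # $0.026
# The fast loop wins on wall-clock but loses on dollars here,
# so budget both cost and latency, not just latency.
```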

Key takeaways

  1. Speed compounds; quality plateaus. Both matter, but speed is under-invested.
  2. Diffusion models are a real architectural shift, not just a benchmark.
  3. Build cheap verifiers; speed without verification is just noise.
  4. Always escalate when fast iteration plateaus.
