Engineering for the Agentic Web: When 50% of Traffic Is Robots

Pages aren't just for humans anymore. Design for agents reading on their behalf.

Introduction

Customer web traffic has changed: a large and growing share now comes from unidentified browsers and AI agents acting on behalf of users. The era of optimising solely for traditional search engine crawlers and Core Web Vitals is ending; the new challenge is feeding focused, low-noise context to autonomous agents and LLMs. Pages need to work for both humans and the agents reading on their behalf, and the engineering patterns are still being written.

Why this matters

  • Agents make purchase, support, and research decisions; if your site is illegible, you lose the transaction.
  • Token-budgeted agents reward concise, structured pages.
  • Bot traffic is a real cost driver and a real abuse vector.
  • New standards (llms.txt, AGENTS.md) are emerging; early adoption pays.

Core concepts

1. llms.txt

A markdown file at the root of your site that gives LLMs a curated, low-noise map: the canonical content, links to clean Markdown versions, and what to ignore.
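
A minimal sketch of the format, following the llms.txt proposal (an H1 name, a blockquote summary, then H2 sections of links); the site, paths, and descriptions here are entirely hypothetical:

```markdown
# Example Store

> Hypothetical retailer selling widgets in Australia. The pages below are the
> canonical sources; each links to a clean Markdown version for LLM consumption.

## Products

- [Catalogue](https://example.com/catalogue.md): all product lines with current AUD prices
- [Shipping and returns](https://example.com/shipping.md): delivery times and return policy

## Support

- [FAQ](https://example.com/faq.md): common order and account questions

## Optional

- [About us](https://example.com/about.md): company background, rarely needed for transactions
```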

2. Machine-readable summaries

Each page exposes a structured summary (JSON-LD, Open Graph, plus an LLM-targeted Markdown variant) so agents don't have to scrape JS-rendered DOMs.
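
For example, a product page can embed a schema.org summary as JSON-LD in its head. A minimal sketch in TypeScript; the `ProductRecord` shape and field names are hypothetical internals, while the Product/Offer vocabulary is standard schema.org:

```typescript
// Build a schema.org Product JSON-LD block for a product page.
interface ProductRecord {
  name: string;
  description: string;
  priceAud: number;
  url: string;
  inStock: boolean;
}

function productJsonLd(product: ProductRecord): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    description: product.description,
    url: product.url,
    offers: {
      "@type": "Offer",
      priceCurrency: "AUD",
      price: product.priceAud.toFixed(2),
      availability: product.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Embedded in the page <head>, this is readable without executing any JS.
  return `<script type="application/ld+json">${JSON.stringify(data)}</script>`;
}
```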

3. Agent-friendly markup

Semantic HTML, stable IDs, no critical content hidden behind JS, and accessible patterns. What's good for accessibility is largely good for agents.

4. Bot economics

Agents hammer endpoints. Caching, conditional requests (ETag/304), and per-agent rate limits keep your origin costs sane.
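
One way to keep repeated agent fetches cheap is a strong ETag plus a 304 short-circuit, so an unchanged page costs a hash comparison instead of a full response. A sketch assuming an Express-style Node server; `renderPage` is a hypothetical stand-in for your renderer:

```typescript
import crypto from "node:crypto";
import express from "express";

const app = express();

// Hypothetical renderer; a real app would produce the page HTML here.
async function renderPage(slug: string): Promise<string> {
  return `<html><body><h1>${slug}</h1></body></html>`;
}

app.get("/:slug", async (req, res) => {
  const body = await renderPage(req.params.slug);

  // Strong ETag derived from the rendered body.
  const hash = crypto.createHash("sha256").update(body).digest("hex");
  const etag = `"${hash.slice(0, 32)}"`;

  res.setHeader("ETag", etag);
  res.setHeader("Cache-Control", "public, max-age=300"); // let well-behaved agents cache

  // Conditional request: an unchanged page costs a 304, not a full response body.
  if (req.headers["if-none-match"] === etag) {
    res.status(304).end();
    return;
  }
  res.type("html").send(body);
});

app.listen(3000);
```

Express can also generate weak ETags automatically for res.send; spelling it out here just makes the conditional-request path visible.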

Practical patterns

Markdown twin pages

For every HTML page, expose /page.md with a clean Markdown rendering. Agents prefer it; you get one page, two consumers.
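
A sketch of serving the twin, assuming the Markdown versions are pre-generated into a `content/md/` directory at build time (the directory, port, and Express setup are assumptions):

```typescript
import { readFile } from "node:fs/promises";
import path from "node:path";
import express from "express";

const app = express();
const MD_DIR = path.resolve("content/md"); // assumed build-time output of Markdown twins

// Any GET ending in .md is served from the pre-generated twins:
// /pricing.md -> content/md/pricing.md
app.use(async (req, res, next) => {
  if (req.method !== "GET" || !req.path.endsWith(".md")) return next();

  const file = path.normalize(path.join(MD_DIR, req.path));
  if (!file.startsWith(MD_DIR + path.sep)) {
    res.status(404).end(); // path traversal guard
    return;
  }

  try {
    const markdown = await readFile(file, "utf8");
    res.set("Content-Type", "text/markdown; charset=utf-8").send(markdown);
  } catch {
    res.status(404).end(); // no twin generated for this page
  }
});

app.listen(3000);
```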

llms.txt at the root

Curate a top-level index; link to your most important pages and Markdown twins. Update when content changes.

Identity headers and abuse policies

Encourage agents to send identifying user-agents; rate-limit by identity; offer sane access tiers.
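
A rough sketch of rate limiting keyed on declared identity, assuming an Express-style server and a fixed-window counter held in memory (production would use Redis or the CDN edge); the tier names, limits, and identity heuristic are illustrative only:

```typescript
import express from "express";

const app = express();

// Requests allowed per minute, keyed by how the client identifies itself.
const TIERS: Record<string, number> = {
  anonymous: 30,    // no meaningful User-Agent
  identified: 300,  // self-identifying agent, e.g. "ExampleAgent/1.0; +https://example.com/bot"
  partner: 3000,    // agents holding an API key you issued
};

const windows = new Map<string, { count: number; resetAt: number }>();

app.use((req, res, next) => {
  const apiKey = req.get("x-api-key");
  const ua = req.get("user-agent") ?? "";

  // Crude heuristic: self-identifying crawlers usually include a "+http..." contact URL.
  const tier = apiKey ? "partner" : ua.includes("+http") ? "identified" : "anonymous";
  const key = apiKey || ua || req.ip || "unknown";

  const now = Date.now();
  const win = windows.get(key);
  if (!win || win.resetAt < now) {
    windows.set(key, { count: 1, resetAt: now + 60_000 });
    return next();
  }
  if (++win.count > TIERS[tier]) {
    res.set("Retry-After", String(Math.ceil((win.resetAt - now) / 1000)));
    res.status(429).send("Rate limit exceeded for this identity tier.");
    return;
  }
  next();
});
```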

Server-rendered critical content

Agents that don't execute JS should still be able to read your headlines, prices, and key actions.
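
A minimal sketch of rendering the critical fields on the server, assuming an Express-style handler and a hypothetical `getProduct` lookup; the point is that name, price, and the primary action sit in the raw HTML with semantic tags and stable IDs before any client script runs:

```typescript
import express from "express";

const app = express();

// Hypothetical product lookup; a real app would hit your database here.
async function getProduct(slug: string) {
  return { name: "Widget Pro", priceAud: 129.0, slug };
}

app.get("/products/:slug", async (req, res) => {
  const product = await getProduct(req.params.slug);

  // Critical content is in the initial HTML, so non-rendering agents can
  // read the name, price, and buy action without executing any JS.
  res.type("html").send(`<!doctype html>
<html lang="en">
  <body>
    <main>
      <article id="product-${product.slug}">
        <h1 id="product-name">${product.name}</h1>
        <p id="product-price">A$${product.priceAud.toFixed(2)}</p>
        <form id="add-to-cart" method="post" action="/cart/items">
          <input type="hidden" name="slug" value="${product.slug}">
          <button type="submit">Add to cart</button>
        </form>
      </article>
    </main>
    <script src="/enhance.js" defer></script> <!-- progressive enhancement only -->
  </body>
</html>`);
});
```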

Pitfalls to avoid

  • JS-only sites that are invisible to non-rendering agents.
  • No rate limits: bots will find you, and your bill will explode.
  • Anti-bot defences that block legitimate AI clients indiscriminately.
  • Failing to test how the top three agents (ChatGPT, Claude, Perplexity) actually see your pages.
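
A cheap way to cover that last point is to fetch key pages with agent-style User-Agent strings and assert the critical content is present in the raw HTML, since those fetches never execute your JS. A sketch using Node's built-in fetch; the crawler tokens, URL, and expected strings are placeholders to adapt (check each vendor's current documentation for the exact User-Agent values):

```typescript
// Fetch the raw HTML as various agents would and check for expected substrings.
const AGENT_UAS: Record<string, string> = {
  openai: "GPTBot/1.0",        // illustrative; verify against vendor docs
  anthropic: "ClaudeBot/1.0",
  perplexity: "PerplexityBot/1.0",
};

const URL_TO_CHECK = "https://example.com/products/widget-pro";     // placeholder
const MUST_CONTAIN = ["Widget Pro", "A$129.00", "Add to cart"];      // placeholder strings

async function main() {
  for (const [vendor, ua] of Object.entries(AGENT_UAS)) {
    const res = await fetch(URL_TO_CHECK, { headers: { "User-Agent": ua } });
    const html = await res.text();
    const missing = MUST_CONTAIN.filter((s) => !html.includes(s));
    console.log(
      `${vendor}: HTTP ${res.status}, ` +
        (missing.length === 0 ? "all critical content present" : `missing: ${missing.join(", ")}`),
    );
  }
}

main().catch(console.error);
```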

Key takeaways

  1. Design for two consumers: humans and agents.
  2. Adopt llms.txt and Markdown twins; they're cheap and forward-compatible.
  3. Rate-limit and authenticate; the agentic web is also a DDoS surface.
  4. Test what the agent sees, not just what the user sees.
