Introduction
Customer web traffic has changed: a large and growing share now comes from unidentified browsers and AI agents acting on behalf of users. The era of optimising solely for traditional search-engine crawlers and Core Web Vitals is shifting; the new challenge is feeding focused, low-noise context to autonomous agents and LLMs. Pages need to work for both humans and the agents reading on their behalf — and the engineering patterns are still being written.
Why this matters
- Agents make purchase, support, and research decisions; if your site is illegible, you lose the transaction.
- Token-budgeted agents reward concise, structured pages.
- Bot traffic is a real cost driver and a real abuse vector.
- New standards (llms.txt, AGENTS.md) are emerging; early adoption pays.
Core concepts
llms.txt
A markdown file at the root of your site that gives LLMs a curated, low-noise map: the canonical content, links to clean Markdown versions, and what to ignore.
Machine-readable summaries
Each page exposes a structured summary (JSON-LD, Open Graph, plus an LLM-targeted Markdown variant) so agents don't have to scrape JS-rendered DOMs.
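One way to provide that summary is to generate the JSON-LD block server-side. A minimal sketch in Python (the helper name and field set are illustrative, not a schema requirement):

```python
import json

def jsonld_summary(name: str, description: str, url: str, price=None) -> str:
    """Build a minimal Schema.org JSON-LD summary for a page.

    Sketch only: a real page would add more properties (images,
    availability, breadcrumbs) as appropriate.
    """
    data = {
        "@context": "https://schema.org",
        # Treat pages with a price as products, everything else as a WebPage.
        "@type": "Product" if price is not None else "WebPage",
        "name": name,
        "description": description,
        "url": url,
    }
    if price is not None:
        data["offers"] = {"@type": "Offer", "price": price, "priceCurrency": "USD"}
    return json.dumps(data, indent=2)
```

The resulting string goes into a `<script type="application/ld+json">` tag; agents that parse structured data can then skip DOM scraping entirely.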
Agent-friendly markup
Semantic HTML, stable IDs, no critical content behind JS, accessible patterns (which also help agents). What's good for accessibility is largely good for agents.
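As a sketch, one fragment that serves both audiences: semantic elements, a stable id, and the price in server-rendered text rather than injected by script (names and markup are illustrative):

```html
<article id="product-widget">
  <h1>Widget</h1>
  <p class="description">A self-sealing stem bolt.</p>
  <p>Price: <span id="price">$9.99</span></p>
  <a href="/buy/widget">Buy now</a>
</article>
```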
Bot economics
Agents hammer endpoints. Caching, conditional requests (ETag/304), and per-agent rate limits keep your origin costs sane.
Practical patterns
Markdown twin pages
For every HTML page, expose /page.md with a clean Markdown rendering. Agents prefer it; you get one page, two consumers.
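Sketched in Python, one way to wire the twin (the URL scheme of bare path plus `.md`, and the stub HTML renderer, are assumptions):

```python
def twin_path(path: str) -> str:
    """Map an HTML route to its Markdown twin, e.g. /pricing -> /pricing.md."""
    base = path.rstrip("/") or "/index"
    return base + ".md"

def serve(path: str, pages: dict[str, str]):
    """Return (content_type, body) for a request.

    `pages` maps canonical paths to their Markdown source. Requests
    ending in .md get the raw Markdown; everything else gets HTML
    (rendering stubbed here for brevity).
    """
    if path.endswith(".md"):
        md = pages.get(path[:-3])
        return ("text/markdown; charset=utf-8", md)
    md = pages.get(path.rstrip("/") or "/index")
    return ("text/html; charset=utf-8", f"<html><body><pre>{md}</pre></body></html>")
```

One source of truth renders both ways, so the twins can't drift out of sync with the HTML.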
llms.txt at the root
Curate a top-level index; link to your most important pages and Markdown twins. Update when content changes.
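A hypothetical llms.txt following the shape of the proposal: an H1 site name, a blockquote summary, then sections of annotated links (all URLs invented):

```markdown
# Example Shop

> Online store selling widgets. Prices and stock live in the linked pages.

## Docs

- [Product catalog](https://example.com/catalog.md): all products with prices
- [Returns policy](https://example.com/returns.md): timelines and conditions

## Optional

- [Company history](https://example.com/about.md)
```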
Identity headers and abuse policies
Encourage agents to send identifying user-agents; rate-limit by identity; offer sane access tiers.
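A per-identity token bucket is one way to implement this. A minimal in-process sketch (capacities are illustrative, and production setups usually enforce limits at the edge or in a shared store like Redis):

```python
import time
from collections import defaultdict

class AgentRateLimiter:
    """Token-bucket limiter keyed by agent identity (e.g. User-Agent).

    Each identity gets its own bucket: `burst` tokens of headroom,
    refilled at `rate_per_sec`. A request spends one token.
    """
    def __init__(self, rate_per_sec: float = 5.0, burst: int = 20):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = defaultdict(
            lambda: {"tokens": float(burst), "last": time.monotonic()}
        )

    def allow(self, identity: str) -> bool:
        b = self.buckets[identity]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        b["tokens"] = min(self.burst, b["tokens"] + (now - b["last"]) * self.rate)
        b["last"] = now
        if b["tokens"] >= 1:
            b["tokens"] -= 1
            return True
        return False
```

Keying on a self-declared identity also gives well-behaved agents an incentive to identify themselves: anonymous traffic can share one tight bucket while named agents get their own.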
Server-rendered critical content
Agents that don't execute JS should still be able to read your headlines, prices, and key actions.
Pitfalls to avoid
- JS-only sites that are invisible to non-rendering agents.
- No rate limits — bots will find you and your bill will explode.
- Anti-bot defences that block legitimate AI clients indiscriminately.
- Failing to test how the top three agents (ChatGPT, Claude, Perplexity) actually see your pages.
Key takeaways
1. Design for two consumers: humans and agents.
2. Adopt llms.txt and Markdown twins; they're cheap and forward-compatible.
3. Rate-limit and authenticate; the agentic web is also a DDoS surface.
4. Test what the agent sees, not just what the user sees.
Go deeper · external resources
Curated reading list to take you from primer to practitioner. All links are external and free to read.