Engineering for the Agentic Web: When 50% of Traffic Is Robots

Pages aren't just for humans anymore. Design for agents reading on their behalf.

Introduction

Customer web traffic has changed: a large and growing share now comes from unidentified browsers and AI agents acting on behalf of users. The era of optimising solely for traditional search engine crawlers and Core Web Vitals is ending; the new challenge is feeding focused, low-noise context to autonomous agents and LLMs. Pages need to work for both humans and the agents reading on their behalf, and the engineering patterns are still being written.

Why this matters

  • Agents make purchase, support, and research decisions; if your site is illegible, you lose the transaction.
  • Token-budgeted agents reward concise, structured pages.
  • Bot traffic is a real cost driver and a real abuse vector.
  • New standards (llms.txt, AGENTS.md) are emerging; early adoption pays.

Core concepts

1. llms.txt

A markdown file at the root of your site that gives LLMs a curated, low-noise map: the canonical content, links to clean Markdown versions, and what to ignore.
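
A minimal sketch of the format, following the llms.txt proposal (an H1 name, a blockquote summary, then H2 sections of links); the site, paths, and descriptions here are entirely hypothetical:

```markdown
# Example Store

> Hypothetical retailer selling widgets in Australia. The pages below are the
> canonical sources; each links to a clean Markdown version for LLM consumption.

## Products

- [Catalogue](https://example.com/catalogue.md): all product lines with current AUD prices
- [Shipping and returns](https://example.com/shipping.md): delivery times and return policy

## Support

- [FAQ](https://example.com/faq.md): common order and account questions

## Optional

- [About us](https://example.com/about.md): company background, rarely needed for transactions
```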

2. Machine-readable summaries

Each page exposes a structured summary (JSON-LD, Open Graph, plus an LLM-targeted Markdown variant) so agents don't have to scrape JS-rendered DOMs.
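
For example, a product page can embed a schema.org summary as JSON-LD in its head. A minimal sketch in TypeScript; the `ProductRecord` shape and field names are hypothetical internals, while the Product/Offer vocabulary is standard schema.org:

```typescript
// Build a schema.org Product JSON-LD block for a product page.
interface ProductRecord {
  name: string;
  description: string;
  priceAud: number;
  url: string;
  inStock: boolean;
}

function productJsonLd(product: ProductRecord): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    description: product.description,
    url: product.url,
    offers: {
      "@type": "Offer",
      priceCurrency: "AUD",
      price: product.priceAud.toFixed(2),
      availability: product.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Embedded in the page <head>, this is readable without executing any JS.
  return `<script type="application/ld+json">${JSON.stringify(data)}</script>`;
}
```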

3. Agent-friendly markup

Semantic HTML, stable IDs, no critical content hidden behind JS, and accessible patterns. What's good for accessibility is largely good for agents.

4. Bot economics

Agents hammer endpoints. Caching, conditional requests (ETag/304), and per-agent rate limits keep your origin costs sane.
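
One way to keep repeated agent fetches cheap is a strong ETag plus a 304 short-circuit, so an unchanged page costs a hash comparison instead of a full response. A sketch assuming an Express-style Node server; `renderPage` is a hypothetical stand-in for your renderer:

```typescript
import crypto from "node:crypto";
import express from "express";

const app = express();

// Hypothetical renderer; a real app would produce the page HTML here.
async function renderPage(slug: string): Promise<string> {
  return `<html><body><h1>${slug}</h1></body></html>`;
}

app.get("/:slug", async (req, res) => {
  const body = await renderPage(req.params.slug);

  // Strong ETag derived from the rendered body.
  const hash = crypto.createHash("sha256").update(body).digest("hex");
  const etag = `"${hash.slice(0, 32)}"`;

  res.setHeader("ETag", etag);
  res.setHeader("Cache-Control", "public, max-age=300"); // let well-behaved agents cache

  // Conditional request: an unchanged page costs a 304, not a full response body.
  if (req.headers["if-none-match"] === etag) {
    res.status(304).end();
    return;
  }
  res.type("html").send(body);
});

app.listen(3000);
```

Express can also generate weak ETags automatically for res.send; spelling it out here just makes the conditional-request path visible.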

Practical patterns

Markdown twin pages

For every HTML page, expose /page.md with a clean Markdown rendering. Agents prefer it; you get one page, two consumers.
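
A sketch of serving the twin, assuming the Markdown versions are pre-generated into a `content/md/` directory at build time (the directory, port, and Express setup are assumptions):

```typescript
import { readFile } from "node:fs/promises";
import path from "node:path";
import express from "express";

const app = express();
const MD_DIR = path.resolve("content/md"); // assumed build-time output of Markdown twins

// Any GET ending in .md is served from the pre-generated twins:
// /pricing.md -> content/md/pricing.md
app.use(async (req, res, next) => {
  if (req.method !== "GET" || !req.path.endsWith(".md")) return next();

  const file = path.normalize(path.join(MD_DIR, req.path));
  if (!file.startsWith(MD_DIR + path.sep)) {
    res.status(404).end(); // path traversal guard
    return;
  }

  try {
    const markdown = await readFile(file, "utf8");
    res.set("Content-Type", "text/markdown; charset=utf-8").send(markdown);
  } catch {
    res.status(404).end(); // no twin generated for this page
  }
});

app.listen(3000);
```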

llms.txt at the root

Curate a top-level index; link to your most important pages and Markdown twins. Update when content changes.

Identity headers and abuse policies

Encourage agents to send identifying user-agents; rate-limit by identity; offer sane access tiers.
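
A rough sketch of rate limiting keyed on declared identity, assuming an Express-style server and a fixed-window counter held in memory (production would use Redis or the CDN edge); the tier names, limits, and identity heuristic are illustrative only:

```typescript
import express from "express";

const app = express();

// Requests allowed per minute, keyed by how the client identifies itself.
const TIERS: Record<string, number> = {
  anonymous: 30,    // no meaningful User-Agent
  identified: 300,  // self-identifying agent, e.g. "ExampleAgent/1.0; +https://example.com/bot"
  partner: 3000,    // agents holding an API key you issued
};

const windows = new Map<string, { count: number; resetAt: number }>();

app.use((req, res, next) => {
  const apiKey = req.get("x-api-key");
  const ua = req.get("user-agent") ?? "";

  // Crude heuristic: self-identifying crawlers usually include a "+http..." contact URL.
  const tier = apiKey ? "partner" : ua.includes("+http") ? "identified" : "anonymous";
  const key = apiKey || ua || req.ip || "unknown";

  const now = Date.now();
  const win = windows.get(key);
  if (!win || win.resetAt < now) {
    windows.set(key, { count: 1, resetAt: now + 60_000 });
    return next();
  }
  if (++win.count > TIERS[tier]) {
    res.set("Retry-After", String(Math.ceil((win.resetAt - now) / 1000)));
    res.status(429).send("Rate limit exceeded for this identity tier.");
    return;
  }
  next();
});
```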

Server-rendered critical content

Agents that don't execute JS should still be able to read your headlines, prices, and key actions.
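
A minimal sketch of rendering the critical fields on the server, assuming an Express-style handler and a hypothetical `getProduct` lookup; the point is that name, price, and the primary action sit in the raw HTML with semantic tags and stable IDs before any client script runs:

```typescript
import express from "express";

const app = express();

// Hypothetical product lookup; a real app would hit your database here.
async function getProduct(slug: string) {
  return { name: "Widget Pro", priceAud: 129.0, slug };
}

app.get("/products/:slug", async (req, res) => {
  const product = await getProduct(req.params.slug);

  // Critical content is in the initial HTML, so non-rendering agents can
  // read the name, price, and buy action without executing any JS.
  res.type("html").send(`<!doctype html>
<html lang="en">
  <body>
    <main>
      <article id="product-${product.slug}">
        <h1 id="product-name">${product.name}</h1>
        <p id="product-price">A$${product.priceAud.toFixed(2)}</p>
        <form id="add-to-cart" method="post" action="/cart/items">
          <input type="hidden" name="slug" value="${product.slug}">
          <button type="submit">Add to cart</button>
        </form>
      </article>
    </main>
    <script src="/enhance.js" defer></script> <!-- progressive enhancement only -->
  </body>
</html>`);
});
```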

Pitfalls to avoid

  • JS-only sites that are invisible to non-rendering agents.
  • No rate limits: bots will find you, and your bill will explode.
  • Anti-bot defences that block legitimate AI clients indiscriminately.
  • Failing to test how the top three agents (ChatGPT, Claude, Perplexity) actually see your pages.
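
A cheap way to cover that last point is to fetch key pages with agent-style User-Agent strings and assert the critical content is present in the raw HTML, since those fetches never execute your JS. A sketch using Node's built-in fetch; the crawler tokens, URL, and expected strings are placeholders to adapt (check each vendor's current documentation for the exact User-Agent values):

```typescript
// Fetch the raw HTML as various agents would and check for expected substrings.
const AGENT_UAS: Record<string, string> = {
  openai: "GPTBot/1.0",        // illustrative; verify against vendor docs
  anthropic: "ClaudeBot/1.0",
  perplexity: "PerplexityBot/1.0",
};

const URL_TO_CHECK = "https://example.com/products/widget-pro";     // placeholder
const MUST_CONTAIN = ["Widget Pro", "A$129.00", "Add to cart"];      // placeholder strings

async function main() {
  for (const [vendor, ua] of Object.entries(AGENT_UAS)) {
    const res = await fetch(URL_TO_CHECK, { headers: { "User-Agent": ua } });
    const html = await res.text();
    const missing = MUST_CONTAIN.filter((s) => !html.includes(s));
    console.log(
      `${vendor}: HTTP ${res.status}, ` +
        (missing.length === 0 ? "all critical content present" : `missing: ${missing.join(", ")}`),
    );
  }
}

main().catch(console.error);
```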

Key takeaways

  1. Design for two consumers: humans and agents.
  2. Adopt llms.txt and Markdown twins; they're cheap and forward-compatible.
  3. Rate-limit and authenticate; the agentic web is also a DDoS surface.
  4. Test what the agent sees, not just what the user sees.
