
Kill the God Agent: Architectural Security for Multi-Agent Systems

Prompt injection isn't a guardrail problem. It's an architecture problem.

Introduction

Your multi-agent system probably has one orchestrator with access to every tool, every database, every API. If that agent gets injected, the entire toolchain is compromised. Guardrails (string filters, classifiers, content scanners) won't save you. Real security comes from architectural patterns that bound what an agent can do regardless of what it's convinced to attempt.

Why this matters

  • Indirect prompt injection (instructions hidden in retrieved data) is increasingly the dominant attack vector.
  • Guardrails are necessary but insufficient: adversaries iterate faster than your filters.
  • Compromise of an over-privileged agent escalates to the entire toolchain.
  • Regulators are starting to ask about this; security is becoming a procurement question.

Core concepts

1. Capability isolation

Agents get only the tools they absolutely need for their role. The "research" agent has no shell; the "execute" agent has no email send. A compromise stays contained to that agent's narrow tool set.
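
A minimal sketch of what this can look like: a registry binds each agent role to an explicit tool allow-list and denies everything else by default. All names here (ROLE_TOOLS, ToolRegistry) are illustrative, not any specific framework's API.

```python
# Capability isolation sketch: each role is bound to an explicit
# tool allow-list at construction time; anything else is denied.

ROLE_TOOLS = {
    "research": {"web_search", "read_document"},
    "execute":  {"run_query", "write_file"},   # no email, no shell
}

class ToolRegistry:
    def __init__(self, tools):
        self._tools = tools  # name -> callable

    def dispatch(self, role, tool_name, *args, **kwargs):
        allowed = ROLE_TOOLS.get(role, set())
        if tool_name not in allowed:
            # Deny by default: a compromised agent cannot reach tools
            # outside its role, no matter what its prompt says.
            raise PermissionError(f"{role!r} may not call {tool_name!r}")
        return self._tools[tool_name](*args, **kwargs)
```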

2. Runtime least privilege

Permissions are not just design-time configuration; they are evaluated per call against the current task context. A compromised agent can't escalate to capabilities it didn't already have authorisation for.
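
A sketch of per-call evaluation, assuming scopes are granted per task by an authority outside the agent; TaskContext and authorise_call are hypothetical names, not a real library's API.

```python
# Runtime least privilege sketch: permissions are re-checked on every
# tool call against the current task, not fixed at design time.

from dataclasses import dataclass

@dataclass(frozen=True)
class TaskContext:
    task_id: str
    granted_scopes: frozenset  # scopes issued for *this* task only

def authorise_call(ctx: TaskContext, required_scope: str) -> None:
    # The check happens at call time; an agent injected mid-task cannot
    # widen its own scopes because grants come from outside it.
    if required_scope not in ctx.granted_scopes:
        raise PermissionError(
            f"task {ctx.task_id}: missing scope {required_scope!r}"
        )

ctx = TaskContext("t-42", frozenset({"db:read"}))
authorise_call(ctx, "db:read")    # passes
# authorise_call(ctx, "db:write") # raises PermissionError
```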

3. Cryptographic boundaries

Use signed tokens, not prompts, to authorise actions. The agent presents a token; the tool validates it cryptographically. Prompts can be social-engineered; signatures cannot.
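
One way to implement this with nothing but the Python standard library is an HMAC over the action payload; the signing key lives with a trusted authority, never in any agent's context. This is a sketch, not a full token scheme (no expiry, no audience binding).

```python
# Cryptographic boundary sketch: the tool trusts the signature on an
# action, not the agent's prose.

import hashlib
import hmac
import json

SIGNING_KEY = b"replace-with-a-real-secret"  # held by the authority, not agents

def sign_action(action: dict) -> str:
    payload = json.dumps(action, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def tool_accepts(action: dict, signature: str) -> bool:
    # An injected agent can ask for anything, but cannot forge a
    # valid signature over an action it was never authorised to take.
    expected = sign_action(action)
    return hmac.compare_digest(expected, signature)

action = {"tool": "send_email", "to": "ops@example.com"}
token = sign_action(action)          # issued by the trusted authority
assert tool_accepts(action, token)   # tool validates cryptographically
```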

4. Trust zones for context

Mark every piece of context with its trust level: system prompt (trusted), user input (semi-trusted), retrieved/tool output (untrusted). Untrusted content cannot issue trusted instructions.
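
A sketch of trust tagging with a simple three-level enum; the names are assumptions. The point is that the instruction channel is filtered by provenance, not by content scanning.

```python
# Trust-zone sketch: every context item carries a provenance label,
# and only TRUSTED items may contribute instructions.

from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    TRUSTED = 3        # system prompt
    SEMI_TRUSTED = 2   # direct user input
    UNTRUSTED = 1      # retrieved documents, tool output

@dataclass
class ContextItem:
    text: str
    trust: Trust

def instruction_sources(context: list[ContextItem]) -> list[str]:
    # Untrusted content is treated as data only: it can be summarised
    # or quoted, but never promoted into the instruction channel.
    return [c.text for c in context if c.trust is Trust.TRUSTED]
```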

Practical patterns

Privileged orchestrator + untrusted workers

The orchestrator never executes user content; it just routes. Workers execute, with no access to each other or to system tools.
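
A sketch of the routing shape, with placeholder functions standing in for real sandboxed workers; the intent names are illustrative.

```python
# Orchestrator/worker sketch: the orchestrator only routes; workers
# hold no references to each other or to system tools.

def research_worker(task: str) -> str:
    return f"findings for: {task}"   # placeholder worker

def summarise_worker(task: str) -> str:
    return f"summary of: {task}"     # placeholder worker

WORKERS = {"research": research_worker, "summarise": summarise_worker}

def orchestrate(intent: str, task: str) -> str:
    # The orchestrator never interprets or executes user content
    # itself; it maps an intent onto exactly one sandboxed worker.
    worker = WORKERS.get(intent)
    if worker is None:
        raise ValueError(f"no worker registered for intent {intent!r}")
    return worker(task)
```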

Plan / execute split

A planning agent (sees no untrusted data) issues signed instructions to an execution agent (handles untrusted data, but only follows signed instructions).
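
Sketched below with the same HMAC idea as in the cryptographic-boundaries example: the planner signs each step, and the executor refuses anything unsigned, so instructions injected via retrieved data never become steps. PLANNER_KEY and the step format are assumptions.

```python
# Plan/execute split sketch: the planner never sees untrusted data;
# the executor handles untrusted data but follows only signed steps.

import hashlib
import hmac
import json

PLANNER_KEY = b"planner-signing-key"  # hypothetical; held outside both agents

def plan(goal: str) -> list[dict]:
    # The planner works from the goal alone, with no retrieved content.
    steps = [{"step": 1, "tool": "fetch", "arg": goal}]
    for s in steps:
        body = json.dumps(s, sort_keys=True).encode()
        s["sig"] = hmac.new(PLANNER_KEY, body, hashlib.sha256).hexdigest()
    return steps

def execute(steps: list[dict]) -> None:
    for s in steps:
        sig = s.pop("sig")
        body = json.dumps(s, sort_keys=True).encode()
        expected = hmac.new(PLANNER_KEY, body, hashlib.sha256).hexdigest()
        # Text injected into retrieved data cannot add steps: an
        # unsigned or tampered instruction is rejected before any tool runs.
        if not hmac.compare_digest(expected, sig):
            raise PermissionError("unsigned or tampered step rejected")
        print("executing", s)

execute(plan("quarterly report"))
```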

Egress firewalls

Outbound network calls are filtered against an allow-list scoped to the current task. Exfiltration to unapproved destinations fails by construction.
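
A sketch of the check an egress proxy would apply, assuming a per-task set of allowed hostnames; in practice this sits at the network layer, outside the agent process.

```python
# Egress firewall sketch: outbound requests pass a host allow-list
# scoped to the current task before any HTTP client sees them.

from urllib.parse import urlparse

def make_egress_guard(allowed_hosts: set[str]):
    def guard(url: str) -> str:
        host = urlparse(url).hostname or ""
        if host not in allowed_hosts:
            # Exfiltration to an attacker-controlled domain fails here,
            # regardless of what the agent was convinced to attempt.
            raise PermissionError(f"egress to {host!r} not on task allow-list")
        return url  # safe to hand to the real HTTP client from here
    return guard

guard = make_egress_guard({"api.example.com"})
guard("https://api.example.com/v1/data")     # allowed
# guard("https://evil.example.net/upload")   # raises PermissionError
```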

Capability tokens

Tools require short-lived, scope-bound capability tokens issued by a trusted authority, not by an agent.
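
A sketch focusing on the scope and expiry checks a tool would enforce; a real token would also be signed, as in the cryptographic-boundaries sketch above. The issue and tool_guard names are hypothetical.

```python
# Capability token sketch: tokens are minted by the trusted authority
# for one narrow scope and a short lifetime; tools enforce both.

import time
from dataclasses import dataclass

@dataclass(frozen=True)
class CapabilityToken:
    scope: str         # e.g. "db:read"
    expires_at: float  # unix timestamp set by the issuing authority

def issue(scope: str, ttl_seconds: float = 60.0) -> CapabilityToken:
    # Only the trusted authority calls this; agents receive tokens,
    # they never mint them.
    return CapabilityToken(scope, time.time() + ttl_seconds)

def tool_guard(token: CapabilityToken, required_scope: str) -> None:
    if time.time() >= token.expires_at:
        raise PermissionError("capability token expired")
    if token.scope != required_scope:
        raise PermissionError("token scope does not cover this tool")

token = issue("db:read")
tool_guard(token, "db:read")   # passes while the token is fresh
```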

Pitfalls to avoid

  • Believing a "no, you're not allowed to do that" system prompt will hold against motivated injection.
  • Storing API keys in agent context: if the agent leaks, so do the keys.
  • Letting one agent both read external content and call privileged tools.
  • Logging entire prompts including secrets to your observability stack.

Key takeaways

  1. No god agents. Privilege follows the task, not the actor.
  2. Treat prompts as untrusted; treat tokens as authoritative.
  3. Architect for compromise: assume one agent is owned, ensure damage is bounded.
  4. Guardrails are a defence-in-depth layer, not the primary defence.
