AI Engineer Melbourne
Knowledge Base
Software Engineering · Advanced · 12 min

The AI Control Plane: Infrastructure as Data

When agents act on your infrastructure, your infra becomes the context window.

Introduction

We've spent a decade codifying infrastructure (Terraform, Pulumi, CDK). The next step is treating infrastructure as a queryable data layer: cloud state exposed as facts, agents armed with reusable operational knowledge, and LLM gateways routing and governing model access. Policy-as-code, observability, and identity become first-class citizens of the agent loop. The result is an "AI control plane" where agents act on infrastructure safely.

Why this matters

  • IaC alone tells you what should be there; control planes tell you what is.
  • Agents need queryable infrastructure state to reason about cost, security, and capacity.
  • Identity and policy must be first-class to prevent accidental agent misuse of privileged tools.
  • The control plane is the only sane place to enforce per-agent quotas and audits.

Core concepts

1. Infrastructure as a data layer

Cloud state (accounts, resources, configs, policies) exposed as a queryable, unified data model rather than a pile of provider-specific APIs.
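One way to picture a unified data model is a single record type that every provider's resources are normalized into, plus a generic query function. The sketch below is illustrative only: the `Resource` fields, the sample inventory, and the `query` helper are hypothetical and do not reflect any real provider schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Resource:
    """One provider-agnostic fact about a cloud resource (hypothetical schema)."""
    provider: str   # e.g. "aws" or "gcp"
    kind: str       # e.g. "bucket" or "vm"
    name: str
    account: str
    config: dict = field(default_factory=dict)


def query(inventory: Iterable[Resource], where: Callable[[Resource], bool]) -> list[Resource]:
    """Return every resource that satisfies the predicate."""
    return [r for r in inventory if where(r)]


# Made-up sample state, already normalized from two providers.
INVENTORY = [
    Resource("aws", "bucket", "logs-prod", "acct-1", {"public": False}),
    Resource("gcp", "bucket", "assets", "acct-2", {"public": True}),
    Resource("aws", "vm", "api-1", "acct-1", {"size": "large"}),
]

# The same question works across providers: which buckets are public?
public_buckets = query(
    INVENTORY,
    lambda r: r.kind == "bucket" and r.config.get("public") is True,
)
```

An agent that asks this question never needs to know which provider API would have answered it.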

2. Skills and reusable runbooks

Operational knowledge codified as skills: "diagnose pod CrashLoopBackOff," "rotate a leaked credential." Agents pick the right skill, not the right command.

3. LLM gateway

A proxy that routes, authenticates, audits, and rate-limits all model traffic from agents. Single point of policy and observability.
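A gateway of this kind can be sketched as one object that every model call passes through. The example below is a toy, in-process stand-in under stated assumptions: `Gateway`, `RequestDenied`, the per-agent request quota, and the backend callables are all hypothetical names, and a real gateway would sit on the network and authenticate callers properly.

```python
from dataclasses import dataclass, field
from typing import Callable


class RequestDenied(Exception):
    """Raised when the gateway refuses to forward a request."""


@dataclass
class Gateway:
    backends: dict[str, Callable[[str], str]]  # model name -> callable (hypothetical)
    quotas: dict[str, int]                     # agent id -> remaining requests
    audit_log: list[dict] = field(default_factory=list)

    def complete(self, agent_id: str, model: str, prompt: str) -> str:
        remaining = self.quotas.get(agent_id, 0)  # unknown agents get no quota
        allowed = remaining > 0 and model in self.backends
        # Every attempt is audited, including the denied ones.
        self.audit_log.append({"agent": agent_id, "model": model, "allowed": allowed})
        if not allowed:
            raise RequestDenied(f"{agent_id} may not call {model}")
        self.quotas[agent_id] = remaining - 1
        return self.backends[model](prompt)
```

Because routing, quota, and audit live in one method, there is a single place to change policy and a single log to inspect.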

4. Policy as code in the loop

Every agent action runs through a policy engine (OPA, Cedar) before execution. The agent proposes; the policy disposes.
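The propose-then-check flow can be illustrated with a small default-deny gate. This sketch does not call OPA or Cedar; the `Action` type, the two rules in `evaluate`, and the `execute` wrapper are hypothetical stand-ins for whatever engine and policies you actually run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    agent: str
    verb: str          # "read", "update", or "delete"
    resource: str
    environment: str   # e.g. "staging" or "prod"


def evaluate(action: Action) -> tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly allowed is denied."""
    if action.verb == "read":
        return True, "reads are allowed"
    if action.verb == "update" and action.environment != "prod":
        return True, "non-prod updates are allowed"
    return False, "denied by default"


def execute(action: Action, run: Callable[[Action], str]) -> str:
    """Run the proposed action only if the policy check passes."""
    allowed, reason = evaluate(action)
    if not allowed:
        raise PermissionError(reason)
    return run(action)
```

The agent only ever constructs an `Action`; whether it runs is decided outside the agent's own reasoning.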

Practical patterns

Read-only first

Phase 1: agents only query infra. Phase 2: agents propose changes via PRs. Phase 3: bounded direct action with policy gates.
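The three phases can be encoded as an ordered setting that gates which operations an agent may attempt. The names below (`Phase`, `REQUIRED_PHASE`, `is_permitted`, and the operation strings) are hypothetical, and the sketch models only the phase ordering, not the policy gates that phase 3 also requires.

```python
from enum import IntEnum


class Phase(IntEnum):
    QUERY = 1    # agents only query infra
    PROPOSE = 2  # agents propose changes via PRs
    ACT = 3      # bounded direct action


# Minimum rollout phase each operation needs (illustrative operation names).
REQUIRED_PHASE = {
    "query": Phase.QUERY,
    "open_pr": Phase.PROPOSE,
    "apply": Phase.ACT,
}


def is_permitted(operation: str, current_phase: Phase) -> bool:
    """Unknown operations are rejected; known ones need a high enough phase."""
    required = REQUIRED_PHASE.get(operation)
    return required is not None and current_phase >= required
```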

Skill registry

Versioned, auditable library of operational tasks; agents must use registered skills, not free-form commands.
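A registry along these lines might key each skill by name and version and refuse anything that is unregistered or deprecated. This is a minimal sketch: `Skill`, `SkillRegistry`, and the `diagnose_crashloop` placeholder are hypothetical, and the skill body stands in for real diagnostic logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    version: str
    run: Callable[..., str]
    deprecated: bool = False


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[tuple[str, str], Skill] = {}

    def register(self, skill: Skill) -> None:
        key = (skill.name, skill.version)
        if key in self._skills:
            # Published versions are immutable; changes need a new version.
            raise ValueError(f"{skill.name} {skill.version} is already registered")
        self._skills[key] = skill

    def invoke(self, name: str, version: str, **kwargs) -> str:
        skill = self._skills.get((name, version))
        if skill is None or skill.deprecated:
            raise LookupError(f"no usable skill {name} {version}")
        return skill.run(**kwargs)


def diagnose_crashloop(pod: str) -> str:
    # Placeholder body; a real skill would inspect events and logs.
    return f"collected restart events and logs for {pod}"


registry = SkillRegistry()
registry.register(Skill("diagnose_crashloop", "1.0.0", diagnose_crashloop))
```

Since agents can only reach operations through `invoke`, free-form commands have no entry point.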

Just-in-time credentials

Short-lived credentials minted per-action against per-action policies; no long-lived secrets in agent context.
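To make the idea concrete, here is a rough sketch of a credential that is scoped to one action and expires quickly. `Credential`, `mint`, `is_valid`, the scope string format, and the 60-second default are all assumptions for illustration; a real system would have a cloud provider or secrets manager issue and verify these.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    token: str
    scope: str         # the single action this credential permits
    expires_at: float  # Unix timestamp


def mint(scope: str, ttl_seconds: float = 60.0, now: Optional[float] = None) -> Credential:
    """Issue a random token bound to one scope with a short lifetime."""
    issued = time.time() if now is None else now
    return Credential(secrets.token_urlsafe(16), scope, issued + ttl_seconds)


def is_valid(cred: Credential, scope: str, now: Optional[float] = None) -> bool:
    """A credential is usable only for its own scope and only before expiry."""
    current = time.time() if now is None else now
    return cred.scope == scope and current < cred.expires_at
```

The `now` parameter exists only so expiry can be checked deterministically in tests.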

Observability everywhere

Every infra-touching agent action is logged with input, output, decision rationale, and policy check results.
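A log entry covering those four fields could be emitted as one structured JSON line per action. The `audit_record` function and its field names are hypothetical; the point of the sketch is only that input, output, rationale, and policy result travel together in one record.

```python
import json
from datetime import datetime, timezone


def audit_record(agent: str, action: str, inputs: dict, output: str,
                 rationale: str, policy_result: dict) -> str:
    """Serialize one agent action as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "input": inputs,
        "output": output,
        "rationale": rationale,
        "policy": policy_result,
    }
    return json.dumps(record, sort_keys=True)
```

One line per action keeps the log easy to ship to whatever store you already use for observability.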

Pitfalls to avoid

  • Letting agents call provider APIs directly, which bypasses your control plane and its policy checks.
  • Skipping the diff-and-approve step on changes, so unreviewed agent edits drift production over weeks.
  • Treating skills as static; they need versioning, deprecation, and updates.
  • Mixing read and write privileges in one agent, which widens the blast radius unnecessarily.

Key takeaways

  1. Build a unified data model for infra; agents need a queryable substrate.
  2. Policy as code is non-negotiable for action.
  3. Skills > raw commands; they're what makes agents safe at scale.
  4. Phase your rollout: read, propose, act, in that order.

