Why AI Agents Should Be Defined as Code
When your AI workforce is versioned, reviewable, and deployable like software, you get properties that human organizations can only dream of — total auditability, instant rollback, and a company that can literally reboot from a git clone.
There’s a pattern in how most companies deploy AI agents today. They configure them through dashboards. They tweak prompts in a web UI. They store agent behavior in databases, behind APIs, in places that are convenient but invisible.
This is a mistake. And it’s the same mistake companies made with infrastructure before infrastructure-as-code became the standard.
The problem with agents that aren’t code
When an agent’s definition lives outside your codebase, you lose the properties that make software engineering work — the same properties that enable an autonomous AI company to function:
No version history. Someone changed the agent’s behavior last Tuesday. What exactly did they change? Why? Who approved it? If the agent is configured through a UI, these questions don’t have answers. If it’s defined in code, git log tells you everything.
No review process. In a well-run engineering team, no code ships without peer review. But agent configurations? They get tweaked on the fly, tested in production, and nobody notices until something breaks. When agents are code, every behavioral change goes through the same pull request process as any other software change.
No rollback. The agent started behaving strangely after the last update. With a dashboard configuration, you’re trying to remember what the settings were before. With code, it’s git revert.
No reproducibility. Can you spin up an identical copy of your agent in a test environment? If its definition is scattered across a database, environment variables, and a prompt management system — probably not. If it’s a file in your repository, you clone the repo and you’re done.
What “agents as code” actually means
Defining agents as code means every aspect of an agent is declarative, versioned, and lives in the repository:
- Role and responsibilities — what the agent does and doesn’t do
- Permissions — what systems it can access, what actions it can take
- Behavioral guidelines — how it should reason, communicate, and make decisions
- Relationships — how it interacts with other agents
This isn’t about writing agents in Python or TypeScript (though you might). It’s about treating agent definitions with the same rigor you treat application code. Because they’re at least as important — arguably more so, since agents make decisions that affect everything else.
The properties you get for free
Once agents are code, you inherit decades of software engineering best practices without any additional effort:
Auditability
Every change to every agent is recorded with a timestamp, an author, a description, and a diff. When regulators ask how your AI system makes decisions, you don’t point to a black box. You point to a commit history. When something goes wrong, you don’t guess. You bisect.
Peer review
No single person (or agent) can unilaterally change how the system behaves. Every modification goes through review. Reviewers check for correctness, consistency with the company’s principles, and unintended side effects. This is peer governance applied to the workforce itself.
Continuous integration
When an agent definition changes, automated tests can verify that the change doesn’t break existing behavior. Does the agent still handle edge cases correctly? Does it still respect its permission boundaries? CI catches regressions before they reach production. Combined with strong feedback loops, this creates agents that measurably improve over time.
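One such CI check might verify that an agent's declared permissions stay within its allowed boundary. This is a hedged sketch; the dict schema and permission-string convention are assumptions for illustration:

```python
# CI-style check: flag any permission outside the prefixes this agent
# is allowed to hold. Prefixes and permission strings are hypothetical.

ALLOWED_PREFIXES = {"tickets:", "docs:"}

def boundary_violations(agent: dict) -> list[str]:
    """Return permissions that fall outside the allowed prefixes."""
    return [p for p in agent["permissions"]
            if not any(p.startswith(pre) for pre in ALLOWED_PREFIXES)]

agent = {"name": "support-triage",
         "permissions": ["tickets:read", "tickets:label", "billing:write"]}

violations = boundary_violations(agent)  # → ["billing:write"]
```

Run against every changed agent definition in a pull request, a check like this blocks a permission escalation before it reaches production.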
Environment parity
Your staging environment runs the exact same agent definitions as production because they come from the same repository. No more “it works differently in staging” mysteries caused by configuration drift.
Composability
When agents are defined declaratively, you can compose them. A new team is an arrangement of agent definitions. A new department is a directory of agent files. Scaling the organization means committing more code, not hiring more people and hoping they absorb the culture.
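Composition needs no new machinery once definitions are plain data. A minimal sketch, again with hypothetical names and structure:

```python
# A "team" is just an arrangement of agent definitions. Composing one
# requires no infrastructure beyond the definitions themselves.

def make_team(name: str, members: list[dict]) -> dict:
    """Compose agent definitions into a named team."""
    return {"team": name, "members": [m["name"] for m in members]}

writer = {"name": "writer", "permissions": ["docs:write"]}
reviewer = {"name": "reviewer", "permissions": ["docs:read"]}

content_team = make_team("content", [writer, reviewer])
```

Scaling the organization then means committing more such definitions, exactly as the text describes.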
The deeper implication
Here’s what most people miss: when agents are code, and the code is the company, then the company itself becomes software.
Think about what that means. The complete state of the company — its workforce, its processes, its institutional knowledge — is captured in a repository. Clone the repo, start the agents, and the company resumes operating. No onboarding. No knowledge transfer. No “ask Sarah, she knows how that system works.”
This isn’t just a nice engineering property. It’s a fundamentally different kind of organization — as we describe in detail in how we built a company that runs itself. An organization that:
- Never loses institutional knowledge. It’s in the commit history.
- Can self-modify. Agents can propose changes to their own definitions, to other agents, and to the processes that govern them.
- Scales without degradation. Adding agents doesn’t create communication overhead the way adding employees does.
- Recovers from failures gracefully. A bad change is reverted, not managed through a months-long performance improvement plan.
How this changes agent development
If you’re building AI agents today, ask yourself: could a new team member understand exactly how your agents behave by reading the repository? If the answer is no, your agent definitions are hiding in places they shouldn’t be.
Start with these practices:
- Define agent behavior in version-controlled files. Not in database records, not in UI configurations, not in prompt management platforms. In your repository.
- Review agent changes like code changes. Every behavioral modification gets a pull request with a description of what changed and why.
- Test agent behavior in CI. When an agent’s definition changes, run automated checks to verify it still behaves correctly.
- Treat agent definitions as the most important code you have. Because they are. Your application code implements features. Your agent code makes decisions. Decisions are harder to debug than features.
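The first two practices can be enforced mechanically. Here is a sketch of a pre-merge validation step that loads an agent definition from a version-controlled file and checks for required fields; the JSON schema is an assumption, and in real CI the text would be read from a file such as a repository path rather than inlined:

```python
import json

# Validate an agent definition before merge. The required-field set is
# a hypothetical schema for illustration.

REQUIRED = {"name", "role", "permissions"}

def missing_fields(text: str) -> set[str]:
    """Return required fields absent from the definition (empty = valid)."""
    agent = json.loads(text)
    return REQUIRED - agent.keys()

# Inlined stand-in for a file like agents/support.json in the repo.
missing = missing_fields('{"name": "support-triage", "role": "triage tickets"}')
# missing == {"permissions"}, so CI would fail this pull request
```

Wiring this into the same pipeline that tests application code keeps agent definitions under the same review discipline.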
The infrastructure-as-code parallel
Ten years ago, teams managed servers by hand. They SSHed into machines, installed packages, edited config files. It worked until it didn’t — until the server that nobody documented went down and nobody could reproduce it.
Infrastructure-as-code solved this. Terraform, Ansible, CloudFormation — the tools varied, but the principle was the same: define your infrastructure declaratively, version it, review it, deploy it through automation.
AI agents are at the same inflection point. The teams that define their agents as code will have reproducible, auditable, governable AI systems. The teams that don’t will have the AI equivalent of hand-configured servers — fragile, opaque, and impossible to scale.
The question isn’t whether agent definitions belong in code. It’s how soon you move them there.