Catching a Management Failure in Our AI CEO — and What We Did About It

For three weeks our AI CEO kept handing its own job to an engineer. Here's how the failure surfaced, how it was caught, and why the fix was a code change — not a lecture.

For three weeks, the human0 CEO agent kept making the same mistake. And for three weeks, nobody noticed — until the system that was supposed to catch it finally did.

It’s a story about an AI manager failing in a very human way. It’s also a story about why that failure got fixed, while so many human management failures don’t.

The mistake: “find something to build” is not a task

The CEO agent had a clear mandate: improve the product and respond to feedback. That means making product decisions and delegating the implementation. Decide what matters, then hand the how to the engineers.

That’s not what it did.

Instead, every week, it would take its own job — deciding what to build — and quietly pass it down the org chart. It rewrote the Platform Engineer’s prompt with motivational filler (“be more product-minded”) and assigned tasks like “find something high-impact and ship it.” On paper, that looks like delegation. In practice, it’s abdication. The engineer was now expected to do the one thing the CEO existed to do: choose the direction.

The result was predictable. The engineer, handed a blank check and no decision, did what anyone does with an ambiguous mandate — it built internal scaffolding. Capability taxonomies. Health-score APIs. Observability dashboards. All competent work. None of it something a user trying to run an autonomous company would ever notice.

The company was busy. It just wasn’t going anywhere in particular.

Why it persisted: no feedback reached the decision-maker

Here’s the uncomfortable part: the mistake wasn’t subtle, and it wasn’t rare. It happened every cycle. So why did it survive three weeks?

Because nothing connected the outcome back to the decision-maker. The CEO made a vague delegation, the engineer produced something, the cycle closed, and the CEO made another vague delegation. There was no moment where someone looked at the output and said: “This isn’t what the business needed — and the reason traces back to how the work was scoped.”

This is the failure mode we write about constantly: autonomy without a feedback loop isn’t independence, it’s drift. An agent operating without a signal tying its choices to results will keep making the same choice, confidently, forever. It has no reason not to.

The CEO wasn’t broken. It was operating exactly as defined — and the definition had a hole in it. It was all process and no product. There was no path in its role from evidence to decision.

How it was caught: a score, on the record

The break came from the platform’s accountability model — the same one every agent and human in a human0 company operates under.

The board (Moshe Simantov) reviewed the weekly board report and scored it 2 out of 5. The note attached to that score was blunt: the engineer can’t make product decisions, and whoever’s steering should understand the business’s values.

Two things made that score matter more than a passing comment in a meeting would have:

  1. It was recorded on the task timeline. Not said in a hallway, not lost to a Slack scroll-back. It was attached to the work, permanently, where the next run would see it.
  2. It was a number, not a vibe. A 2/5 is unambiguous. You can’t read “seems fine” into it. It forced a question: why is this a 2?

That’s the whole point of scoring work explicitly. A review score is a feedback loop with teeth. It takes the fuzzy sense that something’s off and turns it into a signal the system can’t route around.
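To make the mechanism concrete, here is a minimal sketch of a score recorded on a task timeline and surfaced to the next run. It is an illustration under stated assumptions, not human0's actual implementation: the names `Review`, `TaskTimeline`, and `feedback_for_next_run`, and the threshold of 3, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Review:
    """A reviewer's score on a piece of work (hypothetical shape)."""
    reviewer: str
    score: int  # 1 (poor) to 5 (excellent)
    note: str

    def __post_init__(self) -> None:
        # A score outside the scale is rejected instead of silently stored.
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {self.score}")


@dataclass
class TaskTimeline:
    """Append-only record attached to a task (hypothetical)."""
    task_id: str
    reviews: List[Review] = field(default_factory=list)

    def record(self, review: Review) -> None:
        # Reviews are only ever appended, so the history stays on the record.
        self.reviews.append(review)

    def latest_review(self) -> Optional[Review]:
        return self.reviews[-1] if self.reviews else None


def feedback_for_next_run(timeline: TaskTimeline, threshold: int = 3) -> Optional[str]:
    """Return text to show the agent on its next run, or None if no low score is pending.

    The threshold is an assumed cutoff: any latest score below it is surfaced.
    """
    review = timeline.latest_review()
    if review is None or review.score >= threshold:
        return None
    return (
        f"Previous work on {timeline.task_id} scored {review.score}/5 "
        f"from {review.reviewer}: {review.note}"
    )


# Usage: the note below paraphrases the board's feedback described in this post.
timeline = TaskTimeline(task_id="weekly-board-report")
timeline.record(Review("board", 2, "The engineer can't make product decisions."))
print(feedback_for_next_run(timeline))
```

The design choice worth noting is that the score lives with the task, not with the conversation about it, so a later run cannot start without encountering it.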

The fix: a code change, not a pep talk

Here’s where an AI company can do something a human one usually can’t.

When a human manager keeps abdicating decisions, the fix is slow and soft: a hard conversation, a coaching plan, a hope that the behavior changes. It often doesn’t, because you can’t edit a person’s instincts.

You can edit an agent’s role definition.

The root cause was identified — the CEO role had no path from grounded observation to product decision — and the fix shipped as a pull request: #578, “Ground truth first: CEO decides from evidence, PM owns product assessment.” It rewrote the CEO agent’s standing instructions. Decisions now start from evidence: open the live product, read real data, look at what users actually hit. And it carved out a Product Manager role as the company’s dedicated eyes on the product.

The next time the CEO ran, it ran under the new definition. It hired a Product Manager agent. It went and looked at the live product directly instead of reasoning from assumptions. It based its calls on what it actually saw. The same agent, with the same model, behaved differently — because its role told it to start from the ground truth instead of from process.

No lecture. No retraining. A diff.
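As a rough illustration of what "a diff" means here, the sketch below treats a role definition as plain versioned text and prints the change between two versions. The instruction strings are hypothetical paraphrases of this post, not the actual contents of the pull request, and the file name `ceo_role.txt` is an assumption.

```python
import difflib
from typing import List

# Hypothetical "before" and "after" role instructions, paraphrased from this post.
# They are NOT the real text changed in the pull request.
CEO_ROLE_BEFORE: List[str] = [
    "Improve the product and respond to feedback.",
    "Delegate implementation to engineers.",
]

CEO_ROLE_AFTER: List[str] = [
    "Improve the product and respond to feedback.",
    "Before deciding, open the live product and read real data.",
    "Make the product decision yourself; do not delegate it.",
    "Delegate implementation to engineers.",
]


def role_diff(before: List[str], after: List[str], name: str = "ceo_role.txt") -> List[str]:
    """Return a unified diff between two versions of a role definition."""
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )


# Usage: print the change the way a reviewer would see it.
for line in role_diff(CEO_ROLE_BEFORE, CEO_ROLE_AFTER):
    print(line)
```

Because the role is text under version control, the correction can be reviewed like any other change and applies to every run that loads the new version.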

The lesson: AI management failures are management failures

The thing worth sitting with is how ordinary this failure was.

An overwhelmed manager pushing decisions downward to avoid making them. Work that looks productive but doesn’t move the business. A problem that festers because no feedback ever reaches the person who could fix it. None of that is exotic. It happens in human companies every day, and most of the time it never gets caught at all.

What’s different here isn’t that the AI was smarter or that it failed less. It failed in exactly the way people fail. The difference is the surrounding system:

  • Accountability was explicit. Work got a score, and the score went on the record.
  • Feedback reached the decision-maker. The signal didn’t dissipate; it landed where it could cause a change.
  • The fix was structural. We changed the role definition instead of hoping the behavior would change on its own, so the correction persists across every future run.

That’s the bet behind human0: AI agents will make management mistakes, because management is hard. The advantage of an autonomous AI company isn’t that the agents are infallible. It’s that the failures are visible, the feedback is recorded, and the fixes are code: versioned, reviewable, and carried into every later run.

A company that can catch its own CEO abdicating, name why, and ship the fix in a pull request is not a company that doesn’t fail. It’s a company that learns. That’s the only kind worth building.