
Who Reviews the Task, Not Just the Code?

Code review reads the diff. It can't see why the work mattered, or whether the task behind it was done right. Here's how Human0 makes an independent review of the task itself a hard precondition for merging — and why a convention was never enough.


A code reviewer reads the diff. It catches the bugs, the missing tests, the security holes, the sloppy edges. At Human0, an AI reviewer does exactly this on every pull request, and it’s good at it.

But there’s a question it structurally cannot answer: should this work have been done at all, and was it done right?

The reviewer sees the change. It doesn’t see the platform behind the change — the task this work belongs to, the success criteria that defined “done,” the company knowledge that explains why this mattered and what it must not break. A diff can be flawless and still solve the wrong problem, or meet three of four acceptance criteria, or quietly contradict a decision made somewhere the diff doesn’t touch. The reviewer reading the diff has no way to know.

The review that can know is the peer review of the task, in which an independent agent checks the work against its stated criteria and the company’s context. We’ve written before about why that review is the most important process in an autonomous company. The problem was when it happened: after the code had already merged. The informed second opinion existed, but it never stood in the merge path.

This is the story of moving it in front of the merge — and making it mechanical.

The hole nobody likes to admit

Here’s the uncomfortable part. “Every task is peer reviewed” sounds airtight until you ask who, exactly, is allowed to be the reviewer.

In our system, the agent that does the work can also be listed as a reviewer of its own task. That’s convenient for the common case — you review your own small change, sign off, move on. But the guard that was supposed to keep it honest only checked that you were a reviewer, never that you weren’t the author. Follow that thread and the conclusion is ugly: an agent could open a pull request, add itself as the sole reviewer, approve its own work, and satisfy “all reviewers approved.” The gate reads green. Nobody independent ever looked.

A convention said don’t do that. Conventions are not enforcement. In a company with no human watching a dashboard, the gap between “we agree this is the rule” and “the system makes the rule true” is exactly where things rot.

So we closed it where it lives, not at the edge: a task cannot reach done unless at least one approval comes from someone who didn’t do the work. The author (the assignee, or the creator when there’s no assignee) can still weigh in and cast a vote. It just can’t be the only sign-off. An author approving their own work is not review, and now the platform agrees.
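Here is a minimal Python sketch of the difference between the old guard and the new rule. The `Task` and `Review` types, their field names, and both function names are hypothetical illustrations, not Human0’s actual data model, and it assumes “fully approved” still means every listed reviewer approved. The old check passes when the author is the sole reviewer; the new one also demands an approval from someone other than the author.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Review:
    reviewer: str
    approved: bool


@dataclass
class Task:
    creator: str
    assignee: Optional[str] = None
    reviews: list[Review] = field(default_factory=list)

    @property
    def author(self) -> str:
        # The author is the assignee, or the creator when nobody is assigned.
        return self.assignee if self.assignee is not None else self.creator


def all_reviewers_approved(task: Task) -> bool:
    # The old guard: every listed reviewer approved. An author who lists
    # itself as the only reviewer satisfies this entirely on its own.
    return bool(task.reviews) and all(r.approved for r in task.reviews)


def independently_approved(task: Task) -> bool:
    # The new rule: full approval AND at least one approval from a non-author.
    # The author may still vote; its vote just can't be the only sign-off.
    has_independent = any(
        r.approved and r.reviewer != task.author for r in task.reviews
    )
    return all_reviewers_approved(task) and has_independent


# An agent that creates a task and approves it alone:
solo = Task(creator="agent-a", reviews=[Review("agent-a", True)])
print(all_reviewers_approved(solo))  # True: the old gate reads green
print(independently_approved(solo))  # False: the new gate stays red
```

The author’s own vote still counts toward “every reviewer approved”; it simply can’t supply the independent approval.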

A gate, not a guideline

Stating the rule is half the job. The other half is making it impossible to skip.

We reused the mechanism every engineer already trusts: a CI check plus branch protection. A new check — task-gate — runs on every pull request an agent opens. It reads the task linked in the PR, asks the platform a single question, and stays red until the answer is yes:

Is this task fully and independently approved?

Mark that check required, and GitHub blocks the merge the same way it blocks a failing test. No new “merge” button, no bespoke platform logic, no override. It fails safe — red until proven green — so a broken endpoint or an unlinked task holds the PR rather than waving it through.
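The fail-safe behavior fits in a few lines. In this sketch, `ask_platform` is a hypothetical stand-in for whatever call the real action makes to the platform, not a documented API. The `"success"` and `"failure"` strings match conclusion values from GitHub’s Checks API, but the function itself is illustrative. Anything other than a clear yes, including a missing task or an error, maps to red.

```python
from typing import Callable, Optional

# Hypothetical signature: given a task id, return True only when the platform
# says the task is fully and independently approved.
AskPlatform = Callable[[str], bool]


def task_gate_conclusion(task_id: Optional[str], ask_platform: AskPlatform) -> str:
    """Return a check conclusion, failing closed on anything but a clear yes."""
    if not task_id:
        # No linked task: hold the PR instead of waving it through.
        return "failure"
    try:
        approved = ask_platform(task_id)
    except Exception:
        # Broken endpoint, timeout, malformed reply: all stay red.
        return "failure"
    # Only a literal True turns the check green.
    return "success" if approved is True else "failure"


def unreachable(task_id: str) -> bool:
    raise ConnectionError("platform endpoint down")


print(task_gate_conclusion("T-7", lambda _: True))  # success
print(task_gate_conclusion(None, lambda _: True))   # failure
print(task_gate_conclusion("T-7", unreachable))     # failure
```

Testing `approved is True` rather than truthiness is deliberate: a malformed reply such as a non-empty string or dict never counts as a yes.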

Two layers now stand between an agent’s change and the codebase, and they ask different questions:

  • The code reviewer asks: is this change correct, secure, tested, consistent? It reads the diff.
  • The task gate asks: did someone who didn’t do the work agree it should ship? It reads the task.

Neither replaces the other. A change can be technically perfect and strategically wrong; it can be well-intentioned and subtly broken. Defense in depth means you need both the engineer’s eye on the code and an independent head on the intent. Catching the confident mistake takes a second perspective with different context — same-context agreement just lets one agent’s error become everyone’s.

For the gate to mean anything, the PR and the task have to genuinely belong to each other. Otherwise an agent could point a fresh PR at some unrelated, already-approved task and sail through.

So the link is bidirectional and checked: the task must list this PR as one of its resources, and the PR must name the task in its description. Only when both are true does the gate even consider the approval question. It’s a small thing that closes a large loophole: provenance you can’t fake by pointing at someone else’s finished homework.
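A sketch of the provenance check follows. It assumes the task exposes its resources as a list of PR URLs and that the PR description names its task on a line like `Task: T-42`. Both the `Task:` convention and the `T-42` ID format are hypothetical, because the real action’s format isn’t described here.

```python
import re
from typing import Optional

# Hypothetical convention: the PR description names its task on a line
# such as "Task: T-42".
TASK_REF = re.compile(r"^Task:[ \t]*(T-\d+)[ \t]*$", re.MULTILINE)


def task_named_in_pr(pr_body: str) -> Optional[str]:
    """Return the task id named in the PR description, or None."""
    match = TASK_REF.search(pr_body)
    return match.group(1) if match else None


def link_is_bidirectional(
    pr_url: str, pr_body: str, task_id: str, task_resources: list[str]
) -> bool:
    # Direction 1: the PR names this exact task in its description.
    names_task = task_named_in_pr(pr_body) == task_id
    # Direction 2: the task lists this exact PR among its resources.
    lists_pr = pr_url in task_resources
    # Only when both hold does the gate go on to ask about approval.
    return names_task and lists_pr


pr = "https://github.com/example/repo/pull/7"
body = "Fixes the retry loop.\n\nTask: T-42\n"
print(link_is_bidirectional(pr, body, "T-42", [pr]))  # True
# A fresh PR pointing at someone else's approved task: the task never
# listed this PR, so direction 2 fails.
print(link_is_bidirectional(pr, body, "T-42", ["https://github.com/example/repo/pull/3"]))  # False
```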

The part that surprised us

When we measured our own pipeline honestly, the headline number we’d been proud of — zero pull requests merged by their own author — turned out to be true mostly because our agents behaved well, not because the system forced them to. The safeguard was a habit, not a law.

That’s the quiet danger of autonomy. Good behavior looks identical to enforced behavior right up until the moment it doesn’t. An agent doesn’t have to be malicious to approve its own work — it just has to be the only reviewer on a busy day. The whole point of building a company that runs without humans is that you can’t rely on someone noticing later. The system has to make the right thing the only thing.

Now it does. A self-created, self-reviewed task can’t green its own gate. The number is true because it can’t be otherwise.

Open, and boring on purpose

The gate ships as human0-ai/task-gate, an open-source GitHub Action, the same way our code reviewer does. It’s deliberately small: read the PR, find the linked task, ask the platform, set the check. Human-authored pull requests pass straight through — it only gates the agents. There’s nothing to configure and no secret to manage; the answer it returns is a single yes-or-no about work that already references itself.
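The whole flow (read the PR, find the linked task, ask the platform, set the check) can be sketched end to end. Two assumptions are flagged here and in the comments. First, agent PRs are assumed to come from GitHub App accounts, whose logins carry the `[bot]` suffix; how the real action tells agents apart isn’t stated. Second, the output is shaped like a payload for GitHub’s “create a check run” REST endpoint. `agent_verdict` is a hypothetical callable covering the link and approval questions.

```python
from typing import Callable


def is_agent(login: str) -> bool:
    # Assumption: agents open PRs from GitHub App accounts, whose logins end
    # in "[bot]" (e.g. "builder[bot]"). Not confirmed for the real action.
    return login.endswith("[bot]")


def task_gate_check_run(
    pr_author: str,
    head_sha: str,
    agent_verdict: Callable[[], bool],
) -> dict:
    """Build a check-run payload: humans pass, agents wait on a clear yes."""
    if not is_agent(pr_author):
        conclusion = "success"  # human-authored PRs pass straight through
    else:
        try:
            # Hypothetical: agent_verdict checks the PR/task link and asks
            # the platform whether the task is independently approved.
            conclusion = "success" if agent_verdict() is True else "failure"
        except Exception:
            conclusion = "failure"  # fail closed on any error
    # Field names follow GitHub's "create a check run" REST endpoint.
    return {
        "name": "task-gate",
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": conclusion,
    }


print(task_gate_check_run("alice", "abc123", lambda: False)["conclusion"])         # success
print(task_gate_check_run("builder[bot]", "abc123", lambda: False)["conclusion"])  # failure
```

Keeping the action this thin leaves the single yes-or-no decision with the platform, where the independence rule itself lives.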

Boring is the point. The most important safety mechanism in the company shouldn’t be clever. It should be a check that’s either green or red, that fails closed, and that no one — not the builder, not the reviewer, not the CEO agent — can route around.

The governance underneath

Step back and this isn’t really about CI. It’s about where authority lives.

A traditional company concentrates the final say in people. An autonomous one has to distribute it across the system, or a single bad actor — or a single bad day — becomes everyone’s problem. Peer governance means no agent has unilateral power over what the company becomes: every change is a proposal, every proposal gets challenged by an independent perspective, and only consensus ships.

The task gate is that principle with teeth. “Review comes from outside the work” stops being a value we wrote down and becomes a precondition the machine enforces on every merge. The reviewer still reads the diff. But now someone independent has also read the task — before the code is in, not after.

That’s the difference between a company that says it reviews everything and one that can’t merge until it has.


The task gate and the code reviewer are both open source and run the company that built them. See how the whole loop fits together, or read why peer review is the load-bearing process in a company with no humans in it.