Your coding agent should not be the only judge of its own work.

[Image: a small robot at a mirror sees its reflection robed and wigged as a judge and says, "Wow, I look good!" An agent approving its own work.]

A verification harness for Claude Code

Confidently move faster with AI-generated code

Adversarial TDD is a verification harness for Claude Code. It helps Claude Code surface potential errors in AI-generated code when the stakes are high.

When should I use it?

Use Adversarial TDD for changes where a hidden assumption could become an expensive failure.

What makes it different?

The harness separates specification, interface design, implementation, tests, and adversarial review into isolated reasoning paths.

What does it actually do?

It creates independent opportunities for wrong assumptions to collide, then escalates unresolved uncertainty to you. It does not guarantee correctness.

Problem

AI made code generation faster. Verification did not keep up.

Coding agents can produce code, tests, and explanations that all agree — and still be wrong.

[Image: a grid of green 'PASS' results above the line "Everything agrees. Still wrong."]

The problem is not your coding agent. The issue is that AI coding workflows often treat uncertain inputs as settled facts.

When code generation and verification happen inside the same session, a wrong assumption can flow through the pipeline as fact.

Adversarial TDD adds multiple separate paths, so those assumptions have more chances to collide before you trust the result.

Verification adds a little time up front, and that is what speeds up the whole process: a wrong assumption caught now is not a production failure you chase down later.

Product

An independent verification harness for Claude Code.

You describe the feature in Claude Code.

Adversarial TDD uses Claude Code to run your work through isolated reasoning sessions: specification, interface design, implementation, and tests.

Those sessions are kept separate so they cannot simply inherit the same assumptions. Their outputs are then compared and challenged against one another.

When the reasoning paths disagree, Adversarial TDD treats that disagreement as a signal of potential misinterpretation and brings it back to you for judgement.
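One way to picture this collision step is the minimal Python sketch below. It is illustrative only and is not Adversarial TDD's implementation: the `PathOutput` record, the `find_divergences` function, and the sample claims are all hypothetical names invented for this example. Each isolated path reports what it concluded about a named behaviour, and any behaviour on which the paths disagree is escalated rather than resolved automatically.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PathOutput:
    """One isolated reasoning path's conclusion about one behaviour (hypothetical shape)."""
    path: str       # e.g. "specification", "implementation", "tests"
    behaviour: str  # the behaviour being described
    claim: str      # what this path concluded about it


def find_divergences(outputs):
    """Return {behaviour: {path: claim}} for behaviours where the paths disagree."""
    by_behaviour = defaultdict(dict)
    for out in outputs:
        by_behaviour[out.behaviour][out.path] = out.claim
    # A behaviour diverges when its paths hold more than one distinct claim.
    return {
        behaviour: claims
        for behaviour, claims in by_behaviour.items()
        if len(set(claims.values())) > 1
    }


# Made-up sample data: three paths describe checkout, two describe rounding.
outputs = [
    PathOutput("specification", "empty cart checkout", "reject with an error"),
    PathOutput("implementation", "empty cart checkout", "create a zero-value order"),
    PathOutput("tests", "empty cart checkout", "reject with an error"),
    PathOutput("specification", "currency rounding", "round half up"),
    PathOutput("implementation", "currency rounding", "round half up"),
]

divergences = find_divergences(outputs)
for behaviour, claims in divergences.items():
    print(f"Needs human judgement: {behaviour}")
    for path, claim in claims.items():
        print(f"  {path}: {claim}")
```

In this sketch the disagreement is not settled by majority vote. Two paths agreeing against one is still reported, because the point is to bring the divergence back to a person.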

Use it when the cost of being wrong is higher than the cost of checking.

Each run produces a summary of what was checked, what disagreed, what changed, and where human judgement was required.

Two engines

Claude Code builds. Adversarial TDD checks.

Adversarial TDD does not replace Claude Code. It complements Claude Code by doing the thing a helpful coding agent should not do by itself: independently challenge the work.

Claude Code explores, adapts, and writes code. Adversarial TDD checks with enforced structure.

Claude Code — the Explorer
Adversarial TDD — the Challenger
Claude Code: Helps you explore, build, refactor, and resolve open-ended problems.
Adversarial TDD: Runs a bounded verification pass against the approved work item.

Claude Code: Adapts to your instructions and follows the conversation wherever it needs to go.
Adversarial TDD: Holds the workflow to structured verification checkpoints that cannot be negotiated away in chat.

Claude Code: Can smooth over ambiguity to keep making progress.
Adversarial TDD: Surfaces ambiguity in code, documentation, or intent when the system cannot safely resolve it.

Claude Code: Can inherit a misinterpretation and produce code, tests, and explanations that all agree.
Adversarial TDD: Isolates specification, interface design, implementation, tests, and adversarial review so wrong assumptions have more chances to collide.

Claude Code: Optimises for helpful progress.
Adversarial TDD: Challenges artefacts for ambiguity, unsupported assumptions, scope drift, implementation leakage, and disagreement between independently derived outputs.

Claude Code: Lets you decide what you want to build.
Adversarial TDD: Gives you evidence to decide whether the result still contains unresolved uncertainty.

[Screenshot of localhost:3987/session: the Adversarial TDD run map, showing each isolated teammate (Planner, Architect, Builder, Inspector) with the artefacts it is allowed to see, the artefacts kept away to preserve independence, and what it produces.]
The Challenger's enforced structure: each teammate is given only the artefacts its job needs and is kept from the rest, so its conclusions are derived independently, not inherited.
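
The idea of handing each teammate only the artefacts its job needs can be sketched as a simple allow-list filter. This is a hedged illustration, not the harness's actual rules: the teammate names come from the run map, but the `VISIBLE` table, the artefact keys, and the `artefacts_for` function are assumptions made up for this example.

```python
# Hypothetical allow-list: which artefacts each teammate may read.
# The real visibility rules are not described on this page.
VISIBLE = {
    "Planner": {"feature_request"},
    "Architect": {"specification"},
    "Builder": {"specification", "interface_design"},
    "Inspector": {"specification", "interface_design"},
}


def artefacts_for(teammate, artefacts):
    """Return only the artefacts this teammate is allowed to see."""
    allowed = VISIBLE[teammate]
    return {name: body for name, body in artefacts.items() if name in allowed}


# Placeholder artefact contents for illustration.
artefacts = {
    "feature_request": "Let users check out with a saved card.",
    "specification": "Checkout must reject an empty cart.",
    "interface_design": "checkout(cart, card) -> Order",
    "implementation": "def checkout(cart, card): ...",
    "tests": "def test_empty_cart_rejected(): ...",
}

inspector_view = artefacts_for("Inspector", artefacts)
print(sorted(inspector_view))
```

Under these assumed rules the Inspector never sees the implementation, so any tests it derives cannot simply mirror what the Builder wrote.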

Adversarial TDD complements Claude Code precisely because they have opposite failure modes.

Together, they let you keep Claude Code's speed without letting the same reasoning path become its own verification layer.

Workflow

You build with Claude Code. Adversarial TDD verifies the work.

Adversarial TDD does not write your code. Claude Code remains the coding agent. Adversarial TDD uses Claude Code as the user-facing surface, then runs an independent verification harness around the work.

  1. Reason with Claude Code about what you are building.
  2. When ready, run /atdd-start.
  3. Claude Code starts Adversarial TDD's MCP server.
  4. Adversarial TDD runs the verification harness; Claude Code writes the code.
  5. You receive completed code, findings, or a divergence that requires your judgement.

You stay in Claude Code throughout. Adversarial TDD runs in the background and brings you back only when a decision is needed.

It works in the background, but the entire run stays observable. Open the observability layer at a localhost URL to watch the session unfold live.

[Screenshot of localhost:3987/session: the Adversarial TDD observability layer, a live reasoning map with one teammate active, adversarial-challenge traces accruing, and a single prompt opened for inspection while the session runs.]
Live at a localhost URL: the reasoning map updates as the session runs, with teammates working in isolation, challenge traces accruing, and any prompt open to inspect as it happens.

For teams

Learn from the verification data your team creates.

For an individual developer, Adversarial TDD provides an independent check before trusting AI-generated code. For a team, the value compounds: every run creates data about how your team builds, verifies, revises, and accepts AI-generated changes.

See how Adversarial TDD becomes correctness intelligence →

Waitlist

Join the MVP waitlist for Claude Code users.

Early MVP for developers using AI-generated code on high-consequence changes.

Where do you most often see AI-generated code fail after passing tests?
How do you currently try to catch or reduce these issues?

We'll email you when your spot opens, along with the occasional build update.