Example 01
How test-passing errors arise
Tests pass.
The code is still wrong.
"What did I miss?"
Stop using code-verification approaches that waste your time and still let errors get through undetected.
Test-passing errors are a structural property of AI-driven code development.
...the models make wrong assumptions on your behalf and just run along with them without checking...
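A minimal sketch of how this happens, in Python. The spec, the function names, and the "intended" reading are hypothetical and exist only for illustration. The code and its tests come from the same reading of an ambiguous spec, so the tests can only confirm that reading.

```python
# Ambiguous spec (hypothetical): "Orders over 100.00 get a 10% discount."
# Assumption made on your behalf: "over" includes exactly 100.00.
# Amounts are integer cents to keep the arithmetic exact.

THRESHOLD_CENTS = 10_000


def discounted_total(total_cents: int) -> int:
    """Generated code: treats 'over 100.00' as 'at least 100.00'."""
    if total_cents >= THRESHOLD_CENTS:
        return total_cents * 90 // 100
    return total_cents


def intended_total(total_cents: int) -> int:
    """What the spec author meant (assumed here): strictly more than 100.00."""
    if total_cents > THRESHOLD_CENTS:
        return total_cents * 90 // 100
    return total_cents


def run_generated_tests() -> bool:
    """Tests written from the same reading as the code, so they agree with it."""
    assert discounted_total(9_900) == 9_900    # below threshold: no discount
    assert discounted_total(10_000) == 9_000   # the shared assumption, encoded as a test
    assert discounted_total(20_000) == 18_000  # above threshold: discount
    return True
```

Here `run_generated_tests()` succeeds, yet `discounted_total(10_000)` and `intended_total(10_000)` differ. Nothing in the suite can reveal that, because the suite inherited the assumption.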
Even with better models, ambiguity in the specification and shared reasoning remain.
So more code-verification effort, within the way you're verifying code today, won't change your outcome.
This is already happening in production workflows.
Based on user calls and real-world usage, as one engineer on the Claude Code team observed:
done about 10 of these [user conference] calls so far+ looked at more transcripts many learnings but one of the biggest is that it's very easy to spend a lot of tokens on open ended verification that doesn't make your output better...
When verification cannot establish correctness, teams compensate by doing more of it:
Token usage increases.
Costs become unpredictable.
The burden shifts back onto you.
Reviewing outputs. Tracing logic. Trying to establish correctness by hand.
The outcome does not improve.
Correctness cannot be established within a shared reasoning path.
It must be enforced through separation:
Correctness requires independent reasoning.
Independent reasoning requires separation.
The reasoning used to generate the code must not be the same reasoning used to validate it.
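One way to picture that separation, as a rough Python sketch. `Session`, `start_generator`, `start_validator`, and `shares_reasoning` are hypothetical names, and a `Session` is only a stand-in for an isolated model context. The validator starts fresh and receives the spec and the finished code, but none of the generator's intermediate reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """One isolated reasoning context (hypothetical stand-in for a context window)."""
    role: str
    history: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.history.append(message)

    def note(self, text: str) -> None:
        """Record an intermediate reasoning step made inside this session."""
        self.history.append(f"NOTE: {text}")


def start_generator(spec: str) -> Session:
    session = Session("generator")
    session.add(f"SPEC: {spec}")
    return session


def start_validator(spec: str, code: str) -> Session:
    # Fresh session: the spec and the code under review, nothing else.
    session = Session("validator")
    session.add(f"SPEC: {spec}")
    session.add(f"CODE UNDER REVIEW: {code}")
    return session


def shares_reasoning(generator: Session, validator: Session) -> bool:
    """True if any generator reasoning note appears in the validator's history."""
    notes = {m for m in generator.history if m.startswith("NOTE:")}
    return any(m in notes for m in validator.history)
```

A harness built along these lines could check `shares_reasoning(...)` before validation starts and refuse to continue if it returns `True`.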
Based on what the Claude Code team is seeing in their own workflows, as Boris Cherny, the creator of Claude Code, put it:
...what helps [code quality problems] is also having the model code review its code using a fresh context window...
Adversarial TDD structures the interaction so that reasoning paths are independent and can challenge each other.
This is enforced in code: a harness orchestrates the workflow rather than relying on prompts alone.
As a result, when reasoning paths disagree, the system does not resolve the conflict. It exposes it.
Run the /atdd-start slash command. You don't need to do anything else.
You receive the completed code and a session summary in Claude Code.
Instead of test-passing errors silently propagating through the pipeline, failure becomes a signal.
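The idea of treating disagreement as a signal can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual harness: `cross_check`, `verdict`, `Disagreement`, and the two rounding functions are hypothetical. Two paths read "round to the nearest whole number" differently. The check does not pick a winner; it reports every input where they diverge.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Disagreement:
    input_value: float
    outputs: dict[str, int]


def round_half_even(x: float) -> int:
    """One reading: ties go to the even neighbour (Python's built-in round)."""
    return round(x)


def round_half_up(x: float) -> int:
    """Another reading of the same sentence: ties always go up."""
    return math.floor(x + 0.5)


# Hypothetical labels for two independently derived implementations.
PATHS: dict[str, Callable[[float], int]] = {
    "generator": round_half_even,
    "validator": round_half_up,
}


def cross_check(
    paths: dict[str, Callable[[float], int]],
    inputs: Iterable[float],
) -> list[Disagreement]:
    """Run every path on every input and collect the inputs where outputs differ."""
    found: list[Disagreement] = []
    for x in inputs:
        outputs = {name: fn(x) for name, fn in paths.items()}
        if len(set(outputs.values())) > 1:
            found.append(Disagreement(x, outputs))
    return found


def verdict(disagreements: list[Disagreement]) -> str:
    """Escalate on any disagreement instead of choosing one path's answer."""
    return "escalate" if disagreements else "accept"
```

With inputs `[1.4, 1.5, 2.5, 3.6]`, the two paths agree everywhere except `2.5`, so `verdict(cross_check(PATHS, [1.4, 1.5, 2.5, 3.6]))` yields `"escalate"`. A test suite that happened to skip `2.5` would have passed under either reading.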
Example 02
How we prevent test-passing errors
Same ambiguity.
No test-passing error reaches production.
"Nothing broke."
Adversarial TDD mirrors the independent derivation principle used in safety-critical software engineering: correctness is not guaranteed, but multiple independent reasoning paths increase the chance that incorrect assumptions surface early and trigger escalation.
Adversarial TDD is currently in development.
We're speaking with engineers working on systems where correctness matters.
If this matches your workflow, you can request early access.