
28 May 2026 · 9 min read

Stress-testing your architecture with AI before you build it

The most expensive bugs are the ones baked into an architecture before a line of code is written. Claude and Gemini are remarkably good at finding them — if you ask the right questions.

Architecture · Claude · Gemini · AI engineering · Design

The expensive mistake

Architecture bugs are not like implementation bugs. An implementation bug takes hours to find and minutes to fix. An architecture bug takes weeks to find — usually in production — and months to fix, because the fix requires restructuring everything built on top of the wrong foundation.

The standard protection against architecture bugs is design review: an experienced engineer reads the design and asks hard questions. This works when experienced reviewers are available, engaged, and not under the same deadline pressure as the person proposing the design.

Often they are not.

AI-assisted architecture review is not a replacement for experienced human judgment. It is a way to run your design through a rigorous adversarial review before you spend time building it — available on demand, without the scheduling overhead.

The framing that works

The quality of an AI architecture review depends entirely on how the question is framed. "Review my architecture" produces a generic checklist response. Specific adversarial questions produce specific, useful findings.

The framing I use:

"Here is my proposed architecture. Your job is to find the ways it fails — specifically the failure modes that would be expensive to fix after the system is built. Do not evaluate the parts that are clearly fine. Focus on the assumptions I am making that could be wrong, the load or failure scenarios I have not accounted for, and the places where this design paints me into a corner."

This framing produces a different kind of output. It instructs the model to be adversarial, not balanced. You will get the positives elsewhere. What you need from a review is the things that can hurt you.
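
A minimal sketch of how that framing can be kept as a reusable prefix rather than retyped for every review. The function name `build_review_prompt` and the `--- DESIGN ---` delimiters are illustrative assumptions, not part of any model's API; the framing text is the one quoted above, with only punctuation adjusted.

```python
# Adversarial framing from the article, stored once so every review uses it.
ADVERSARIAL_FRAMING = (
    "Here is my proposed architecture. Your job is to find the ways it fails, "
    "specifically the failure modes that would be expensive to fix after the "
    "system is built. Do not evaluate the parts that are clearly fine. Focus on "
    "the assumptions I am making that could be wrong, the load or failure "
    "scenarios I have not accounted for, and the places where this design "
    "paints me into a corner."
)


def build_review_prompt(design: str) -> str:
    """Prefix the design text with the adversarial framing.

    The delimiters are a hypothetical convention that makes it obvious
    where the design starts and ends.
    """
    design = design.strip()
    if not design:
        raise ValueError("design text is empty")
    return (
        f"{ADVERSARIAL_FRAMING}\n\n"
        f"--- DESIGN ---\n{design}\n--- END DESIGN ---"
    )
```

The returned string can be pasted into a chat session or passed to whichever client library you already use.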

What to feed the model

The more complete the context, the more specific the findings. For a useful architecture review, include:

The specific areas of uncertainty are particularly valuable. Telling the model "I am least confident about the consistency model for this cache layer" gets you a focused analysis of that layer rather than a survey of the entire design.
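
One way to make the uncertainty explicit is to append it to whatever prompt you are sending. This is a sketch under assumptions: `add_focus_areas` is a hypothetical helper, and the wording of the appended section is only one possible phrasing.

```python
def add_focus_areas(prompt: str, uncertainties: list[str]) -> str:
    """Append the areas the author is least confident about to a review prompt.

    Blank entries are ignored; if nothing remains, the prompt is returned
    unchanged.
    """
    areas = [u.strip() for u in uncertainties if u.strip()]
    if not areas:
        return prompt
    bullets = "\n".join(f"- {area}" for area in areas)
    return (
        f"{prompt}\n\n"
        f"I am least confident about:\n{bullets}\n"
        "Analyse these areas first and in the most depth."
    )
```

For example, `add_focus_areas(prompt, ["the consistency model for this cache layer"])` steers the review toward that layer instead of a survey of the whole design.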

Using Gemini for large design documents

When the design involves multiple documents — an original specification, a set of related RFC proposals, existing architecture diagrams, API contracts — Gemini's long context window is the right tool.

Load everything into a single session. The analysis of how a proposed change interacts with an existing constraint requires holding both simultaneously. A model that has only seen a portion of the design will miss cross-document conflicts that are only visible when the full picture is present.
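
A rough sketch of what "load everything" can look like in practice: concatenate the documents into one labelled bundle so the model can refer to each by name. The `bundle_documents` helper and the `=== DOCUMENT: ... ===` markers are assumptions for illustration. Checking that the bundle fits the model's context window is left to the caller.

```python
def bundle_documents(docs: dict[str, str]) -> str:
    """Join several design documents into one labelled block of text.

    `docs` maps a document name (for example a file name) to its contents.
    Each document is wrapped in start and end markers so that findings can
    cite the document they came from.
    """
    if not docs:
        raise ValueError("no documents supplied")
    sections = []
    for name, text in docs.items():
        sections.append(
            f"=== DOCUMENT: {name} ===\n{text.strip()}\n=== END: {name} ==="
        )
    return "\n\n".join(sections)
```

The labels matter more than they look: a finding such as "the retry policy in one document conflicts with the idempotency assumption in another" is only actionable if the model can name both sources.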

I use this specifically for: refactor proposals on large codebases (the proposed change versus all the existing architecture docs), integration designs (the integration spec versus all the documentation for both systems being integrated), and multi-service designs (the overall architecture versus the independent specifications of each service).

The questions that find the most bugs

Over many architecture reviews, these questions produce the highest signal:

"What are the five things that could cause this system to fail silently — where it appears to be working but is producing wrong results?" Silent failures are more dangerous than loud ones. Engineers tend to design for the failure modes they can observe.

"Where does this architecture assume that two things will always be consistent, and under what conditions might they diverge?" Consistency assumptions are the most common source of architecture bugs in distributed systems.

"At 10x the assumed load, what breaks first?" Load assumptions are often wrong. Understanding the first failure point lets you design the right mitigation.

"If I needed to replace any single component of this system with a different implementation in twelve months, which replacement would be most expensive? Why?" This surfaces tight coupling that is not visible in the current design.

"What data transformation happens in this system that is lossy or irreversible? What happens if that step produces a wrong result?" Irreversible transformations are the hardest bugs to recover from in production.

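The five questions above can be kept as a fixed battery and run against every design. The sketch below assumes a caller-supplied `ask` function that sends a prompt to a model and returns its reply as text; that function is hypothetical and stands in for whichever client you use. The questions are the ones listed above, with only punctuation adjusted.

```python
from typing import Callable

REVIEW_QUESTIONS = [
    "What are the five things that could cause this system to fail silently, "
    "where it appears to be working but is producing wrong results?",
    "Where does this architecture assume that two things will always be "
    "consistent, and under what conditions might they diverge?",
    "At 10x the assumed load, what breaks first?",
    "If I needed to replace any single component of this system with a "
    "different implementation in twelve months, which replacement would be "
    "most expensive? Why?",
    "What data transformation happens in this system that is lossy or "
    "irreversible? What happens if that step produces a wrong result?",
]


def run_question_battery(
    context: str, ask: Callable[[str], str]
) -> dict[str, str]:
    """Ask each review question against the same design context.

    `ask` is a placeholder for a real model call. Each question is sent
    as its own prompt, with the full context repeated every time.
    """
    answers: dict[str, str] = {}
    for question in REVIEW_QUESTIONS:
        answers[question] = ask(f"{context}\n\nQuestion: {question}")
    return answers
```

Sending each question separately is a design choice, not a requirement: it keeps the answers from blending together, at the cost of resending the context each time.
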
Iterating on the findings

A good architecture review produces findings that you need to resolve before building — either by changing the design or by explicitly accepting the risk with mitigation.

For each finding the model returns:

  1. Determine whether it reflects a real risk or a misunderstanding of your context
  2. If real: determine whether to change the design or accept the risk
  3. If accepted: document why the risk is acceptable and what the mitigation is
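
The three steps above map naturally onto a small record type. This is a sketch, not a prescribed format: the names `Resolution` and `Finding` are assumptions, and the only rule it enforces is the one from step 3, that an accepted risk must carry a rationale and a mitigation.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    MISUNDERSTANDING = "misunderstanding"  # step 1: not a real risk
    DESIGN_CHANGED = "design changed"      # step 2: real, design was revised
    RISK_ACCEPTED = "risk accepted"        # step 2: real, kept with mitigation


@dataclass(frozen=True)
class Finding:
    summary: str
    resolution: Resolution
    rationale: str = ""
    mitigation: str = ""

    def __post_init__(self) -> None:
        # Step 3: an accepted risk is only valid if it is documented.
        if self.resolution is Resolution.RISK_ACCEPTED:
            if not self.rationale.strip() or not self.mitigation.strip():
                raise ValueError(
                    "an accepted risk needs a rationale and a mitigation"
                )
```

A list of these records, kept next to the design document, is the decision log described in the next paragraph.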

This creates a record of the design decisions that is more valuable than the design document itself — it shows not just what was decided but what was considered and why.

Run a second review pass after you have resolved the findings. A revised design sometimes introduces new problems. The second pass is usually faster, because the first pass surfaces most of the significant issues.

The complement to human review

Nothing in this process replaces a human architecture review with someone who knows the domain deeply. The AI review is a pre-filter.

By the time a human reviews the design, the obvious failure modes have been identified and addressed. The human reviewer can spend their time on the subtle issues that require domain expertise and judgment — the things a model cannot evaluate as well.

In my experience, engineers who run an AI architecture review before the human review get more from the human review. The reviewer does not spend time on things that should have been caught earlier and can focus on the hard problems.

The design that emerges from this process — AI adversarial review, revision, human review — is more robust than one that goes through either in isolation.

The honest limitation

AI architecture review finds what it can reason about from the information you provide. It cannot find risks that depend on information you did not include, failure modes that require production operational knowledge, or problems that can only be seen by someone who has built a dozen similar systems.

It is a powerful and available first pass. It is not a substitute for the judgment that comes from experience. Use it accordingly — as the review you always run, before the review that requires booking time with an expert.
