The Brief
The compliance automation market is dominated by Vanta and Drata — products that charge $30–50K per year to help companies pass SOC 2. Both are UI-heavy, integration-heavy, and slow to onboard. The actual compliance work — evidence collection, control mapping, gap analysis — is formulaic enough to be largely automated with modern LLMs.
The target was three weeks. TraceLayer, the product built to that brief, shipped in three weeks.
What Was Delivered
- SOC 2 Type I/II, ISO 27001, HIPAA, PCI-DSS, GDPR, CCPA, NIST CSF, FedRAMP (partial), CIS Controls, SLSA — eleven frameworks in production
- 120+ integrations: AWS, GCP, Azure, GitHub, Jira, Okta, Slack, and more
- Multi-agent architecture: one routing agent, eleven specialist framework agents
- Evidence normalization layer mapping every integration's output to control requirements
- Gap analysis report generator and remediation guidance system
- Stripe Checkout with usage-based pricing, wired in on day sixteen
- Paying customers; competitive demos run against Drata and Vanta
The Approach
Rather than a monolithic system, TraceLayer is a network of specialized agents. Each compliance framework gets its own specialist agent, grounded in a knowledge base of that framework's specific control requirements, evidence standards, and audit expectations. A routing agent handles user queries and delegates each one to the appropriate specialist.
The specialization matters: SOC 2 Trust Service Criteria and ISO 27001 Annex A controls use different language and different evidence formats. A single general-purpose agent produces mediocre answers for both. Specialists produce expert answers for their domain.
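The routing step described above can be sketched as a lookup from a detected framework to a specialist. This is a minimal illustration under stated assumptions, not TraceLayer's actual router: `SpecialistAgent`, `KEYWORDS`, and `route` are hypothetical names, and a production router would presumably classify queries with an LLM call rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class SpecialistAgent:
    """Hypothetical stand-in for one framework specialist."""
    framework: str

    def answer(self, query: str) -> str:
        # A real specialist would call an LLM with framework-specific context.
        return f"[{self.framework}] handling: {query}"


# Hypothetical keyword table, for illustration only.
KEYWORDS = {
    "soc 2": "SOC 2",
    "trust service criteria": "SOC 2",
    "iso 27001": "ISO 27001",
    "annex a": "ISO 27001",
}

SPECIALISTS = {name: SpecialistAgent(name) for name in set(KEYWORDS.values())}


def route(query: str) -> SpecialistAgent:
    """Return the specialist whose keyword appears in the query."""
    lowered = query.lower()
    for keyword, framework in KEYWORDS.items():
        if keyword in lowered:
            return SPECIALISTS[framework]
    raise LookupError(f"no specialist matches query: {query!r}")
```

Calling `route("Which Annex A controls cover access reviews?")` would hand the query to the ISO 27001 specialist, while a query that mentions no known framework raises `LookupError` instead of falling back to a generalist.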
"The architecture that won was not the most sophisticated. It was the most appropriate."
The stack was chosen for speed and operational simplicity:
- FastAPI for the backend — Python is the natural home for LLM orchestration, and FastAPI's async support makes multi-agent coordination tractable
- Supabase for storage — row-level security handles multi-tenancy without custom middleware; the integration catalog lives in Postgres, evidence files in Supabase Storage
- Groq / Llama 3.3 for inference — 70B parameter models on Groq's hardware are fast enough for interactive use and precise enough for compliance analysis
- Next.js for the frontend — App Router, server components, and incremental rendering make the evidence dashboard feel native
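To illustrate why async support helps with multi-agent coordination, here is a minimal sketch that queries several specialists concurrently using only the standard library's `asyncio`. The function names are hypothetical and the `asyncio.sleep` call stands in for an inference request; nothing here is taken from TraceLayer's code.

```python
import asyncio


async def run_specialist(framework: str, question: str) -> tuple[str, str]:
    """Hypothetical specialist call; the sleep stands in for an LLM request."""
    await asyncio.sleep(0.01)
    return framework, f"{framework} assessment of: {question}"


async def fan_out(frameworks: list[str], question: str) -> dict[str, str]:
    """Query the given specialists concurrently and key the answers by framework."""
    results = await asyncio.gather(
        *(run_specialist(name, question) for name in frameworks)
    )
    return dict(results)


if __name__ == "__main__":
    answers = asyncio.run(fan_out(["SOC 2", "ISO 27001"], "Is MFA enforced?"))
    for framework, answer in answers.items():
        print(framework, "->", answer)
```

Because the calls are awaited together, total latency is roughly that of the slowest specialist rather than the sum of all of them. Inside a FastAPI `async def` route the same `fan_out` coroutine could simply be awaited.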
The Build
Week 1 — Architecture. The agent graph was designed, the framework knowledge bases structured, and the integration API surface defined. No code was written until the design was stable. The foundational decision was the evidence normalization schema: every integration produces different data formats, and they all need to map cleanly to the same control requirements.
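One way to picture a normalization schema of this kind: two sources report the same fact in different shapes, and both collapse into a single record type tied to control identifiers. The sketch below is an assumption throughout. The `Evidence` fields, the payload shapes, and the control mapping are all hypothetical and are not TraceLayer's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Evidence:
    """Hypothetical normalized record; every field name here is an assumption."""
    source: str                      # integration that produced the record
    kind: str                        # normalized evidence type
    passed: bool                     # whether the underlying check succeeded
    control_ids: tuple[str, ...]     # controls this evidence can support
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Illustrative mapping from evidence kind to control identifiers, not audit guidance.
CONTROL_MAP = {"mfa_enforced": ("SOC2:CC6.1", "ISO27001:A.8.5")}


def from_code_host(raw: dict) -> Evidence:
    """Assumed payload shape: {"two_factor_required": bool}."""
    return Evidence("code_host", "mfa_enforced",
                    bool(raw["two_factor_required"]),
                    CONTROL_MAP["mfa_enforced"])


def from_identity_provider(raw: dict) -> Evidence:
    """Assumed payload shape: {"policies": [{"type": ..., "status": ...}]}."""
    enforced = any(p["type"] == "MFA" and p["status"] == "ACTIVE"
                   for p in raw["policies"])
    return Evidence("identity_provider", "mfa_enforced", enforced,
                    CONTROL_MAP["mfa_enforced"])
```

Both functions return the same `kind` and `control_ids`, so anything downstream can reason about "is MFA enforced?" without knowing which integration supplied the answer.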
Week 2 — Integration layer and core agents. 120+ integrations were normalized behind a common evidence-collection interface. Each framework agent was given structured access to this interface plus its domain knowledge base.
Week 3 — Product layer. Dashboard, gap analysis report generator, remediation guidance, billing. Stripe wired in on day sixteen.
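    class EvidenceNormalizer:
        """Routes raw integration output to the handler registered for its source."""

        def normalize(self, source: str, raw: dict) -> Evidence:
            # self.handlers maps a source name to the handler for that integration
            handler = self.handlers.get(source)
            if handler is None:
                # Fail loudly rather than silently dropping unmapped evidence
                raise UnsupportedSourceError(source)
            return handler.normalize(raw)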
The framework agent receives normalized evidence and applies its domain knowledge to determine whether the evidence satisfies the relevant control requirements — and if not, what exactly is missing and how to remediate it.
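The "what exactly is missing" part of that check can be sketched deterministically. In TraceLayer the judgment is made by the framework agent, so the code below is only a simplified stand-in: the `REQUIREMENTS` table, the control names, and `find_gaps` are hypothetical, and the `Evidence` type is reduced to the two fields the example needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Reduced, hypothetical evidence record for this example."""
    kind: str
    passed: bool


# Hypothetical requirement table: control name -> evidence kinds it needs.
REQUIREMENTS = {
    "access-control": {"mfa_enforced", "access_review_completed"},
    "change-management": {"branch_protection_enabled"},
}


def find_gaps(evidence: list[Evidence]) -> dict[str, list[str]]:
    """Per control, list the required evidence kinds that are absent or failing."""
    satisfied = {item.kind for item in evidence if item.passed}
    gaps: dict[str, list[str]] = {}
    for control, required in REQUIREMENTS.items():
        missing = sorted(required - satisfied)
        if missing:
            gaps[control] = missing
    return gaps
```

Evidence that exists but failed its check is treated the same as evidence that was never collected: either way the control is reported as a gap, which is the list a remediation step would work from.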
The Outcome
TraceLayer is live at tracelayer.it.com and has paying customers. In competitive demos against Drata and Vanta, teams have chosen TraceLayer for its demo quality, integration breadth, and price.
The build validated the core thesis: an agentic development workflow using Claude Code can compress months of product work into weeks without sacrificing quality. The same methodology informs every client build engagement I take on.
Lessons
The hardest part was not the LLM orchestration — it was the compliance domain knowledge. Getting the agent prompts precise enough to produce expert answers, yet flexible enough to handle edge cases, took as long as the technical integration work.
In domain-specific AI products, the domain work is the product work. The LLM is infrastructure.
Multi-tenancy at the data layer is easier to get right at the start than to retrofit. Supabase's RLS policies were set up correctly from day one. That decision has never caused a problem.