The Brief
Acumatica is a cloud ERP platform — the infrastructure layer for mid-market companies managing finance, distribution, and project accounting. The engineering team builds custom modules and integrations on top of the Acumatica framework.
The engagement was to accelerate this development work. The specific problem: requirements written in Jira were taking too long to translate into working code. The handoff between product and engineering was lossy, the validation loop between code and tests was slow, and review cycles were eating into everyone's capacity to ship.
The hypothesis: LLM orchestration could compress this pipeline significantly.
What Was Delivered
- Three-agent pipeline: Requirements Agent, Code Generation Agent, Test Validation Agent
- Shared stateful context stored in Cosmos DB — agents read from and write to a common schema
- Azure AI Search indexing Acumatica framework documentation and the team's existing codebase, with hybrid retrieval
- Conditional routing: validation failures automatically trigger code revision; unresolvable cases escalate to human review
- Clarification mechanism: the Requirements Agent surfaces ambiguities before code generation begins, reducing the number of implementations that need substantial rewriting
- End-to-end flow from Jira ticket to reviewed, test-covered implementation
The Approach
The architecture that emerged was a three-agent workflow with shared state:
- Requirements Agent — reads a Jira issue, extracts structured requirements, identifies ambiguities, and requests clarification if needed. Produces a machine-readable specification.
- Code Generation Agent — consumes the specification, retrieves relevant Acumatica framework documentation and codebase context via Azure AI Search, and generates implementation code.
- Test Validation Agent — reviews the generated code against the original requirements, generates test cases, and produces a validation report.
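A minimal sketch of what the machine-readable specification and its clarification gate might look like. The class names, the field names, the `ready_for_codegen` helper, and the sample ticket are all illustrative assumptions, not the team's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ambiguity:
    """An open question the Requirements Agent raised about a ticket."""
    question: str
    answer: Optional[str] = None  # filled in once someone responds

    @property
    def resolved(self) -> bool:
        return self.answer is not None


@dataclass
class Specification:
    """Hypothetical shape of the Requirements Agent's output."""
    ticket_id: str
    requirements: List[str]
    ambiguities: List[Ambiguity] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        return [a.question for a in self.ambiguities if not a.resolved]


def ready_for_codegen(spec: Specification) -> bool:
    """Code generation may start only when no question is still open."""
    return not spec.open_questions()


# Made-up ticket, used only to exercise the gate.
spec = Specification(
    ticket_id="PROJ-123",
    requirements=["Add a credit-limit check to sales order entry"],
    ambiguities=[Ambiguity("Does the limit apply per customer or per location?")],
)
blocked = ready_for_codegen(spec)      # False: one question is open
spec.ambiguities[0].answer = "Per customer"
unblocked = ready_for_codegen(spec)    # True: nothing left to clarify
```

Keeping the questions inside the specification, rather than in a side channel, lets later agents see both what was asked and how it was answered.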
The three agents share a stateful context stored in Cosmos DB. This is what makes the orchestration useful rather than just novel: the Test Validation Agent can reference the original requirements as parsed by the Requirements Agent, not just the code as generated. The loop is closed.
"The orchestration value is in the shared memory. Three isolated agents solve three separate problems. Three agents with shared state solve one problem together."
The Build
The stateful context schema was the foundational design decision. Every piece of information flowing through the pipeline needed a home:
- Original Jira issue (raw)
- Parsed requirements (structured)
- Clarification requests and responses
- Codebase context retrieved from search
- Generated code
- Test cases generated
- Validation results and pass/fail flags
- Human review escalations
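The list above maps naturally onto a typed dictionary. The following is a sketch under assumptions: the key names and the `new_state` helper are invented for illustration, and the real Cosmos DB document layout is not given in the source.

```python
from typing import List, Optional, TypedDict


class PipelineState(TypedDict, total=False):
    """Illustrative shared context, one or two keys per item listed above."""
    jira_issue_raw: str
    parsed_requirements: List[str]
    clarifications: List[dict]        # {"question": ..., "response": ...}
    codebase_context: List[str]       # snippets retrieved from search
    generated_code: str
    test_cases: List[str]
    validation_report: str
    validation_passed: Optional[bool]
    escalation_reason: Optional[str]


def new_state(jira_issue_raw: str) -> PipelineState:
    """Start a run with only the raw ticket populated."""
    return PipelineState(
        jira_issue_raw=jira_issue_raw,
        clarifications=[],
        validation_passed=None,  # not validated yet
    )


state = new_state("As a clerk, I want a warning when an order exceeds the credit limit.")
state["parsed_requirements"] = ["Warn when order total exceeds customer credit limit"]
```

Declaring the state with `total=False` lets each agent fill in its own keys as the run progresses, while a type checker still flags misspelled key names.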
LangChain was the orchestration layer, with the workflow defined as a LangGraph StateGraph. The agent routing logic was explicit rather than learned: the main sequence (requirements, code, tests) is deterministic, and LLM-based routing would add latency and failure modes without adding value.
Azure AI Search used hybrid retrieval (keyword plus semantic) for the codebase and documentation. Hybrid outperformed pure semantic search for code retrieval, which is expected: code search benefits heavily from exact identifier matching that keyword search handles better.
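To illustrate why the combination helps, here is a plain-Python sketch of Reciprocal Rank Fusion, the method Azure AI Search documents for merging keyword and vector rankings in hybrid queries. This is a reimplementation of the idea, not the service's code; the file names and rankings are made up, and `k = 60` is the commonly cited default constant.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up results for a query that contains an exact class name.
keyword_hits = ["SOOrderEntry.cs", "SOOrderEntryExt.cs", "ARInvoiceEntry.cs"]
semantic_hits = ["OrderWorkflowGuide.md", "SOOrderEntry.cs", "SOOrderEntryExt.cs"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this toy input, the file that matches the identifier exactly ranks first because both retrievers surface it, while the semantically similar guide drops below the two code files.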
The Acumatica framework is .NET-based. The Code Generation Agent was given extensive context about framework-specific patterns: BQL (Business Query Language), PXGraph for data access, PXCache for entity management. Without this domain context, the generated code would be idiomatic C# but non-functional Acumatica code.
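The explicit routing is wired as a graph (PipelineState, the agent callables, and route_on_validation are defined elsewhere):

    from langgraph.graph import StateGraph, END

    workflow = StateGraph(PipelineState)
    workflow.add_node("parse_requirements", requirements_agent)
    workflow.add_node("generate_code", code_agent)
    workflow.add_node("validate_tests", test_agent)
    # The "escalate" route needs a registered target; human_review is an assumed handler name.
    workflow.add_node("human_review", human_review)

    workflow.set_entry_point("parse_requirements")
    workflow.add_edge("parse_requirements", "generate_code")
    workflow.add_edge("generate_code", "validate_tests")
    # Pass ends the run, revise loops back to generation, escalate hands off to a person.
    workflow.add_conditional_edges(
        "validate_tests",
        route_on_validation,
        {"pass": END, "revise": "generate_code", "escalate": "human_review"},
    )
    workflow.add_edge("human_review", END)
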
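The `route_on_validation` function referenced in the graph is not shown. One way it might look, assuming the state carries a `validation_passed` flag and a `revision_count` counter, and that "unresolvable" means the revision budget has run out. The key names and the cap of 3 are assumptions, not details from the engagement.

```python
from typing import Any, Dict

MAX_REVISIONS = 3  # assumed cap; the real limit is not stated


def route_on_validation(state: Dict[str, Any]) -> str:
    """Return the edge label for the conditional edge: pass, revise, or escalate."""
    if state.get("validation_passed"):
        return "pass"
    if state.get("revision_count", 0) >= MAX_REVISIONS:
        # Repeated failures are treated as unresolvable and go to a person.
        return "escalate"
    return "revise"


first_failure = route_on_validation({"validation_passed": False, "revision_count": 0})
out_of_budget = route_on_validation({"validation_passed": False, "revision_count": 3})
success = route_on_validation({"validation_passed": True, "revision_count": 1})
```

Because the function is ordinary code over the shared state, the revision loop is bounded and every routing decision can be unit tested without calling a model.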
The Outcome
The pipeline shortened the time from Jira ticket to reviewed implementation, with measurable compression in both the requirements-to-code and code-to-tests phases.
The clarification mechanism produced the most significant quality improvement: surfacing ambiguities before code generation began reduced the rate of implementations that required substantial rewriting after the fact. Catching a missing requirement at the specification stage is an order of magnitude cheaper than catching it after code review.
The generated test cases are not a replacement for human test authorship — they are a scaffold. Engineers review and extend them. Starting from a generated test that covers the happy path and the edge cases the Requirements Agent identified is faster than starting from nothing.
Lessons
Multi-agent systems fail at the handoffs. The communication between agents — the schema that defines what one agent passes to the next — is where most of the implementation work is, and most of the debugging work too.
Invest in the schema before writing agent logic. A vague interface between agents means every ambiguity in requirements produces a different broken output. A precise interface means failures are predictable and fixable.
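As a sketch of what a precise interface buys, the check below fails fast at each handoff and names the missing fields. The `REQUIRED_INPUTS` contract, the key names, and `HandoffError` are hypothetical, chosen only to illustrate the idea.

```python
from typing import Any, Dict, List

# Hypothetical contract: keys each step requires before it runs.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "generate_code": ["parsed_requirements", "codebase_context"],
    "validate_tests": ["parsed_requirements", "generated_code"],
}


class HandoffError(ValueError):
    """Raised when the state lacks what the next agent needs."""


def check_handoff(step: str, state: Dict[str, Any]) -> None:
    """Fail fast, naming the step and the fields that are absent or empty."""
    missing = [key for key in REQUIRED_INPUTS[step] if not state.get(key)]
    if missing:
        raise HandoffError(f"{step}: missing {', '.join(missing)}")


try:
    check_handoff("validate_tests", {"parsed_requirements": ["Check credit limit"]})
    message = ""
except HandoffError as err:
    message = str(err)  # "validate_tests: missing generated_code"
```

A failure here points at one step and one field, which is easier to debug than a downstream agent producing plausible output from incomplete input.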
Human review gates are not optional. The pipeline produces code; engineers ship it. The automation compresses the loop — it does not eliminate judgment. Design the human review step as a first-class part of the system, not an afterthought.