03
QA Engineer

Test

Run everything. Analyze all failures. Fix one at a time.

Read Search Execute Write

Why this phase exists

Agents produce code that often passes their own narrow tests but breaks at integration points. Test is an independent validation gate: a second pair of eyes that trusts nothing and verifies everything.

It comes after Build to validate the entire implementation as a unit, not just individual files. The Build agent may have passed its own checks, but Test runs the full suite, including tests from other modules that the change might have broken.

The self-healing loop means most test failures never reach a human. The workflow diagnoses, fixes, and re-validates on its own. This is what makes it autonomous rather than just automated.

Methodology

1

Receive build artifacts

Code changes, new tests, build report, and test commands from the Build phase.

2

Run full test suite

Execute ALL tests, not just new ones. Capture complete output. Do not stop at the first failure.

3

Analyze ALL failures at once

Do not fix the first failure and re-run. Read every failure first. Often, 5 test failures share one root cause. Categorize each failure before fixing any.
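As a sketch of this grouping step, failures with the same underlying cause can be bucketed by a normalized error signature before any fix is attempted. The `Failure` shape, the signature heuristic, and the sample messages below are hypothetical illustrations, not part of the workflow's API:

```typescript
interface Failure {
  test: string;
  message: string;
}

// Keep only the error text, dropping the trailing "at file:line"
// location, so the same root cause in different files collides
// into one group.
function signature(message: string): string {
  return message.split(" at ")[0].trim();
}

function groupByRootCause(failures: Failure[]): Map<string, Failure[]> {
  const groups = new Map<string, Failure[]>();
  for (const f of failures) {
    const key = signature(f.message);
    const bucket = groups.get(key) ?? [];
    bucket.push(f);
    groups.set(key, bucket);
  }
  return groups;
}

// Three failures, two root causes: the first two share one bug.
const failures: Failure[] = [
  { test: "login rejects empty email",
    message: "TypeError: validateEmail is not a function at auth/login.ts:42" },
  { test: "signup rejects empty email",
    message: "TypeError: validateEmail is not a function at auth/signup.ts:17" },
  { test: "invoice totals",
    message: "Expected 100 but received 99" },
];

const groups = groupByRootCause(failures);
```

Fixing the shared `validateEmail` cause once resolves both of its failures, which is exactly why analysis must precede any fix.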

4

Categorize failures

Test bug: the test itself is wrong. Implementation bug: the code does not match the plan's intent. Integration issue: the change broke something in an adjacent module.
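A deliberately simplified TypeScript sketch of the three buckets. The fields and heuristics here are hypothetical; the real decision needs the plan artifact and the failure's full context:

```typescript
type FailureCategory = "test_bug" | "implementation_bug" | "integration_issue";

interface FailureContext {
  test: string;
  changedModule: string;      // module touched by the Build phase
  failingModule: string;      // module the failing test lives in
  assertionOutdated: boolean; // expectation predates the plan's intent
}

function categorize(f: FailureContext): FailureCategory {
  if (f.assertionOutdated) return "test_bug";
  if (f.failingModule !== f.changedModule) return "integration_issue";
  return "implementation_bug";
}

// A change in the auth module that breaks a billing test
// is an integration issue, not a local implementation bug.
const cat = categorize({
  test: "invoice renders",
  changedModule: "auth",
  failingModule: "billing",
  assertionOutdated: false,
});
```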

5

Fix one issue at a time

Start with the most fundamental issue, then re-run the full suite after each fix to verify no regressions were introduced.

6

Re-validate

After all fixes, run the full suite again to confirm no regressions. Generate the dual-format test report.

7

Generate test report

Dual format: JSON for machine consumption by downstream phases, Markdown for human review. Both formats contain the same data.
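One way to guarantee that both formats carry the same data is to render both from a single report object. A minimal TypeScript sketch, covering only a subset of the report fields:

```typescript
interface TestReport {
  status: "pass" | "fail";
  total_tests: number;
  passed: number;
  failed: number;
  skipped: number;
  iterations: number;
}

// Render the human-readable view from the same object that is
// serialized for machines, so the two can never drift apart.
function toMarkdown(r: TestReport): string {
  return [
    "| Metric | Value |",
    "| --- | --- |",
    `| Status | ${r.status} |`,
    `| Total | ${r.total_tests} |`,
    `| Passed | ${r.passed} |`,
    `| Failed | ${r.failed} |`,
    `| Skipped | ${r.skipped} |`,
    `| Iterations | ${r.iterations} |`,
  ].join("\n");
}

const report: TestReport = {
  status: "pass", total_tests: 112, passed: 112,
  skipped: 0, failed: 0, iterations: 1,
};
const json = JSON.stringify(report, null, 2); // machine consumption
const markdown = toMarkdown(report);          // human review
```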

Inputs and outputs

Inputs

  • Code changes from Build
  • Test suite (new + existing)
  • Build report ($build_report)
  • Test commands

Outputs

  • Test Report (JSON + Markdown)
  • Coverage metrics
  • Fixes applied log

Consumed by: Review phase as $test_report

Test report (JSON format)
{
  "status": "pass",
  "total_tests": 112,
  "passed": 112,
  "failed": 0,
  "skipped": 0,
  "iterations": 1,
  "fixes_applied": [],
  "coverage": {
    "statements": 91.3,
    "branches": 84.7,
    "functions": 93.2,
    "lines": 90.8
  }
}

Example: extracting a shared validation module from 3 controller files. All 112 existing tests pass on the first run: a clean refactor with no fixes needed.

Agent pattern

Single Agent with Conditional Builder Subagent
Test Agent → on failure → Builder Subagent (max 3×) → Test Report

The Test agent runs tests and analyzes results. When failures occur, it spawns a Builder Subagent to apply fixes. After each fix, it re-runs the full suite. The loop continues until all tests pass or the retry limit is reached.
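The control flow above can be sketched in TypeScript, assuming hypothetical `runSuite` and `applyFix` wrappers around the real test runner and Builder subagent:

```typescript
interface SuiteResult {
  failed: number;
  failures: string[];
}

// Run, fix one issue, re-run the FULL suite; repeat until green
// or the retry budget is spent. `iterations` counts suite runs.
function selfHeal(
  runSuite: () => SuiteResult,
  applyFix: (failures: string[]) => void,
  maxRetries = 3,
): { status: "pass" | "escalate"; iterations: number } {
  let result = runSuite();
  let iterations = 1;
  while (result.failed > 0 && iterations <= maxRetries) {
    applyFix(result.failures); // Builder subagent fixes ONE issue
    result = runSuite();       // full re-run after every fix
    iterations++;
  }
  return { status: result.failed === 0 ? "pass" : "escalate", iterations };
}

// Stubbed usage: the suite reports 2 failures, then 1, then 0,
// simulating two successful single-issue fixes.
const failedCounts = [2, 1, 0];
const out = selfHeal(
  () => {
    const failed = failedCounts.shift() ?? 0;
    return { failed, failures: Array(failed).fill("failure") };
  },
  () => { /* stand-in for the Builder subagent */ },
);
```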

Tool permissions

Read Search Execute Write

Test needs Execute to run test suites and Write to fix issues during self-healing. Without Write access, the self-healing loop would be impossible; the agent could detect failures but not fix them.

Prompt template

# Test Phase -- QA Engineer

## Role
You are a QA Engineer. Your job is to validate the
implementation by running all tests, analyzing failures
systematically, and fixing issues one at a time.

You are methodical. You never fix multiple issues at once.
You always run the full suite first, analyze everything,
then fix sequentially.

## Context
Build Report:
${build_report}

Test Command: ${test_command}
Coverage Command: ${coverage_command}   # optional
Working Directory: ${working_directory}
Max Retry Iterations: ${max_retries}    # default: 3

## Methodology -- Systematic Validation

### Step 1: Run Everything
Execute: ${test_command}
Capture the full output. Do NOT stop at the first failure.

### Step 2: Analyze ALL Failures
For each failing test, determine:
- Is this a TEST BUG? (wrong assertion, outdated expectation)
- Is this an IMPLEMENTATION BUG? (code doesn't match spec)
- Is this an INTEGRATION ISSUE? (change broke adjacent module)

### Step 3: Fix One at a Time
Starting with the most fundamental issue:
1. Fix the single issue
2. Re-run the full test suite
3. Verify the fix didn't introduce new failures
4. Move to the next issue

### Step 4: Coverage Check (if coverage command provided)
Run: ${coverage_command}
Report coverage metrics in the output.

## Self-Healing Loop
If tests fail after your fixes:
- Iteration 1: Analyze and fix
- Iteration 2: Re-analyze remaining failures
- Iteration 3: Final attempt
- After ${max_retries} iterations: ESCALATE with full context

## Failure Categorization Rules
- TEST BUG: The test asserts the wrong thing. Fix the test.
- IMPLEMENTATION BUG: The code is wrong. Fix the implementation.
  Refer back to ${plan_artifact} for intended behavior.
- INTEGRATION ISSUE: The change broke something else. This may
  need to go back to Build for a broader fix.

## Constraints
- Run the FULL test suite first. Do not run partial tests.
- Analyze ALL failures before fixing any.
- Fix ONE issue at a time. Re-run after each fix.
- Never fix multiple issues in a single edit.
- Track every fix with file:line and category.
- Distinguish test bugs from implementation bugs.

## Output Format
Produce a Test Report in BOTH formats:

### JSON (for machine consumption):
{
  "status": "pass|fail",
  "total_tests": N,
  "passed": N, "failed": N, "skipped": N,
  "iterations": N,
  "fixes_applied": [{"file": "", "line": N, "category": "", "description": ""}],
  "coverage": {"statements": N, "branches": N, "functions": N, "lines": N},
  "escalation_needed": false,
  "escalation_reason": ""
}

### Markdown (for human review):
Summary table, fixes list, coverage table.

Best practices

Do

  • Run the full suite first, always
  • Categorize every failure before fixing
  • Fix one issue per iteration
  • Track all fixes with file:line and category
  • Generate both JSON and Markdown output

Don't

  • Stop at the first failure
  • Fix multiple issues at once
  • Skip the full re-run after each fix
  • Change test assertions just to make them pass
  • Run partial test suites

Self-healing loop

Run Tests → All pass? YES: Report / NO: Analyze → Fix One → re-run → iter < max? YES: loop / NO: ESCALATE

Retry limit: default 3, configurable via ${max_retries}.

Escalation: after max retries, produce a full report with escalation_needed: true, include all attempted fixes, remaining failures, and suggested root causes for human review.
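An escalated report keeps the same schema as a passing one. All counts, paths, and reasons below are illustrative:

```json
{
  "status": "fail",
  "total_tests": 112,
  "passed": 108,
  "failed": 4,
  "skipped": 0,
  "iterations": 3,
  "fixes_applied": [
    {
      "file": "auth/login.ts",
      "line": 42,
      "category": "implementation_bug",
      "description": "Restored null check dropped during refactor"
    }
  ],
  "coverage": {"statements": 90.1, "branches": 83.2, "functions": 92.0, "lines": 89.7},
  "escalation_needed": true,
  "escalation_reason": "4 integration failures in billing/ persist after 3 iterations; suspected root cause spans modules and likely needs a broader fix in Build"
}
```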

Nuances

"Analyze all failures first" prevents cascading fix attempts

Often, 5 test failures share one root cause. Fixing that root cause resolves all 5. Fixing them individually wastes iterations and can introduce new issues.

Test bugs vs. implementation bugs

The agent must distinguish between "the test is wrong" and "the code is wrong." Without this distinction, agents default to weakening assertions until tests pass, which defeats the point.

Integration issues are special

If a change in auth/login.ts breaks a test in billing/invoice.ts, that is an integration issue. It may require going back to Build for a broader fix rather than patching in Test.

Coverage is advisory, not blocking

Coverage metrics are reported but do not fail the phase by default. Teams can configure coverage thresholds as quality gates to make them blocking.
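As one example of such a gate, if the project's suite runs on Jest, its documented coverageThreshold option fails the run when coverage drops below configured floors, turning advisory coverage into a blocking check. The numbers below are illustrative, not recommendations:

```typescript
// jest.config.ts -- coverageThreshold makes `jest --coverage`
// exit non-zero when any global metric falls below these values.
export default {
  collectCoverage: true,
  coverageThreshold: {
    global: { statements: 90, branches: 80, functions: 90, lines: 90 },
  },
};
```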