Review
Read-only analysis. Compare implementation against plan.
Why this phase exists
Even with TDD and self-healing tests, agents can produce code that technically works but is architecturally wrong or insecure. Passing tests prove the code does something. Review proves it does the right thing.
Review is the second read-only gate (after Plan). It cannot modify code; it can only judge and report. This separation prevents the reviewer from "just fixing it," which would bypass the disciplined Build-Test cycle.
It comes after Test because passing tests are a prerequisite. There is no point reviewing code that does not work.
Methodology
Receive artifacts
Code changes (diff), test results, original plan, and build report. All the evidence you need to make a judgment.
Compare implementation against plan
Does the code do what the plan specified? Are all planned changes present? Are there unplanned changes that indicate scope creep?
Analyze code quality
Check correctness, security, style, performance, and test coverage adequacy. Use the plan as the objective baseline, not personal preference.
Classify issues by severity
Blocker: Must fix now. Security issues, logic errors, missing functionality. Tech Debt: Real issues, tracked but non-blocking. Skippable: Style preferences, logged but not actioned.
Generate review verdict
JSON output with success boolean, issues array, severity classification, and stats.
Trigger patch loop if blockers found
For each blocker, provide a patch context with enough detail for the Build agent to fix the issue without re-reading the entire codebase.
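As a sketch, an orchestrator might assemble that patch context by filtering the verdict's issues down to blockers (TypeScript; buildPatchContext and the inline issue shape are illustrative, not part of any required API):

// Illustrative helper: collect blockers and join them into one patch-context
// string for the Build agent. The issue shape mirrors the verdict schema below.
function buildPatchContext(
  issues: { severity: string; file: string; line: number; description: string; suggestion: string }[]
): string {
  return issues
    .filter((issue) => issue.severity === "blocker")
    .map((issue) => `${issue.file}:${issue.line} - ${issue.description}. Fix: ${issue.suggestion}`)
    .join("\n\n");
}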
Inputs and outputs
Inputs
- Code diff
- Test report ($test_report)
- Plan artifact ($plan_artifact)
- Build report ($build_report)
Outputs
- Review Verdict (JSON)
- Issue list with severity
- Patch context for blockers
Consumed by: Document phase as $review_verdict
{
  "success": true,
  "summary": "Redis caching layer correctly implemented with TTL and invalidation",
  "issues": [
    {
      "severity": "tech_debt",
      "file": "src/cache/redis-client.ts",
      "line": 23,
      "description": "Connection pool size hardcoded to 10",
      "suggestion": "Extract to environment variable REDIS_POOL_SIZE"
    }
  ],
  "stats": { "blockers": 0, "tech_debt": 1, "skippable": 0 },
  "patch_context": ""
}
Example: reviewing an infrastructure change (adding a Redis caching layer). No blockers; one tech-debt item logged for future work.
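The same shape can be described as a TypeScript interface; the following is a sketch based on the example above (field names mirror the JSON, but this is illustrative, not a mandated schema):

// Sketch of the Review Verdict shape; mirrors the JSON example above.
type Severity = "blocker" | "tech_debt" | "skippable";

interface ReviewIssue {
  severity: Severity;
  file: string;          // repo-relative path
  line: number;
  description: string;   // what is wrong
  suggestion: string;    // how to fix it
}

interface ReviewVerdict {
  success: boolean;      // true only when stats.blockers === 0
  summary: string;       // one-sentence overall assessment
  issues: ReviewIssue[];
  stats: { blockers: number; tech_debt: number; skippable: number };
  patch_context: string; // empty when there are no blockers
}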
Agent pattern
Normal diffs: a single agent reviews the entire change.
Large diffs (50+ files): spawn specialized subagents (Backend Reviewer, Frontend Reviewer, Test Coverage Reviewer). Main Reviewer synthesizes findings into the final verdict.
Tool permissions
Review is deliberately read-only. If the reviewer could fix issues directly, it would bypass the Build-Test cycle. Limiting it to Read and Search forces all fixes through the disciplined patch loop. The reviewer's power is in its judgment, not its hands.
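If your orchestrator declares tool allowlists per phase, the Review agent's entry might look something like this (a sketch; the keys and tool names depend entirely on your framework):

// Hypothetical per-phase tool allowlist; key and tool names are assumptions.
const reviewPhase = {
  phase: "review",
  tools: ["read_file", "search_code"], // no write, no shell, no test runner
  maxIterations: 3,                    // mirrors ${max_review_iterations}
};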
Prompt template
# Review Phase -- Senior Reviewer
## Role
You are a Senior Reviewer. Your job is to analyze the
implementation and compare it against the original plan.
You do NOT modify code. You do NOT run tests. You analyze
and produce a verdict. If blockers are found, you provide
patch context for the Build agent to act on.
## Context
Plan Artifact:
${plan_artifact}
Build Report:
${build_report}
Test Report:
${test_report}
Code Diff:
${code_diff}
Review Iteration: ${review_iteration} # 1, 2, or 3
Max Review Iterations: ${max_review_iterations} # default: 3
## Review Checklist
### 1. Plan Compliance
- Does the implementation match the plan's specification?
- Are all planned files modified/created?
- Are there unplanned changes? (flag as potential scope creep)
### 2. Correctness
- Does the logic correctly implement the intended behavior?
- Are edge cases handled?
- Are error paths handled gracefully?
### 3. Security
- Input validation on all external data?
- No hardcoded secrets or credentials?
- No SQL injection, XSS, or path traversal vectors?
- Authentication/authorization checks in place?
### 4. Code Quality
- Follows existing codebase patterns and conventions?
- No unnecessary complexity or over-engineering?
- Functions are focused and testable?
- No dead code or commented-out blocks?
### 5. Test Adequacy
- Do tests cover the happy path?
- Do tests cover error/edge cases?
- Are test assertions specific (not just "no error thrown")?
## Issue Classification
Classify every issue into exactly one category:
- BLOCKER: Must fix now. Security issues, logic errors,
missing functionality, failing edge cases.
- TECH_DEBT: Real issue, but does not block this change.
Log it for a future task.
- SKIPPABLE: Style preference, minor naming. Note it but
do not action it.
## Patch Context (for blockers only)
For each blocker, provide a "patch context" string:
- What file and line to change
- What the current code does wrong
- What the fix should accomplish
- Any relevant code snippets
## Constraints
- Read-only. Do not modify any files.
- Do not run tests. Trust the Test Report.
- Classify EVERY issue. No unclassified findings.
- Provide patch context for every blocker.
- Be specific: file:line references for every issue.
## Output Format
Produce a Review Verdict in JSON:
{
"success": true/false,
"summary": "One-sentence overall assessment",
"issues": [
{ "severity": "blocker|tech_debt|skippable",
"file": "path/to/file.ts", "line": N,
"description": "What is wrong",
"suggestion": "How to fix it" }
],
"stats": {"blockers": N, "tech_debt": N, "skippable": N},
"patch_context": "Combined patch instructions for all blockers"
}
success = true only when blockers == 0.
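Outside the prompt itself, the orchestrator can enforce that invariant when it parses the verdict; a minimal sketch:

// Minimal sketch: reject a parsed verdict whose success flag contradicts its stats.
function assertConsistentVerdict(verdict: { success: boolean; stats: { blockers: number } }): void {
  const expected = verdict.stats.blockers === 0;
  if (verdict.success !== expected) {
    throw new Error(
      `Inconsistent verdict: success=${verdict.success} with ${verdict.stats.blockers} blocker(s)`
    );
  }
}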
Best practices
Do
- Compare against the plan, not personal preference
- Classify every issue with a severity level
- Provide specific file:line references
- Write actionable patch context for every blocker
- Check security concerns on every review
Don't
- Try to fix code yourself
- Flag style preferences as blockers
- Ignore security concerns
- Rubber-stamp; actually read the diff
- Leave issues unclassified
Review-patch loop
Retry limit: default 3, configurable via ${max_review_iterations}.
Escalation: after max review iterations, the full context (plan, build report, test report, review history) is surfaced to a human reviewer.
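A sketch of that loop from the orchestrator's point of view (TypeScript; runReview, runPatchLoop, and escalateToHuman are placeholders for your own phase runners, not a real API):

// Illustrative review-patch loop; the injected functions are placeholders.
async function reviewWithPatchLoop(
  runReview: (iteration: number) => Promise<{ stats: { blockers: number }; patch_context: string }>,
  runPatchLoop: (patchContext: string) => Promise<void>,
  escalateToHuman: () => Promise<void>,
  maxReviewIterations = 3
): Promise<void> {
  for (let iteration = 1; iteration <= maxReviewIterations; iteration++) {
    const verdict = await runReview(iteration);
    if (verdict.stats.blockers === 0) return;   // no blockers: proceed to Document
    await runPatchLoop(verdict.patch_context);  // Build + Test address the blockers
  }
  await escalateToHuman();                      // retries exhausted: human takes over
}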
Nuances
The "patch context" concept
When the reviewer finds a blocker, it does not just say "this is wrong." It provides a mini-spec for the fix: what file, what line, what is wrong, and what the fix should do. This gives the Build agent enough context to fix the issue without re-reading the entire codebase.
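For example, a patch context for a single hypothetical blocker might read:

BLOCKER 1 of 1
File: src/cache/redis-client.ts, line 41 (hypothetical)
Problem: cache reads swallow connection errors and return undefined, so
callers cannot tell a cache miss from an unreachable Redis instance.
Fix: rethrow (or wrap) connection errors so the caller can fall back to the
database explicitly; return undefined only for genuine cache misses.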
Parallel reviewers for large diffs
When a PR touches 50+ files across multiple domains, a single reviewer agent will miss domain-specific issues. Spawning specialized subagents (backend, frontend, test coverage) gives you thorough coverage without overwhelming any single agent's context window.
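A sketch of the fan-out/fan-in pattern (TypeScript; spawnSubagent stands in for whatever subagent API your framework provides):

// Illustrative fan-out/fan-in for large diffs; spawnSubagent is a placeholder
// for your framework's subagent API, not a real library call.
interface SubVerdict {
  issues: { severity: "blocker" | "tech_debt" | "skippable"; file: string; line: number; description: string; suggestion: string }[];
}

async function reviewLargeDiff(
  spawnSubagent: (role: string, files: string[]) => Promise<SubVerdict>,
  filesByDomain: Record<string, string[]>  // e.g. { backend: [...], frontend: [...], tests: [...] }
): Promise<SubVerdict> {
  // One specialized reviewer per domain, run in parallel.
  const verdicts = await Promise.all(
    Object.entries(filesByDomain).map(([role, files]) => spawnSubagent(role, files))
  );
  // The main reviewer synthesizes; here that is simply merging the issue lists.
  return { issues: verdicts.flatMap((v) => v.issues) };
}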
Tech-debt tracking
Tech-debt issues should be logged in a format that can be automatically converted to tickets. The Review phase generates the data; the Monitor or Deploy phase can create the tickets in your issue tracker.
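A sketch of that conversion (TypeScript; the TicketPayload fields are assumptions and would map onto whatever your issue tracker's API expects):

// Illustrative mapping from tech-debt findings to ticket payloads; the
// TicketPayload fields are assumptions, not a specific tracker's API.
interface TicketPayload {
  title: string;
  body: string;
  labels: string[];
}

function techDebtToTickets(
  issues: { severity: string; file: string; line: number; description: string; suggestion: string }[]
): TicketPayload[] {
  return issues
    .filter((issue) => issue.severity === "tech_debt")
    .map((issue) => ({
      title: `Tech debt: ${issue.description}`,
      body: `${issue.file}:${issue.line}\n\nSuggested fix: ${issue.suggestion}`,
      labels: ["tech-debt", "from-review-phase"],
    }));
}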
Visual review for UI changes
For UI changes, some teams use browser automation (Playwright, etc.) to take screenshots and compare against expectations. This is a team-level enhancement, not built into the core Review phase.
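As a sketch, such a check might use Playwright to capture a screenshot for human or diff-based comparison; the URL and output path below are placeholders:

// Illustrative visual-review step using Playwright; the URL and output path
// are placeholders, and the comparison step itself is left to the team.
import { chromium } from "playwright";

async function captureScreenshot(url: string, outPath: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  await page.screenshot({ path: outPath, fullPage: true });
  await browser.close();
}

// Usage (placeholder values):
// await captureScreenshot("http://localhost:3000/dashboard", "review/dashboard.png");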