Review
Read-only analysis. Compare implementation against plan.
Why this phase exists
Even with TDD and self-healing tests, agents can produce code that technically works but is architecturally wrong or insecure. Passing tests prove the code does something. Review proves it does the right thing.
Review is the second read-only gate (after Plan). It cannot modify code; it can only judge and report. This separation prevents the reviewer from "just fixing it," which would bypass the disciplined Build-Test cycle.
It comes after Test because passing tests are a prerequisite. There is no point reviewing code that does not work.
Methodology
Receive artifacts
Code changes (diff), test results, original plan, and build report. All the evidence you need to make a judgment.
Compare implementation against plan
Does the code do what the plan specified? Are all planned changes present? Are there unplanned changes that indicate scope creep?
Analyze code quality
Check correctness, security, style, performance, and test coverage adequacy. Use the plan as the objective baseline, not personal preference.
Classify issues by severity
Blocker: Must fix now. Security issues, logic errors, missing functionality. Tech Debt: Real issues, tracked but non-blocking. Skippable: Style preferences, logged but not actioned.
Generate review verdict
JSON output with success boolean, issues array, severity classification, and stats.
Trigger patch loop if blockers found
For each blocker, provide a patch context with enough detail for the Build agent to fix the issue without re-reading the entire codebase.
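As a sketch, an orchestrator might assemble that patch context by filtering the verdict's issues down to blockers (TypeScript; buildPatchContext and the inline issue shape are illustrative, not part of any required API):

// Illustrative helper: collect blockers and join them into one patch-context
// string for the Build agent. The issue shape mirrors the verdict schema below.
function buildPatchContext(
  issues: { severity: string; file: string; line: number; description: string; suggestion: string }[]
): string {
  return issues
    .filter((issue) => issue.severity === "blocker")
    .map((issue) => `${issue.file}:${issue.line} - ${issue.description}. Fix: ${issue.suggestion}`)
    .join("\n\n");
}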
Inputs and outputs
Inputs
- Code diff
- Test report ($test_report)
- Plan artifact ($plan_artifact)
- Build report ($build_report)
Outputs
- Review Verdict (JSON)
- Issue list with severity
- Patch context for blockers
Consumed by: Document phase as $review_verdict
{
  "success": true,
  "summary": "Redis caching layer correctly implemented with TTL and invalidation",
  "issues": [
    {
      "severity": "tech_debt",
      "file": "src/cache/redis-client.ts",
      "line": 23,
      "description": "Connection pool size hardcoded to 10",
      "suggestion": "Extract to environment variable REDIS_POOL_SIZE"
    }
  ],
  "stats": { "blockers": 0, "tech_debt": 1, "skippable": 0 },
  "patch_context": ""
}
Example: reviewing an infrastructure change (adding a Redis caching layer). No blockers; one tech-debt item logged for future work.
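The same shape can be described as a TypeScript interface; the following is a sketch based on the example above (field names mirror the JSON, but this is illustrative, not a mandated schema):

// Sketch of the Review Verdict shape; mirrors the JSON example above.
type Severity = "blocker" | "tech_debt" | "skippable";

interface ReviewIssue {
  severity: Severity;
  file: string;          // repo-relative path
  line: number;
  description: string;   // what is wrong
  suggestion: string;    // how to fix it
}

interface ReviewVerdict {
  success: boolean;      // true only when stats.blockers === 0
  summary: string;       // one-sentence overall assessment
  issues: ReviewIssue[];
  stats: { blockers: number; tech_debt: number; skippable: number };
  patch_context: string; // empty when there are no blockers
}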
Agent pattern
Normal diffs: a single agent reviews the entire change.
Large diffs (50+ files): spawn specialized subagents (Backend Reviewer, Frontend Reviewer, Test Coverage Reviewer). Main Reviewer synthesizes findings into the final verdict.
Tool permissions
Review is deliberately read-only. If the reviewer could fix issues directly, it would bypass the Build-Test cycle. Limiting it to Read and Search forces all fixes through the disciplined patch loop. The reviewer's power is in its judgment, not its hands.
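If your orchestrator declares tool allowlists per phase, the Review agent's entry might look something like this (a sketch; the keys and tool names depend entirely on your framework):

// Hypothetical per-phase tool allowlist; key and tool names are assumptions.
const reviewPhase = {
  phase: "review",
  tools: ["read_file", "search_code"], // no write, no shell, no test runner
  maxIterations: 3,                    // mirrors ${max_review_iterations}
};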
Prompt template
# Review Phase -- Senior Reviewer
## Role
You are a Senior Reviewer. Your job is to analyze the
implementation and compare it against the original plan.
You do NOT modify code. You do NOT run tests. You analyze
and produce a verdict. If blockers are found, you provide
patch context for the Build agent to act on.
## Context
Plan Artifact:
${plan_artifact}
Build Report:
${build_report}
Test Report:
${test_report}
Code Diff:
${code_diff}
Review Iteration: ${review_iteration} # 1, 2, or 3
Max Review Iterations: ${max_review_iterations} # default: 3
## Review Checklist
### 1. Plan Compliance
- Does the implementation match the plan's specification?
- Are all planned files modified/created?
- Are there unplanned changes? (flag as potential scope creep)
### 2. Correctness
- Does the logic correctly implement the intended behavior?
- Are edge cases handled?
- Are error paths handled gracefully?
### 3. Security
- Input validation on all external data?
- No hardcoded secrets or credentials?
- No SQL injection, XSS, or path traversal vectors?
- Authentication/authorization checks in place?
### 4. Code Quality
- Follows existing codebase patterns and conventions?
- No unnecessary complexity or over-engineering?
- Functions are focused and testable?
- No dead code or commented-out blocks?
### 5. Test Adequacy
- Do tests cover the happy path?
- Do tests cover error/edge cases?
- Are test assertions specific (not just "no error thrown")?
## Issue Classification
Classify every issue into exactly one category:
- BLOCKER: Must fix now. Security issues, logic errors,
missing functionality, failing edge cases.
- TECH_DEBT: Real issue, but does not block this change.
Log it for a future task.
- SKIPPABLE: Style preference, minor naming. Note it but
do not action it.
## Patch Context (for blockers only)
For each blocker, provide a "patch context" string:
- What file and line to change
- What the current code does wrong
- What the fix should accomplish
- Any relevant code snippets
## Constraints
- Read-only. Do not modify any files.
- Do not run tests. Trust the Test Report.
- Classify EVERY issue. No unclassified findings.
- Provide patch context for every blocker.
- Be specific: file:line references for every issue.
## Output Format
Produce a Review Verdict in JSON:
{
"success": true/false,
"summary": "One-sentence overall assessment",
"issues": [
{ "severity": "blocker|tech_debt|skippable",
"file": "path/to/file.ts", "line": N,
"description": "What is wrong",
"suggestion": "How to fix it" }
],
"stats": {"blockers": N, "tech_debt": N, "skippable": N},
"patch_context": "Combined patch instructions for all blockers"
}
success = true only when blockers == 0.
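Outside the prompt itself, the orchestrator can enforce that invariant when it parses the verdict; a minimal sketch:

// Minimal sketch: reject a parsed verdict whose success flag contradicts its stats.
function assertConsistentVerdict(verdict: { success: boolean; stats: { blockers: number } }): void {
  const expected = verdict.stats.blockers === 0;
  if (verdict.success !== expected) {
    throw new Error(
      `Inconsistent verdict: success=${verdict.success} with ${verdict.stats.blockers} blocker(s)`
    );
  }
}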
Best practices
Do
- Compare against the plan, not personal preference
- Classify every issue with a severity level
- Provide specific file:line references
- Write actionable patch context for every blocker
- Check security concerns on every review
Don't
- Try to fix code yourself
- Flag style preferences as blockers
- Ignore security concerns
- Rubber-stamp; actually read the diff
- Leave issues unclassified
Review-patch loop
Retry limit: default 3, configurable via ${max_review_iterations}.
Escalation: after max review iterations, the full context (plan, build report, test report, review history) is surfaced to a human reviewer.
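A sketch of that loop from the orchestrator's point of view (TypeScript; runReview, runPatchLoop, and escalateToHuman are placeholders for your own phase runners, not a real API):

// Illustrative review-patch loop; the injected functions are placeholders.
async function reviewWithPatchLoop(
  runReview: (iteration: number) => Promise<{ stats: { blockers: number }; patch_context: string }>,
  runPatchLoop: (patchContext: string) => Promise<void>,
  escalateToHuman: () => Promise<void>,
  maxReviewIterations = 3
): Promise<void> {
  for (let iteration = 1; iteration <= maxReviewIterations; iteration++) {
    const verdict = await runReview(iteration);
    if (verdict.stats.blockers === 0) return;   // no blockers: proceed to Document
    await runPatchLoop(verdict.patch_context);  // Build + Test address the blockers
  }
  await escalateToHuman();                      // retries exhausted: human takes over
}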
Nuances
The "patch context" concept
When the reviewer finds a blocker, it does not just say "this is wrong." It provides a mini-spec for the fix: what file, what line, what is wrong, and what the fix should do. This gives the Build agent enough context to fix the issue without re-reading the entire codebase.
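For example, a patch context for a single hypothetical blocker might read:

BLOCKER 1 of 1
File: src/cache/redis-client.ts, line 41 (hypothetical)
Problem: cache reads swallow connection errors and return undefined, so
callers cannot tell a cache miss from an unreachable Redis instance.
Fix: rethrow (or wrap) connection errors so the caller can fall back to the
database explicitly; return undefined only for genuine cache misses.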
Parallel reviewers for large diffs
When a PR touches 50+ files across multiple domains, a single reviewer agent will miss domain-specific issues. Spawning specialized subagents (backend, frontend, test coverage) gives you thorough coverage without overwhelming any single agent's context window.
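A sketch of the fan-out/fan-in pattern (TypeScript; spawnSubagent stands in for whatever subagent API your framework provides):

// Illustrative fan-out/fan-in for large diffs; spawnSubagent is a placeholder
// for your framework's subagent API, not a real library call.
interface SubVerdict {
  issues: { severity: "blocker" | "tech_debt" | "skippable"; file: string; line: number; description: string; suggestion: string }[];
}

async function reviewLargeDiff(
  spawnSubagent: (role: string, files: string[]) => Promise<SubVerdict>,
  filesByDomain: Record<string, string[]>  // e.g. { backend: [...], frontend: [...], tests: [...] }
): Promise<SubVerdict> {
  // One specialized reviewer per domain, run in parallel.
  const verdicts = await Promise.all(
    Object.entries(filesByDomain).map(([role, files]) => spawnSubagent(role, files))
  );
  // The main reviewer synthesizes; here that is simply merging the issue lists.
  return { issues: verdicts.flatMap((v) => v.issues) };
}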
Tech-debt tracking
Tech-debt issues should be logged in a format that can be automatically converted to tickets. The Review phase generates the data; the Monitor or Deploy phase can create the tickets in your issue tracker.
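A sketch of that conversion (TypeScript; the TicketPayload fields are assumptions and would map onto whatever your issue tracker's API expects):

// Illustrative mapping from tech-debt findings to ticket payloads; the
// TicketPayload fields are assumptions, not a specific tracker's API.
interface TicketPayload {
  title: string;
  body: string;
  labels: string[];
}

function techDebtToTickets(
  issues: { severity: string; file: string; line: number; description: string; suggestion: string }[]
): TicketPayload[] {
  return issues
    .filter((issue) => issue.severity === "tech_debt")
    .map((issue) => ({
      title: `Tech debt: ${issue.description}`,
      body: `${issue.file}:${issue.line}\n\nSuggested fix: ${issue.suggestion}`,
      labels: ["tech-debt", "from-review-phase"],
    }));
}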
Visual review for UI changes
For UI changes, some teams use browser automation (Playwright, etc.) to take screenshots and compare against expectations. This is a team-level enhancement, not built into the core Review phase.
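As a sketch, such a check might use Playwright to capture a screenshot for human or diff-based comparison; the URL and output path below are placeholders:

// Illustrative visual-review step using Playwright; the URL and output path
// are placeholders, and the comparison step itself is left to the team.
import { chromium } from "playwright";

async function captureScreenshot(url: string, outPath: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  await page.screenshot({ path: outPath, fullPage: true });
  await browser.close();
}

// Usage (placeholder values):
// await captureScreenshot("http://localhost:3000/dashboard", "review/dashboard.png");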