The Agentic Layer
Quality Gates
Pass/fail criteria that must be satisfied before a phase completes. Better gates mean higher autonomy. Gates are the reason you can trust the workflow.
What are quality gates
A quality gate is a set of pass/fail criteria evaluated after a workflow phase completes. If the gate passes, the workflow moves to the next phase. If it fails, the gate triggers an action: retry the phase, patch and re-run, or escalate to a human. Gates are why autonomous workflows can be trusted.
Quality gates are the single biggest factor in your autonomy level. Weak gates keep you at Levels 1-2 (a human reviews everything). Strong, thorough gates let you reach Levels 3-4 (autonomous / ASE). Invest in your gates.
The three concerns inside every gate
A quality gate looks like one thing, but it handles three distinct concerns. Understanding the layers matters because they change at different rates and for different reasons.
Classification gates
Classification gates operate on the structured output of a single phase. They answer: within this phase's results, what matters?
A review phase produces a list of issues. The classification gate says: blockers matter, tech-debt gets logged but does not block, skippable issues are noted without action. A test phase produces pass/fail results. The classification gate says: syntax errors are blockers, lint warnings are informational.
Classification gates are the innermost layer. They parse structured output and categorize findings by severity. They rarely change between autonomy levels; a blocker is a blocker whether a human is watching or not.
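As a minimal sketch (the issue shape and severity names here are illustrative assumptions, not a fixed schema), a classification gate reduces to bucketing a phase's structured output by severity:

```python
# Hypothetical sketch: bucket a review phase's findings by severity.
# The issue dicts and severity names are illustrative, not a fixed API.
from collections import defaultdict

def classify(issues):
    """Group structured findings; only blockers feed the healing loop."""
    buckets = defaultdict(list)
    for issue in issues:
        buckets[issue["severity"]].append(issue)
    return buckets

issues = [
    {"id": "R1", "severity": "blocker",   "msg": "null deref in handler"},
    {"id": "R2", "severity": "tech-debt", "msg": "duplicated helper"},
    {"id": "R3", "severity": "skippable", "msg": "naming nit"},
]
buckets = classify(issues)
# Blockers block; tech-debt is logged; skippable issues are noted, no action.
```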
Healing gates
Healing gates wrap around phase execution. They answer: when something fails, should I try to fix it, and have I made progress?
A test healing gate says: resolve the failures, re-run, check if the failure count decreased. If no progress was made, stop retrying, because the same fix won't work twice. A review healing gate says: create patches for blocker-severity issues, re-implement, re-review.
Healing gates have configurable retry limits and termination conditions. They change slowly across autonomy levels. You might increase retries at higher levels where the cost of escalation is greater.
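The retry-with-progress-check logic above can be sketched as follows; `run_phase`, returning a failure count, is a hypothetical stand-in for "fix and re-run the phase":

```python
# Hypothetical healing loop: retry only while progress is being made.
def heal(run_phase, max_retries=3, stop_on_no_progress=True):
    """Re-run a phase until it passes, retries run out, or progress stalls."""
    failures = run_phase()
    for _ in range(max_retries):
        if failures == 0:
            return True                 # gate passes
        new_failures = run_phase()      # e.g. patch, then re-run tests
        if stop_on_no_progress and new_failures >= failures:
            break                       # no progress: the same fix won't work twice
        failures = new_failures
    return failures == 0

# Simulated phase whose failure count shrinks 3 -> 1 -> 0 across runs.
counts = iter([3, 1, 0])
print(heal(lambda: next(counts)))  # True
```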
Disposition gates
Disposition gates are the outermost layer. They answer: after classification and healing are exhausted, does this phase's failure stop the workflow?
This is where autonomy levels live. At Level 3 (Autonomous), a test failure might be non-fatal (the workflow continues to review, and a human sees the failures in the PR). At Level 4 (ASE), the same test failure is fatal. The workflow aborts because there's no human to catch it downstream.
Disposition gates change dramatically between autonomy levels. They are the primary mechanism that differentiates one autonomy configuration from another.
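One way to picture this, as an illustrative assumption rather than a fixed scheme: the same gate failure maps to a different disposition at each autonomy level.

```python
# Illustrative only: a test-gate failure's disposition varies by autonomy level.
DISPOSITION = {
    3: {"test": "non-fatal"},  # Level 3: continue; a human sees failures in the PR
    4: {"test": "fatal"},      # Level 4 (ASE): abort; no human downstream
}

def disposes_fatally(level, phase):
    return DISPOSITION[level][phase] == "fatal"

print(disposes_fatally(3, "test"))  # False
print(disposes_fatally(4, "test"))  # True
```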
How the layers compose
Classification feeds healing: only blocker-severity issues trigger the healing loop. Healing feeds disposition: if healing exhausts its retries and the phase is marked required, the workflow aborts. If the phase is marked optional, the workflow logs a warning and continues.
The same classification and healing configuration can produce different workflow behavior by changing only the disposition layer. This is why autonomy levels are gate configurations, not workflow configurations. You're changing what authority gate decisions have, not what they evaluate.
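Putting the layers together, a sketch under the assumption that each layer is a plain function (the names `run_gated_phase`, `heal_fn`, and the `required` flag standing in for disposition are all hypothetical):

```python
# Hypothetical composition of classification, healing, and disposition.
def run_gated_phase(output, heal_fn, required, max_retries=3):
    # Classification: only blocker-severity issues trigger healing.
    blockers = [i for i in output["issues"] if i["severity"] == "blocker"]
    if not blockers:
        return "continue"
    # Healing: patch and re-check until fixed or retries exhaust.
    for _ in range(max_retries):
        blockers = heal_fn(blockers)
        if not blockers:
            return "continue"
    # Disposition: same evaluation, different authority.
    return "abort" if required else "warn-and-continue"

output = {"issues": [{"severity": "blocker"}]}
stubborn = lambda blockers: blockers            # healing never fixes anything
print(run_gated_phase(output, stubborn, required=True))   # abort
print(run_gated_phase(output, stubborn, required=False))  # warn-and-continue
```

Note that only the last line differs between the two calls: changing `required` changes workflow behavior without touching classification or healing.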
Anatomy of a quality gate
Trigger
Which phase this gate runs after, e.g., after-test or after-review.
Criteria
What conditions must be met: a list of key-value assertions against phase output.
Action on fail
What happens on failure: retry (loop), patch (fix and re-run), or escalate (human).
Severity
Whether failure blocks: blocker (must pass to proceed) or warning (log and continue).
gate: test-coverage
trigger: after-test
severity: blocker            # DISPOSITION: does failure stop the workflow?

# CLASSIFICATION: what counts as a failure?
criteria:
  all_tests_pass: true
  coverage_minimum: 80
  no_skipped_tests: true
  no_snapshot_regressions: true

# HEALING: what to do on failure?
on_fail: retry               # retry | patch | escalate | log
max_retries: 3
retry_strategy: rebuild
stop_on_no_progress: true    # terminate early if failure count unchanged

# DISPOSITION: what happens when healing exhausts?
escalation:
  target: human
  context:
    - test_results
    - build_report
    - fix_attempts
Gate examples by phase
# gates/test-coverage.yaml
gate: test-coverage
version: 1
trigger: after-test
severity: blocker
criteria:
  all_tests_pass: true
  coverage_minimum: 80            # percentage
  no_skipped_tests: true
  max_test_duration_ms: 300000    # 5 minutes
  no_snapshot_regressions: true
on_fail: retry
max_retries: 3
retry_strategy: rebuild           # triggers build agent to fix failures
escalation:
  after_retries_exhausted: human
  message: "Tests failed after {retries} attempts. Manual review needed."
  context:
    - test_results
    - build_report
    - failure_history
Writing custom gates
The criteria format
Criteria are key-value assertions. Keys reference fields in the phase's output artifact. Values are the expected state.
| Type | Example | Meaning |
|---|---|---|
| Boolean | all_tests_pass: true | Field must be exactly true |
| Numeric min | coverage_minimum: 80 | Field must be >= 80 |
| Numeric max | tech_debt_max: 5 | Field must be <= 5 |
| Exact | security_issues: 0 | Field must be exactly 0 |
| String match | verdict: "PASS" | Field must match string |
| Exists | has_changelog_entry: true | Field must exist and be truthy |
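A sketch of how a gate runner might evaluate these assertion types against a phase's output dict. The `_minimum`/`_max` suffix convention mapping onto an underlying field (e.g. `coverage_minimum` checking a `coverage` field) is an assumption inferred from the table, not a documented rule:

```python
# Hypothetical evaluator for the assertion types in the table above.
def check(key, expected, output):
    if key.endswith("_minimum"):                # numeric min: field >= expected
        return output.get(key.removesuffix("_minimum"), 0) >= expected
    if key.endswith("_max"):                    # numeric max: field <= expected
        return output.get(key.removesuffix("_max"), 0) <= expected
    return output.get(key) == expected          # boolean / exact / string match

output = {"all_tests_pass": True, "coverage": 85, "tech_debt": 2}
print(check("all_tests_pass", True, output))    # True
print(check("coverage_minimum", 80, output))    # True
print(check("tech_debt_max", 5, output))        # True
```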
Combining criteria (AND/OR)
# Default: all criteria are AND (all must pass)
criteria:
  all_tests_pass: true
  coverage_minimum: 80

# Explicit OR: use any_of
criteria:
  any_of:
    - coverage_minimum: 80
    - coverage_minimum: 60
      has_integration_tests: true   # 60% OK if integration tests exist

# Nested: AND groups within OR
criteria:
  any_of:
    - all_of:
        - coverage_minimum: 80
        - no_skipped_tests: true
    - all_of:
        - coverage_minimum: 90      # higher coverage forgives skipped tests
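The AND/OR semantics above can be sketched as a small recursive evaluator. This is a hedged illustration: the leaf check defaults to plain equality (a real runner would apply the per-type rules from the criteria table), and the function names are hypothetical:

```python
# Hypothetical recursive evaluator for all_of / any_of criteria trees.
def evaluate(criteria, output, leaf_check=lambda k, v, o: o.get(k) == v):
    if isinstance(criteria, list):              # a bare list is an implicit all_of
        return all(evaluate(c, output, leaf_check) for c in criteria)
    results = []
    for key, value in criteria.items():
        if key == "any_of":
            results.append(any(evaluate(c, output, leaf_check) for c in value))
        elif key == "all_of":
            results.append(all(evaluate(c, output, leaf_check) for c in value))
        else:
            results.append(leaf_check(key, value, output))
    return all(results)                         # top level defaults to AND

output = {"coverage_minimum": 60, "has_integration_tests": True}
tree = {"any_of": [{"coverage_minimum": 80},
                   {"coverage_minimum": 60, "has_integration_tests": True}]}
print(evaluate(tree, output))  # True
```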
Gate criteria reference the structured output produced by the phase. This is why the Output Format section in prompt templates matters: gates parse it.
Gate quality determines autonomy level
Classification and healing gates mature over time. You learn what severity thresholds work, what retry counts are effective, and what healing strategies succeed. This maturity is what earns the right to higher autonomy.
Disposition gates are what you configure when you change autonomy levels. Moving from Level 2 to Level 3 means changing test disposition from "require human approval on failure" to "abort or continue based on gate result." Moving from Level 3 to Level 4 means tightening dispositions: test failure becomes fatal (no human to catch it), documentation failure becomes non-fatal (shouldn't block auto-merge).
You don't advance to higher autonomy by trusting AI more. You advance by observing that your classification and healing gates are accurate, then granting disposition gates more authority over the workflow's control flow.
See Autonomous Software Engineering for the full maturity path.
Gate anti-patterns
The Rubber Stamp
A gate with criteria so loose it never fails. coverage_minimum: 0, tech_debt_max: 999. This gate adds workflow overhead without adding safety.
Fix: set criteria based on your team's actual quality bar. Start conservative, relax only with data.
The Perfectionist
A gate that rarely passes. coverage_minimum: 100, lint_warnings_max: 0. The workflow exhausts retries and escalates every run. Your team stops trusting the system.
Fix: use severity: warning for aspirational criteria. Reserve severity: blocker for true blockers.
The Glass Cannon
A gate with max_retries: 0 or no retry strategy. A single flaky test stops the entire workflow. No self-healing, no recovery.
Fix: always include retry logic. Even 1 retry catches most transient failures. Use max_retries: 3 as a default.