Maturity path

Start where you are. Pick the level that matches your codebase's needs and your team's comfort. Move up when your quality gates prove you're ready. Not every team needs Level 4, and that's fine.

Each level corresponds to a disposition gate profile: a configuration of which phase failures are fatal, optional, or human-reviewed. See Quality Gates for the gate taxonomy that makes this concrete.
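As a rough illustration, a gate profile can be thought of as a mapping from phase name to what happens when that phase fails. The sketch below is not the actual configuration format. The names `Disposition`, `GateProfile`, and `SUPERVISED`, the meanings given in the comments, and the phase-to-disposition assignments are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    FATAL = "fatal"                # assumed meaning: failure stops the workflow
    OPTIONAL = "optional"          # assumed meaning: failure is recorded, workflow continues
    HUMAN_REVIEW = "human_review"  # assumed meaning: failure pauses for a person to decide


@dataclass(frozen=True)
class GateProfile:
    level: str
    dispositions: dict[str, Disposition]

    def on_failure(self, phase: str) -> Disposition:
        # Phases not listed fall back to human review, the most conservative option.
        return self.dispositions.get(phase, Disposition.HUMAN_REVIEW)


# Hypothetical profile; the assignments below are illustrative, not prescribed.
SUPERVISED = GateProfile(
    level="L2",
    dispositions={
        "plan": Disposition.HUMAN_REVIEW,
        "test": Disposition.FATAL,
        "doc": Disposition.OPTIONAL,
    },
)
```

Moving up a level would then mostly mean swapping in a profile with fewer `HUMAN_REVIEW` entries, rather than changing the workflow itself.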

The four levels are L1 Assisted, L2 Supervised, L3 Autonomous, and L4 Autonomous Software Engineering (ASE).

L1: Assisted

Duration: 1-2 weeks

How it works

You run each phase manually, review every output, and make all the decisions. Everyone starts here, on purpose. You're learning what your agentic layer can do and where it needs tuning.

What you're learning

Whether your prompt templates produce good, actionable output
Which constraints are missing from your templates (you'll discover these by what the agent gets wrong)
How your quality gates perform in practice. Are they catching real failures?
The rhythm of Plan → Build → Test and when to use self-healing loops

At Level 1, you are the Intent phase. You read the issue, decide what it really means, fill in the gaps, and paste a clear description into the Plan template. As you mature, the Intent phase can automate this refinement step.

A day at Level 1

You pick an issue from your backlog. You open the Plan template, fill in the variables, and paste it into your agent. The agent researches and produces a plan. You read it. Looks good, references real files. You paste the Build template with the plan artifact. The agent writes tests and code. You run the tests. Two fail. You copy the test output, paste it back to the agent with the Build template. It fixes the implementation. Tests pass. You review the diff, commit, and push. That's one workflow run. Tomorrow, you'll do another.

Graduation checklist

Check these off as you complete them. When all are checked, you're ready for Level 2.

Plan templates consistently produce actionable specs with real file references
Build output passes tests on first or second try (>70% of the time)
Identified and fixed at least 3 template weaknesses
Completed 10+ successful manual workflow runs
Test-pass quality gate catches real failures (not just false positives)

L2: Supervised

Duration: 2-4 weeks

How it works

The agent runs phases on its own, but you approve at checkpoints: after Plan, after Test, and after Review. Self-healing feedback loops run without your intervention. If tests fail, the agent retries on its own. You only step in at designated approval points.

Checkpoint flow

Plan → APPROVE → Build → Test ⟲ → APPROVE → Review ⟲ → APPROVE → Doc → Deploy

The ⟲ symbol indicates a self-healing loop. The agent retries automatically; you only see the final result at the approval checkpoint.
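The checkpoint flow above can be sketched as a small driver loop. This is a simplified model, not the real orchestrator: `Step`, `run_supervised`, the `approve` callback, and the retry budget of 3 are hypothetical, and a self-healing step here simply re-runs itself with the previous failure output as feedback.

```python
from dataclasses import dataclass
from typing import Callable

# A phase takes feedback from the previous attempt and returns (passed, output).
PhaseFn = Callable[[str], tuple[bool, str]]


@dataclass
class Step:
    name: str
    run: PhaseFn
    self_healing: bool = False    # marked with ⟲ in the flow
    needs_approval: bool = False  # followed by APPROVE in the flow


def run_supervised(
    steps: list[Step],
    approve: Callable[[str, str], bool],
    max_retries: int = 3,  # assumed budget; the source does not specify one
) -> str:
    for step in steps:
        attempts = max_retries if step.self_healing else 1
        feedback, passed, output = "", False, ""
        for _ in range(attempts):
            passed, output = step.run(feedback)
            if passed:
                break
            feedback = output  # failure output becomes the next attempt's input
        if not passed:
            return f"escalated at {step.name}"
        # The human sees only the final result of the loop, never the retries.
        if step.needs_approval and not approve(step.name, output):
            return f"rejected at {step.name}"
    return "deployed"
```

Under these assumptions, the L2 flow would be a list of six steps where Plan, Test, and Review set `needs_approval=True` and Test and Review set `self_healing=True`.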

What you're learning

Whether the feedback loops can self-correct without your help
Whether quality gates are calibrated correctly (too many false positives? too lenient?)
Whether your approvals are confirming what the gates already caught, or surfacing new issues. If the gates consistently catch what you would catch, you may be ready for L3.

Graduation checklist

When these are consistently true, you can drop the checkpoints and move to L3.

Self-healing loops resolve 70%+ of failures without your help
Quality gates have less than 20% false positive rate
No surprises in the last 20 workflow runs
You trust the Review phase output
Checkpoint approvals are confirming what the gates already caught
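The measurable items in this checklist can be checked mechanically. The following is a minimal sketch, assuming you log one record per workflow run; `RunRecord`, its fields, and `ready_for_l3` are hypothetical names, and applying the 20-run window to every metric is a choice made for the example. Trust in the Review phase is a judgment call and is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    failed_initially: bool  # at least one gate failed during the run
    self_healed: bool       # the loop fixed the failure without a human
    surprise: bool          # a human found something the gates missed


def ready_for_l3(runs: list[RunRecord], gate_alerts: int, false_alerts: int) -> bool:
    recent = runs[-20:]
    if len(recent) < 20:
        return False  # not enough history to judge yet

    failures = [r for r in recent if r.failed_initially]
    # With no failures at all, there is nothing the loops failed to heal.
    heal_rate = sum(r.self_healed for r in failures) / len(failures) if failures else 1.0
    false_positive_rate = false_alerts / gate_alerts if gate_alerts else 0.0

    return (
        heal_rate >= 0.70
        and false_positive_rate < 0.20
        and not any(r.surprise for r in recent)
    )
```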

L3: Autonomous

Duration: 1-3 months

How it works

The agent runs the full workflow end-to-end. No checkpoints. You review only the final output: the pull request. Quality gates are your safety net. The agent plans, builds, tests, reviews, documents, and ships without waiting for approval at any step in between.

Most teams operate here. That's good engineering.

Level 3 is the right place for most engineering work. You get autonomous velocity while keeping human oversight on the final output. Don't rush to L4. Stay at L3 for anything that touches critical paths, new patterns, or areas where you don't trust your templates yet.

At Level 3, adding the optional Intent phase enables fully autonomous issue processing, from intake to PR creation. Monitor-generated issues pass through Intent for refinement before reaching Plan.

What you're learning

Whether the full workflow consistently produces PR-ready output
Which types of work succeed at L3 and which still need supervision
How to configure the Monitor phase to catch regressions early
Your comfort with risk. What would need to be true for you to trust auto-merge?

Graduation checklist

These criteria are deliberately strict. L4 means zero human intervention, so you need high confidence.

95%+ workflow runs complete without any intervention
PRs consistently meet quality standards on first submission
Monitor phase configured and catching regressions
Rollback procedures tested and documented
Team has agreed on the risk profile for auto-merge candidates

L4: Autonomous Software Engineering

The destination, for the right work

How it works

Agent runs the workflow. Quality gates pass. Auto-merge. No human in the loop. You configure the agentic layer, and the workflow handles everything from issue to merged PR. This is Autonomous Software Engineering (ASE): humans out of the active development loop.

When L4 is appropriate

Low-risk changes: dependency updates, simple bug fixes, well-understood patterns
Repositories with excellent test coverage and monitoring
Issue types with proven track records (e.g., patches that have succeeded at L3 for months)
Services with fast rollback capabilities

When L4 is NOT appropriate

Security-sensitive code (authentication, authorization, encryption)
Breaking API changes or public interface modifications
New architectural patterns the workflow hasn't seen before
Regulatory or compliance-sensitive areas
Any work where a bad merge could cause significant business impact

The realistic picture: most teams operate at L2-L3 for critical paths and L4 for routine maintenance. That's not a failure. That's good engineering. L4 everywhere is a long way off. L4 for the right work, right now, is realistic and worth pursuing.
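One way to apply the two lists above is a per-change eligibility check that runs before auto-merge. This is a sketch only: the label names, the `Change` fields, and `auto_merge_allowed` are hypothetical, and requiring every "appropriate" condition at once is a conservative assumption rather than something the lists state.

```python
from dataclasses import dataclass

# Hypothetical labels mirroring the "NOT appropriate" list.
BLOCKING_LABELS = {
    "security",
    "breaking-api",
    "new-architecture",
    "compliance",
    "high-business-impact",
}
# Hypothetical labels mirroring the low-risk change types.
LOW_RISK_LABELS = {"dependency-update", "simple-bugfix", "known-pattern"}


@dataclass
class Change:
    labels: set[str]
    repo_has_strong_tests: bool  # excellent test coverage and monitoring
    fast_rollback: bool          # the service can be rolled back quickly
    proven_at_l3: bool           # this issue type has a track record at L3


def auto_merge_allowed(change: Change) -> bool:
    # Any blocking label wins, regardless of how safe the rest looks.
    if change.labels & BLOCKING_LABELS:
        return False
    return (
        bool(change.labels & LOW_RISK_LABELS)
        and change.repo_has_strong_tests
        and change.fast_rollback
        and change.proven_at_l3
    )
```

Changes that fail the check would fall back to L3, where a human reviews the pull request.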

Measuring progress

Track these metrics to know where you are and when you're ready to graduate. Don't optimize for L4. Optimize for reliability at your current level.

Metric	L1	L2	L3	L4
Workflow success rate	Track manually	>70%	>90%	>95%
Avg loop count	N/A	<3	<2	<1.5
Escalation rate	100%	<30%	<10%	<5%
Human intervention	Every phase	Checkpoints	PR review	None
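The numeric rows of the table translate directly into thresholds. The sketch below returns the highest level whose thresholds a set of metrics satisfies. `THRESHOLDS` and `highest_level_met` are hypothetical names, and treating the levels as cumulative (you must clear L2 before L3 counts) is an assumption.

```python
# Thresholds copied from the table; rates are expressed as fractions.
THRESHOLDS = {
    "L2": {"min_success": 0.70, "max_loops": 3.0, "max_escalation": 0.30},
    "L3": {"min_success": 0.90, "max_loops": 2.0, "max_escalation": 0.10},
    "L4": {"min_success": 0.95, "max_loops": 1.5, "max_escalation": 0.05},
}


def highest_level_met(success_rate: float, avg_loops: float, escalation_rate: float) -> str:
    met = "L1"  # L1 has no numeric thresholds; everyone starts there
    for level in ("L2", "L3", "L4"):
        t = THRESHOLDS[level]
        if (
            success_rate > t["min_success"]
            and avg_loops < t["max_loops"]
            and escalation_rate < t["max_escalation"]
        ):
            met = level
        else:
            break  # levels are cumulative: stop at the first one not met
    return met
```

For example, `highest_level_met(0.87, 1.8, 0.12)` returns `"L2"`: the loop count already clears the L3 bar, but the success and escalation rates do not.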

What to track

Success rate: 87% (last 30 runs)
Avg loops: 1.8 (retries per run)
Escalation: 12% (human needed)
Time to PR: 14m (avg duration)

Example dashboard values for a team operating between L2 and L3. Your numbers will vary based on codebase complexity and template maturity.
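If you log each workflow run, these four figures are straightforward to compute. The following is a minimal sketch under that assumption. The `Run` record and `dashboard` function are hypothetical, and averaging time to PR over all runs in the window, including failed ones, is a simplification.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    succeeded: bool
    loops: int            # self-healing retries used in this run
    escalated: bool       # a human had to step in
    minutes_to_pr: float  # wall-clock time from start to pull request


def dashboard(runs: list[Run], window: int = 30) -> dict[str, float]:
    recent = runs[-window:]  # "last 30 runs" by default
    if not recent:
        raise ValueError("no runs recorded")
    return {
        "success_rate": sum(r.succeeded for r in recent) / len(recent),
        "avg_loops": mean(r.loops for r in recent),
        "escalation_rate": sum(r.escalated for r in recent) / len(recent),
        "avg_minutes_to_pr": mean(r.minutes_to_pr for r in recent),
    }
```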