Monitor
Track production health. Close the loop. Feed issues back to Plan.
Why This Phase Exists
Without metrics, you cannot tell whether your agentic layer is improving or degrading. Monitor tracks health across workflow runs: success rates, iteration counts, cost, and drift.
It sits last because it observes the completed workflow run. It also closes the lifecycle loop: issues found in production or patterns detected in metrics feed back to Plan, starting a new workflow run.
The Monitor-to-Plan feedback arc is what turns the workflow from a one-shot tool into a continuous improvement system. It learns from its own history.
Methodology
Collect workflow metadata
Duration, cost, turns, and tokens consumed across the entire workflow run.
Collect workflow metrics
Plan size (files targeted), diff stats (lines added/removed), patch iterations (Test and Review loops), success/failure status.
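The metadata and metrics above can be captured in one record before any derived calculations. A minimal sketch; the `WorkflowMetrics` fields and values are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class WorkflowMetrics:
    # Illustrative fields; adapt to your own metrics schema.
    workflow_id: str
    status: str              # "success" | "failure" | "escalated"
    duration_s: float
    total_turns: int
    cost_usd: float
    plan_size_files: int
    lines_added: int
    lines_removed: int
    test_iterations: int
    review_iterations: int

m = WorkflowMetrics("wf-042", "success", 272.0, 23, 0.47, 4, 187, 42, 2, 1)
record = asdict(m)  # a plain dict, ready to append to a metrics store
```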
Calculate derived metrics
Success streak (consecutive successful workflows), average patch iterations (rolling 10), cost per workflow (rolling 10), failure rate (rolling 30 days).
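The streak and rolling averages can be computed from stored history in a few lines. A sketch; the `derived_metrics` helper and its dict keys are hypothetical:

```python
from statistics import mean

def derived_metrics(history):
    """Sketch of the derived metrics. 'history' is a list of
    per-workflow dicts, oldest first, with illustrative keys."""
    streak = 0
    for run in reversed(history):   # count back to the last failure
        if run["status"] != "success":
            break
        streak += 1
    last10 = history[-10:]          # rolling window of 10 workflows
    return {
        "success_streak": streak,
        "avg_patch_iterations_10": mean(r["patch_iterations"] for r in last10),
        "avg_cost_10": round(mean(r["cost_usd"] for r in last10), 2),
    }

history = [
    {"status": "success", "patch_iterations": 1, "cost_usd": 0.40},
    {"status": "failure", "patch_iterations": 3, "cost_usd": 0.90},
    {"status": "success", "patch_iterations": 2, "cost_usd": 0.47},
    {"status": "success", "patch_iterations": 1, "cost_usd": 0.35},
]
```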
Store health data
Append to the project's metrics file or database. Historical data enables trends and rolling averages.
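A JSON-lines file is one simple way to implement the append-only store. A sketch; `append_health_record` is a hypothetical helper that swallows I/O errors because monitoring is best-effort:

```python
import json
import tempfile
from pathlib import Path

def append_health_record(path, record):
    """Append one workflow's metrics as a JSON line.
    Best-effort: returns False instead of raising, since
    monitoring must never fail the workflow."""
    try:
        with Path(path).open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return True
    except OSError:
        return False

# usage: append this run's record to the health store
store = Path(tempfile.mkdtemp()) / "health.jsonl"
ok = append_health_record(store, {"workflow_id": "wf-042", "status": "success"})
```

Appending one line per run keeps historical data trivially available for the streak and rolling-average calculations.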
Identify issues for feedback
If the workflow failed, required excessive patching, or exceeded cost thresholds, generate an issue description to feed back to the Plan phase.
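The threshold checks can be sketched as one small predicate; `feedback_issues`, its keys, and the default threshold are illustrative:

```python
def feedback_issues(run, cost_threshold=1.00):
    """Return feedback issue descriptions when thresholds are
    exceeded. Keys and the default threshold are illustrative."""
    issues = []
    if run["status"] == "failure":
        issues.append("Workflow failed after max retries")
    if run["test_iterations"] >= 3:
        issues.append("Test retry loop maxed out")
    if run["review_iterations"] >= 3:
        issues.append("Review patch loop maxed out")
    if run["cost_usd"] > cost_threshold:
        issues.append(f"Cost ${run['cost_usd']:.2f} exceeded "
                      f"${cost_threshold:.2f} threshold")
    return issues

clean = {"status": "success", "test_iterations": 2,
         "review_iterations": 1, "cost_usd": 0.47}
noisy = {"status": "failure", "test_iterations": 3,
         "review_iterations": 1, "cost_usd": 1.80}
```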
Generate Production Health Report
Summary tables, per-phase breakdown, rolling metrics, and feedback issues (if any).
Inputs & Outputs
Inputs
- All workflow artifacts
- Workflow execution metadata (timing, cost)
- Deploy report ($deploy_report)
- Previous health history
Outputs
- Production Health Report
- Updated metrics store
- Feedback issues (if thresholds exceeded)
Feedback loops back to: Plan phase (new workflow)
# Production Health Report -- Workflow ${workflow_id}
## Summary
| Metric | Value |
|--------|-------|
| Status | Success |
| Duration | 4m 32s |
| Total Turns | 23 |
| Cost | $0.47 |
| Plan Size | 4 files |
| Diff Stats | +187 / -42 lines |
| Test Iterations | 2 |
| Review Iterations | 1 |
| Success Streak | 7 |
## Per-Phase Breakdown
| Phase | Duration | Turns | Cost |
|-------|----------|-------|------|
| Plan | 45s | 4 | $0.08 |
| Build | 1m 52s | 8 | $0.18 |
| Test | 1m 05s | 6 | $0.12 |
| Review | 32s | 3 | $0.06 |
| Document | 12s | 1 | $0.02 |
| Deploy | 6s | 1 | $0.01 |
## Feedback Issues
- None (clean run)
Agent pattern
Monitor is a data collection and reporting task. A single agent reads all workflow artifacts, calculates metrics, and generates the report. No multi-agent coordination needed.
Tool Permissions
Execute is needed to run metric collection scripts or API calls. Write (file editing) is denied because Monitor should not modify source code or workflow configuration. Its data output goes through Execute (appending to metrics store via scripts).
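One way to express this permission profile is a simple allow/deny map with deny-by-default. A sketch; the key names are illustrative, not any specific framework's schema:

```python
# Hypothetical tool-permission map for the Monitor agent.
MONITOR_PERMISSIONS = {
    "read": True,     # read all workflow artifacts and health history
    "execute": True,  # run metric-collection scripts and API calls
    "write": False,   # never edit source code or workflow configuration
}

def is_allowed(action):
    # Deny-by-default: unknown actions are refused.
    return MONITOR_PERMISSIONS.get(action, False)
```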
Prompt Template
# Monitor Phase -- Operations
## Role
You are an Operations agent. Your job is to collect data
from the completed workflow run, calculate production health metrics, and
identify any issues that should feed back to the Plan phase.
You do NOT modify code or workflow configuration. You
observe, measure, and report.
## Context
Workflow ID: ${workflow_id}
Workflow Start Time: ${workflow_start_time}
Workflow End Time: ${workflow_end_time}
Plan Artifact:
${plan_artifact}
Build Report:
${build_report}
Test Report:
${test_report}
Review Verdict:
${review_verdict}
Deploy Report:
${deploy_report}
Previous Health Data: ${kpi_history_path} # path to metrics file
## Metrics to Collect
### Per-Workflow Metrics
- Status: success | failure | escalated
- Duration: end_time - start_time
- Total turns across all phases
- Estimated cost (based on token usage)
- Plan size: number of files targeted
- Diff stats: lines added, lines removed
- Test iterations: how many retry loops
- Review iterations: how many patch loops
### Derived Metrics
- Success streak: consecutive successful workflows
- Average test iterations (rolling 10 workflows)
- Average cost per workflow (rolling 10 workflows)
- Failure rate (rolling 30 days)
### Feedback Detection
If any of these conditions are true, generate a feedback
issue to feed back to Plan:
- Workflow failed after max retries
- Test iterations >= 3 (maxed out retry loop)
- Review iterations >= 3 (maxed out patch loop)
- Cost exceeded ${cost_threshold} (configurable)
## Methodology
1. Read all workflow artifacts to extract metrics
2. Calculate per-phase breakdowns
3. Read previous production health data for streak/rolling calculations
4. Identify feedback issues (if any)
5. Generate Production Health Report
6. Append data to metrics store
## Constraints
- Best-effort recording. If a metric is unavailable, skip
it and note it as "N/A". Never fail the workflow because
of a missing metric.
- Do not modify source code or workflow configuration.
- Feedback issues are suggestions, not commands.
- Non-fatal on failure. If Monitor itself errors, the
workflow still succeeded.
## Output Format
Produce a Production Health Report with:
1. Summary Table -- Key metrics for this workflow
2. Per-Phase Breakdown -- Duration, turns, cost per phase
3. Rolling Metrics -- Streaks, averages (if history available)
4. Feedback Issues -- Issues to feed back to Plan (if any)
Best practices
Do
- Collect every available metric
- Calculate streaks and rolling averages
- Generate feedback issues when thresholds are exceeded
- Treat monitoring as non-fatal
- Track cost per workflow for ROI analysis
Don't
- Fail the workflow if monitoring fails
- Modify code or configuration
- Generate feedback issues for minor violations
- Block on missing metrics; use N/A
- Skip the per-phase breakdown
Nuances
Best-effort recording is critical
If the Monitor phase cannot access a particular metric (e.g., cost data is unavailable), it should skip that metric and note it as "N/A" rather than failing. The workflow succeeded; monitoring is observational.
Streak calculations motivate teams
Seeing "Success Streak: 15" builds confidence. A broken streak tells you something changed: a prompt template was modified, a dependency updated, or tasks got harder.
The Monitor-to-Plan feedback arc closes the lifecycle loop
The Monitor phase can feed issues back into the Plan phase, creating a continuous improvement cycle. This arc operates in two modes:
Human-mediated (default): Monitor produces a Production Health Report. A human reads it, identifies patterns (e.g., "the test retry loop fires on 60% of runs because our test suite is flaky"), and creates issues manually. The human provides judgment that automation cannot: distinguishing intentional changes from regressions, prioritizing improvements, and avoiding self-referential loops.
Automated: Monitor detects anomalies and auto-generates issues that feed into Plan. This is powerful but dangerous. Without safeguards, the workflow can create infinite loops of self-generated work: an agent detects a problem and creates an issue, processing that issue creates new problems, the agent detects those problems and creates more issues.
Safeguards for automated Monitor-to-Plan
If you enable automated feedback, implement all of these:
- Rate limiting: maximum N auto-generated issues per time window (e.g., 3 per day)
- Scope bounding: auto-generated issues can only target the agentic layer (prompt templates, gate configs), never the application codebase
- Human approval: auto-generated issues are created as drafts requiring human approval before workflow processes them
- Loop detection: if an auto-generated issue triggers Monitor feedback that would generate another issue, halt and escalate
- Cooldown period: after processing an auto-generated issue, wait before generating new ones to observe the effect
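The first two safeguards, rate limiting and scope bounding, can be sketched as a small gate in front of issue creation; `FeedbackGate` is hypothetical and the limits are examples:

```python
import time

class FeedbackGate:
    """Illustrative safeguard: rate-limit auto-generated issues and
    restrict them to the agentic layer, never the application code."""
    ALLOWED_SCOPES = {"prompt_templates", "gate_configs"}  # scope bounding

    def __init__(self, max_per_window=3, window_s=86400):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self.timestamps = []

    def allow(self, scope, now=None):
        now = time.time() if now is None else now
        if scope not in self.ALLOWED_SCOPES:
            return False  # out of scope: requires a human-created issue
        # Drop timestamps that have aged out of the window.
        self.timestamps = [t for t in self.timestamps
                           if now - t < self.window_s]
        if len(self.timestamps) >= self.max_per_window:
            return False  # rate limit hit: escalate instead of creating
        self.timestamps.append(now)
        return True
```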
Most teams should start with human-mediated feedback and only automate after 50+ workflow runs have established baseline health patterns.
Cost tracking is essential
With API-based AI agents, cost-per-workflow is a metric you cannot ignore. You need to know the ROI of your agentic layer. Tracking cost per phase shows you where to optimize.
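A quick way to find where to optimize is to rank phases by share of total cost. A sketch; `cost_hotspots` is a hypothetical helper, and the numbers mirror the per-phase breakdown table above:

```python
def cost_hotspots(phase_costs):
    """Rank phases by cost, most expensive first, with each
    phase's share of the total."""
    total = sum(phase_costs.values())
    return sorted(
        ((phase, cost, cost / total) for phase, cost in phase_costs.items()),
        key=lambda row: row[1],
        reverse=True,
    )

phases = {"Plan": 0.08, "Build": 0.18, "Test": 0.12,
          "Review": 0.06, "Document": 0.02, "Deploy": 0.01}
for phase, cost, share in cost_hotspots(phases):
    print(f"{phase:<8} ${cost:.2f}  {share:.0%}")
```

Here Build dominates, so prompt or context trimming in Build would yield the largest savings.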
Non-fatal failure
Monitor is the only phase where failure does not affect the workflow outcome. If Deploy succeeded, the PR is ready regardless of whether metrics were collected. Monitor errors are logged, not escalated.
Lifecycle loop
When Monitor detects problems (failed workflows, excessive retries, cost overruns), it generates structured feedback that feeds into Plan as a new task. The workflow becomes a continuous improvement loop.