07 · Operations

Monitor

Track production health. Close the loop. Feed issues back to Plan.


Why This Phase Exists

Without metrics, you cannot tell whether your agentic layer is improving or degrading. Monitor tracks health across workflow runs: success rates, iteration counts, cost, and drift.

It sits last because it observes the completed workflow run. It also closes the lifecycle loop: issues found in production or patterns detected in metrics feed back to Plan, starting a new workflow run.

The Monitor-to-Plan feedback arc is what turns the workflow from a one-shot tool into a continuous improvement system. It learns from its own history.

Methodology

1. Collect workflow metadata
   Duration, cost, turns, and tokens consumed across the entire workflow run.

2. Collect workflow metrics
   Plan size (files targeted), diff stats (lines added/removed), patch iterations (Test and Review loops), and success/failure status.

3. Calculate derived metrics
   Success streak (consecutive successful workflows), average patch iterations (rolling 10), cost per workflow (rolling 10), and failure rate (rolling 30 days). A sketch of these calculations follows this list.

4. Store health data
   Append to the project's metrics file or database. Historical data enables trends and rolling averages.

5. Identify issues for feedback
   If the workflow failed, required excessive patching, or exceeded cost thresholds, generate an issue description to feed back to the Plan phase.

6. Generate the Production Health Report
   Summary tables, per-phase breakdown, rolling metrics, and feedback issues (if any).
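
To make step 3 concrete, here is a minimal sketch of the derived-metric calculations, assuming the metrics store is a JSONL file whose records carry illustrative fields like status, test_iterations, cost_usd, and an ISO-8601 UTC timestamp:

```python
import json
from datetime import datetime, timedelta, timezone

def load_history(path):
    """Read one JSON record per line from the metrics store."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def derived_metrics(history):
    """Compute streak and rolling metrics from oldest-first history."""
    # Success streak: consecutive successes counted back from the latest run.
    streak = 0
    for run in reversed(history):
        if run.get("status") != "success":
            break
        streak += 1

    last_10 = history[-10:] or [{}]
    avg_iterations = sum(r.get("test_iterations", 0) for r in last_10) / len(last_10)
    avg_cost = sum(r.get("cost_usd", 0.0) for r in last_10) / len(last_10)

    # Failure rate over the trailing 30 days (timestamps assumed ISO-8601 UTC).
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    recent = [r for r in history
              if datetime.fromisoformat(r["timestamp"]) >= cutoff]
    failures = sum(1 for r in recent if r.get("status") == "failure")

    return {
        "success_streak": streak,
        "avg_test_iterations_10": round(avg_iterations, 2),
        "avg_cost_usd_10": round(avg_cost, 4),
        "failure_rate_30d": round(failures / len(recent), 3) if recent else 0.0,
    }
```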

Inputs & Outputs

Inputs

  • All workflow artifacts
  • Workflow execution metadata (timing, cost)
  • Deploy report ($deploy_report)
  • Previous health history

Outputs

  • Production Health Report
  • Updated metrics store
  • Feedback issues (if thresholds exceeded)

Feedback loops back to: Plan phase (new workflow)

Production Health Report template
# Production Health Report -- Workflow ${workflow_id}

## Summary
| Metric | Value |
|--------|-------|
| Status | Success |
| Duration | 4m 32s |
| Total Turns | 23 |
| Cost | $0.47 |
| Plan Size | 4 files |
| Diff Stats | +187 / -42 lines |
| Test Iterations | 2 |
| Review Iterations | 1 |
| Success Streak | 7 |

## Per-Phase Breakdown
| Phase | Duration | Turns | Cost |
|-------|----------|-------|------|
| Plan | 45s | 4 | $0.08 |
| Build | 1m 52s | 8 | $0.18 |
| Test | 1m 05s | 6 | $0.12 |
| Review | 32s | 3 | $0.06 |
| Document | 12s | 1 | $0.02 |
| Deploy | 6s | 1 | $0.01 |

## Feedback Issues
- None (clean run)

Agent pattern

Single agent

Monitor is a data collection and reporting task. A single agent reads all workflow artifacts, calculates metrics, and generates the report. No multi-agent coordination needed.

Tool Permissions

Read · Search · Execute (Write denied)

Execute is needed to run metric-collection scripts or API calls. Write (file editing) is denied because Monitor should not modify source code or workflow configuration; its data output goes through Execute instead (appending to the metrics store via scripts).
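
For example, the append could be a small script the agent invokes via Execute rather than editing files itself. This is only a sketch; the metrics path and record shape are assumptions, not part of the workflow spec:

```python
#!/usr/bin/env python3
"""Append one workflow's metrics record to the project metrics store.

Run via Execute so the Monitor agent never needs Write access.
METRICS_PATH and the record shape are hypothetical; adapt per project.
"""
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

METRICS_PATH = Path(".metrics/workflow_health.jsonl")  # hypothetical location

def main():
    record = json.loads(sys.stdin.read())  # metrics arrive as JSON on stdin
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    METRICS_PATH.parent.mkdir(parents=True, exist_ok=True)
    with METRICS_PATH.open("a") as f:      # append-only: history is never rewritten
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    main()
```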

Prompt Template

# Monitor Phase -- Operations

## Role
You are an Operations agent. Your job is to collect metrics
from the completed workflow run, calculate production health metrics, and
identify any issues that should feed back to the Plan phase.

You do NOT modify code or workflow configuration. You
observe, measure, and report.

## Context
Workflow ID: ${workflow_id}
Workflow Start Time: ${workflow_start_time}
Workflow End Time: ${workflow_end_time}

Plan Artifact:
${plan_artifact}

Build Report:
${build_report}

Test Report:
${test_report}

Review Verdict:
${review_verdict}

Deploy Report:
${deploy_report}

Previous Health Data: ${kpi_history_path}   # path to metrics file

## Metrics to Collect

### Per-Workflow Metrics
- Status: success | failure | escalated
- Duration: end_time - start_time
- Total turns across all phases
- Estimated cost (based on token usage)
- Plan size: number of files targeted
- Diff stats: lines added, lines removed
- Test iterations: how many retry loops
- Review iterations: how many patch loops

### Derived Metrics
- Success streak: consecutive successful workflows
- Average test iterations (rolling 10 workflows)
- Average cost per workflow (rolling 10 workflows)
- Failure rate (rolling 30 days)

### Feedback Detection
If any of these conditions are true, generate a feedback
issue to feed back to Plan:
- Workflow failed after max retries
- Test iterations >= 3 (maxed out retry loop)
- Review iterations >= 3 (maxed out patch loop)
- Cost exceeded ${cost_threshold} (configurable)

## Methodology
1. Read all workflow artifacts to extract metrics
2. Calculate per-phase breakdowns
3. Read previous production health data for streak/rolling calculations
4. Identify feedback issues (if any)
5. Generate Production Health Report
6. Append data to metrics store

## Constraints
- Best-effort recording. If a metric is unavailable, skip
  it and note it as "N/A". Never fail the workflow because
  of a missing metric.
- Do not modify source code or workflow configuration.
- Feedback issues are suggestions, not commands.
- Non-fatal on failure. If Monitor itself errors, the
  workflow still succeeded.

## Output Format
Produce a Production Health Report with:
1. Summary Table -- Key metrics for this workflow
2. Per-Phase Breakdown -- Duration, turns, cost per phase
3. Rolling Metrics -- Streaks, averages (if history available)
4. Feedback Issues -- Issues to feed back to Plan (if any)
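
The Feedback Detection rules in the template reduce to a simple threshold check. A sketch, with illustrative field names and a placeholder default for the configurable cost threshold:

```python
def detect_feedback_issues(run, cost_threshold=1.00):
    """Apply the Feedback Detection rules to one workflow's metrics dict."""
    issues = []
    if run.get("status") == "failure":
        issues.append("Workflow failed after max retries; investigate the root cause.")
    if run.get("test_iterations", 0) >= 3:
        issues.append("Test retry loop maxed out; the test suite may be flaky.")
    if run.get("review_iterations", 0) >= 3:
        issues.append("Review patch loop maxed out; the plan scope may be too broad.")
    cost = run.get("cost_usd", 0.0)
    if cost > cost_threshold:
        issues.append(f"Cost ${cost:.2f} exceeded the ${cost_threshold:.2f} threshold.")
    return issues
```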

Best practices

Do

  • Collect every available metric
  • Calculate streaks and rolling averages
  • Generate feedback issues when thresholds are exceeded
  • Treat monitoring as non-fatal
  • Track cost per workflow for ROI analysis

Don't

  • Fail the workflow if monitoring fails
  • Modify code or configuration
  • Generate feedback issues for minor violations
  • Block on missing metrics; use N/A
  • Skip the per-phase breakdown

Nuances

Best-effort recording is critical

If the Monitor phase cannot access a particular metric (e.g., cost data is unavailable), it should skip that metric and note it as "N/A" rather than failing. The workflow succeeded; monitoring is observational.

Streak calculations motivate teams

Seeing "Success Streak: 15" builds confidence. A broken streak tells you something changed: a prompt template was modified, a dependency updated, or tasks got harder.

The Monitor-to-Plan feedback arc closes the lifecycle loop

The Monitor phase can feed issues back into the Plan phase, creating a continuous improvement cycle. This arc operates in two modes:

Human-mediated (default): Monitor produces a Production Health Report. A human reads it, identifies patterns (e.g., "the test retry loop fires on 60% of runs because our test suite is flaky"), and creates issues manually. The human provides judgment that automation cannot: distinguishing intentional changes from regressions, prioritizing improvements, and avoiding self-referential loops.

Automated: Monitor detects anomalies and auto-generates issues that feed into Plan. This is powerful but dangerous: without safeguards, the workflow can create infinite loops of self-generated work. An agent detects a problem and files an issue; the workflow processes that issue and introduces new problems; the agent detects those and files more issues.

Safeguards for automated Monitor-to-Plan

If you enable automated feedback, implement all of these:

  • Rate limiting: maximum N auto-generated issues per time window (e.g., 3 per day)
  • Scope bounding: auto-generated issues can only target the agentic layer (prompt templates, gate configs), never the application codebase
  • Human approval: auto-generated issues are created as drafts requiring human approval before the workflow processes them
  • Loop detection: if an auto-generated issue triggers Monitor feedback that would generate another issue, halt and escalate
  • Cooldown period: after processing an auto-generated issue, wait before generating new ones to observe the effect

Most teams should start with human-mediated feedback and only automate after 50+ workflow runs have established baseline health patterns.
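
As a sketch of how the rate-limiting and cooldown safeguards above might compose (window sizes and the issue-log shape are assumptions):

```python
from datetime import datetime, timedelta, timezone

def may_auto_generate(issue_log, max_per_day=3, cooldown_hours=12):
    """Gate auto-generated issues behind a rate limit plus a cooldown.

    `issue_log` is a list of ISO-8601 UTC timestamps of previously
    auto-generated issues, oldest first. Limits are illustrative.
    """
    now = datetime.now(timezone.utc)
    recent = [t for t in issue_log
              if datetime.fromisoformat(t) > now - timedelta(days=1)]
    if len(recent) >= max_per_day:
        return False  # rate limit: too many auto-issues in the window
    if issue_log:
        last = datetime.fromisoformat(issue_log[-1])
        if now - last < timedelta(hours=cooldown_hours):
            return False  # cooldown: observe the effect of the last issue first
    return True
```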

Cost tracking is essential

With API-based AI agents, cost-per-workflow is a metric you cannot ignore. You need to know the ROI of your agentic layer. Tracking cost per phase shows you where to optimize.

Non-fatal failure

Monitor is the only phase where failure does not affect the workflow outcome. If Deploy succeeded, the PR is ready regardless of whether metrics were collected. Monitor errors are logged, not escalated.
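
In orchestrator code, this is a plain catch-and-log wrapper. A sketch; collect_and_report stands in for whatever entry point your Monitor phase actually has:

```python
import logging

logger = logging.getLogger("workflow.monitor")

def run_monitor_phase(context):
    """Run Monitor without letting its errors affect the workflow outcome."""
    try:
        return collect_and_report(context)   # hypothetical Monitor entry point
    except Exception:
        # Deploy already succeeded; log and move on rather than escalate.
        logger.exception("Monitor phase failed; workflow outcome unaffected")
        return None
```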

Lifecycle loop

Monitor → feedback issues → Plan → new workflow

When Monitor detects problems (failed workflows, excessive retries, cost overruns), it generates structured feedback that feeds into Plan as a new task. The workflow becomes a continuous improvement loop.
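
One possible shape for that structured feedback, shown as JSON; every field name here is illustrative rather than a prescribed schema:

```json
{
  "source": "monitor",
  "workflow_id": "wf-0042",
  "trigger": "test_iterations >= 3",
  "severity": "medium",
  "description": "Test retry loop maxed out; test suite may be flaky.",
  "suggested_scope": "agentic-layer",
  "requires_human_approval": true
}
```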