Design Principles
These are the non-negotiable design decisions behind the framework, ordered by importance. Every feature and workflow behavior traces back to one of these 8 principles.
Opinionated but evolvable
Strong defaults for every phase. Everything configurable when you need it.
Why it matters
AI tooling changes weekly, with new models and new tools arriving constantly. A framework that hardcodes opinions becomes tech debt within months, but a framework with no opinions leaves developers re-making the same decisions every time they start a project.
AEF ships opinionated: TDD-first builds, structured review loops, PR-based shipping, 3-retry feedback loops. These are the defaults because they work for most teams. But every single opinion is a configuration point in workflow.yaml or in the prompt templates.
The default test retry limit is 3. A team working on a payments system might set it to 5. A team prototyping might set it to 1. The framework does not care; it ships a default, and you override when needed.
Anti-pattern
Hardcoding a test framework ("always use Jest"). Requiring a specific CI/CD system. Any opinion that cannot be overridden is a wall, not a guardrail.
How to implement
```yaml
# workflow.yaml
loops:
  test_retry:
    max: 3            # Override: set 1-10
gates:
  test:
    coverage_min: 80  # Override: set 0-100
    framework: auto   # Override: jest | pytest | go test | ...
```
Self-healing over failure
Workflows diagnose and retry before escalating to a human.
Why it matters
A workflow that stops on every routine failure still requires human intervention for issues the system could resolve on its own. The whole point of autonomy is handling routine failures without human involvement.
A test failure is not a crisis. It is a signal. The workflow spawns a builder agent, gives it the failure context, and lets it attempt a fix. Only after exhausting retries does a human get involved.
A build produces code that fails 2 of 15 unit tests. Instead of stopping, the workflow: (1) analyzes the failure output, (2) spawns a builder agent with the failure context and the original plan, (3) the builder patches the code, (4) tests re-run. If all pass, the workflow continues. If not, it retries up to 2 more times. Only then does it escalate.
Anti-pattern
A workflow that fails and immediately pings a human on Slack. A workflow with no retry logic. A workflow that retries forever without limits, burning tokens in an endless loop.
How to implement
```yaml
loops:
  test_retry:
    max: 3
    context_includes: [error_output, original_plan, changed_files]
  review_patch:
    max: 3
    context_includes: [review_issues, test_results, changed_files]
escalation:
  notify: [slack, email]
  include: [full_workflow_state]
```
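The retry-then-escalate flow this configuration drives can be sketched as a small loop. This is a minimal illustration, not AEF's actual engine: `run_tests`, `patch`, and `escalate` are hypothetical callbacks standing in for the workflow engine, builder agent, and notification step.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestResult:
    passed: bool
    output: str = ""

def self_healing_loop(run_tests: Callable[[], TestResult],
                      patch: Callable[[str], None],
                      escalate: Callable[[], None],
                      max_retries: int = 3) -> bool:
    """On failure, hand the error output to a builder agent (patch)
    and re-run; escalate to a human only after exhausting retries."""
    result = run_tests()
    attempts = 0
    while not result.passed and attempts < max_retries:
        patch(result.output)   # builder attempts a fix with full context
        attempts += 1
        result = run_tests()   # re-run the suite after the patch
    if result.passed:
        return True            # all green: workflow continues
    escalate()                 # only now does a human get involved
    return False
```

The key property is the order of operations: the human notification is unreachable until every retry has been spent.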
Plan-before-execute
Research and planning are read-only. Never plan and build simultaneously.
Why it matters
When agents plan and execute at the same time, they commit to an approach before understanding the problem. You get scope creep, wrong abstractions, and wasted tokens on code that gets thrown away.
The Plan phase is read-only by design. The planner agent can read every file, search the codebase, analyze dependencies, but it cannot modify anything. This forces understanding before code gets written.
A feature request asks for "user authentication." Without plan-first, an agent might immediately start writing a JWT module. With plan-first, the planner discovers the codebase already uses session-based auth, finds the files to modify, and writes a plan that extends the existing pattern instead of introducing a conflicting one.
Anti-pattern
An agent that begins writing code before completing a structured analysis. An agent that creates placeholder files instead of producing a complete plan. Any workflow where the first file modification happens before a structured plan exists.
How to implement
The Plan phase prompt template explicitly states: "Read-only. Do not modify any files." Plan phase tool permissions enforce the same constraint at the tool level.
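A sketch of what that tool-level enforcement could look like; the file name and field names here are illustrative, not a fixed AEF schema:

```yaml
# tools/permissions.yaml (illustrative)
phases:
  plan:
    read: allow
    search: allow
    execute: deny
    write: deny
```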
Least privilege
Each phase gets only the tools it needs.
Why it matters
Giving an agent tools it does not need is both a security risk and a quality risk. A planner with write access might "helpfully" start modifying files. A reviewer with execute access might run tests instead of analyzing code.
Constraining tools per phase enforces separation of concerns. The Review phase has Read and Search permissions only: it cannot modify code, run tests, or deploy, only analyze. This keeps review findings pure analysis, uninfluenced by any urge to "let me try fixing it."
| Phase | Read | Search | Execute | Write |
|---|---|---|---|---|
| Plan | Allowed | Allowed | Denied | Denied |
| Build | Allowed | Allowed | Allowed | Allowed |
| Test | Allowed | Allowed | Allowed | Allowed |
| Review | Allowed | Allowed | Denied | Denied |
| Document | Allowed | Allowed | Denied | Allowed |
| Deploy | Allowed | Allowed | Allowed | Denied |
| Monitor | Allowed | Allowed | Allowed | Denied |
Anti-pattern
Giving every phase full access to all tools. Agents tend to use every tool available to them. More tools means more surface area for unintended actions.
How to implement
Define a permission matrix in tools/*.yaml. Map each phase to its allowed capabilities. Enforce at the workflow engine level so the agent never sees tools it should not use.
Enforcement can be runtime-level (agent cannot access the tool) or prompt-level (agent is instructed not to use the tool). Runtime enforcement is stronger; prompt enforcement is more portable. Use both when your agent runtime supports it.
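Runtime-level enforcement can be as simple as filtering the tool set before the agent ever sees it. The matrix below mirrors the permission table above; the tool-metadata shape (a `capability` field per tool) is an assumption for illustration:

```python
# Permission matrix mirroring the phase table above.
PERMISSIONS = {
    "plan":     {"read", "search"},
    "build":    {"read", "search", "execute", "write"},
    "test":     {"read", "search", "execute", "write"},
    "review":   {"read", "search"},
    "document": {"read", "search", "write"},
    "deploy":   {"read", "search", "execute"},
    "monitor":  {"read", "search", "execute"},
}

def tools_for_phase(phase: str, all_tools: dict) -> dict:
    """Return only the tools a phase may use, so an out-of-scope
    capability is never even an option for the agent."""
    allowed = PERMISSIONS[phase]
    return {name: tool for name, tool in all_tools.items()
            if tool["capability"] in allowed}
```

Because the agent is handed the filtered dict, a Review-phase agent simply has no write tool to misuse.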
Test-driven contracts
Tests are written first. They are the contract every subsequent phase validates against.
Why it matters
Without tests-first, the Review phase has no objective baseline. "Does this code work?" becomes a judgment call. With tests-first, every phase has a concrete answer: the code must pass these tests.
Tests written before implementation also prevent scope creep. The builder agent knows what "done" means: all tests pass. It cannot over-build or under-build.
The Build phase receives a plan for a new /users endpoint. Step 1: write tests for GET /users, POST /users, validation errors, auth requirements. Step 2: write code until all tests pass. The test suite is the contract.
Anti-pattern
Writing code first, then writing tests to match the code. In an agentic context, this means the tests confirm what was already built rather than defining the success criteria up front. Also: skipping tests entirely "because the agent got it right."
How to implement
```
## Build Order
1. Write test files first (unit + integration)
2. Run tests -- confirm they FAIL (red)
3. Write implementation
4. Run tests -- confirm they PASS (green)
5. Never skip steps 1-2
```
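The same discipline in miniature, as a self-contained sketch. `slugify` is a stand-in for real feature code; the point is the ordering of the steps, not the function itself:

```python
# Step 1: the test is the contract, written before any implementation.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"

# Step 2: calling test_slugify() now raises NameError -- the "red" state.

# Step 3: write implementation until the contract is satisfied.
def slugify(text: str) -> str:
    return "-".join(text.lower().split())

# Step 4: the test passes -- the "green" state.
test_slugify()
```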
Structured outputs
Machine-readable for agents, human-readable for people.
Why it matters
Workflow phases consume each other's outputs. If the Plan phase outputs freeform prose, the Build phase has to interpret it. Interpretation breeds inconsistency.
Structured outputs (JSON, YAML, markdown with fixed headings) can be both machine-parsed and human-read. Template variables like $plan_artifact only work when the artifact has a predictable shape.
The Plan phase outputs a structured spec with consistent sections: Summary, Files to Modify, Files to Create, What We Are NOT Doing, Test Strategy. The Build phase knows exactly where to find file targets. A human reviewing the plan finds the same sections every time.
Anti-pattern
A plan that is just a paragraph of text. Review findings as a chat message. Test results as raw console output without structured parsing. Any output that requires a human to interpret before the next phase can use it.
How to implement
Define output format contracts in each prompt template under ## Output Format. Use markdown with required headings for human-readable structure. Use JSON or YAML blocks within the markdown for machine-consumed data. See Artifact Chaining for how this connects to variable substitution.
Deploy means PR
"Done" is a reviewable change request, not a deploy.
Why it matters
Deploying autonomously to production is a different risk class than creating a PR. AEF's Deploy phase produces a pull request or change request containing all artifacts: code, test evidence, review summary, documentation.
This gives humans an audit trail and a merge decision point. Teams that trust their quality gates can auto-merge. Teams that want oversight review the PR. The framework supports both without changing the workflow.
Anti-pattern
A workflow that deploys directly to production without a reviewable artifact. A workflow that commits directly to main. Any workflow where there is no audit trail of what was changed and why.
How to implement
```
# PR body structure
## Summary
${doc_artifact.summary}
## Changes
${build_artifact.changed_files}
## Test Evidence
${test_results.summary}
## Review Summary
${review_artifact.verdict}
```
Tool agnostic
Patterns transfer to any agent stack. The framework teaches structure, not a vendor. AEF is designed for code-editing agents with file system and shell access. Within that class, patterns are portable across providers.
Why it matters
AI tooling changes monthly. Tying a framework to a specific agent provider, model, or tool means rewriting everything when that provider changes or a better option appears.
AEF defines patterns (phases, loops, gates, artifact chaining) that work whether you use Claude, GPT, Gemini, open-source models, or a mix, as long as the agent can read files, write files, and execute shell commands. The agentic layer is the abstraction boundary. Switch providers tomorrow; your prompt templates, quality gates, and workflow config stay the same. Only the tool integration layer changes.
Anti-pattern
A framework that only works with one agent provider. Prompt templates that use provider-specific syntax. Claiming compatibility with agents that lack file system access or shell execution; AEF's patterns assume these capabilities.
How to implement
```yaml
# tools/file-system.yaml
category: file_system
capabilities:
  - read_file
  - write_file
  - list_directory
  - search_content
# Implementation maps to your specific agent's tools
```
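One way to express that capability mapping in code is a structural interface at the abstraction boundary. This is an illustrative sketch, not an AEF API: the `Agent` protocol and `save_plan_artifact` helper are hypothetical, and each provider would supply its own adapter implementing the three methods.

```python
from typing import Protocol

class Agent(Protocol):
    """The capability contract AEF's patterns assume: read files,
    write files, execute shell commands. Any provider fits behind it."""
    def read_file(self, path: str) -> str: ...
    def write_file(self, path: str, content: str) -> None: ...
    def run_shell(self, command: str) -> str: ...

def save_plan_artifact(agent: Agent, plan: str) -> str:
    """Workflow logic depends only on the Agent contract, never on a
    vendor SDK, so swapping providers touches only the adapter."""
    path = "artifacts/plan.md"
    agent.write_file(path, plan)
    return path
```

Switching from one provider to another then means writing a new adapter class, while prompts, gates, and workflow config stay untouched.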
Summary
| # | Principle | One-Liner |
|---|---|---|
| 1 | Opinionated but Evolvable | Strong defaults, everything configurable |
| 2 | Self-Healing Over Failure | Diagnose and retry before escalating |
| 3 | Plan-Before-Execute | Research is read-only, plan before build |
| 4 | Least Privilege | Each phase gets only needed tools |
| 5 | Test-Driven Contracts | Tests first, they are the contract |
| 6 | Structured Outputs | Machine-readable and human-readable |
| 7 | Deploy Means PR | Done = reviewable change request |
| 8 | Tool Agnostic | Patterns transfer to any agent stack |