franklywatson/agentic-patterns

Agentic Patterns

A toolkit of patterns for making codebases work effectively with AI coding agents: gathered from production use, organized by increasing capability, and designed to be adopted incrementally.

Why This Exists

AI coding agents are becoming standard tools. But most codebases were built for humans — humans who can mentally bridge gaps, tolerate partial feedback, rely on intuition to navigate messy directories, and instinctively know which warnings to ignore.

AI agents can't do any of that. They need:

  • Complete feedback — a test either passes or fails, and the failure tells them exactly where to look
  • Structured context — information organized so complexity is discoverable, not dumped all at once
  • Enforced discipline — rules that are impossible to bypass, not just written in a wiki
  • Deterministic environments — no "it works on my machine," no shared state leaking between tests

None of these ideas are new. Deep modules, progressive disclosure, and evidence-based engineering are decades old — drawn from John Ousterhout's work on software design, Matt Pocock's graybox module concept, and well-established testing discipline. What's new is how critical they become when your coworker is an AI with no memory of your codebase and no intuition to fall back on.

These patterns stand on the shoulders of those established ideas. They reframe practices that excellent engineers already use — making them explicit, composable, and enforceable so that agents and humans alike benefit from the structure.

Where These Patterns Come From

The patterns were extracted from a production-grade Telegram trading bot — a financial application handling real blockchain transactions, built fully agentically with Claude Code as the primary development tool. That project developed Stack-First Development: bringing up the entire application stack in Docker and testing through API endpoints only, where each stack test is an atomic user journey that passes or fails as a whole.
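To make "passes or fails as a whole" concrete, here is a minimal Python sketch of an atomic journey runner. It is not code from the reference project: `Step`, `JourneyResult`, and `run_journey` are hypothetical names, and the two example steps use plain Python values where a real stack test would call API endpoints on the Dockerized stack.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]  # raises AssertionError when the step fails


@dataclass
class JourneyResult:
    journey: str
    passed: bool
    failure: Optional[str] = None  # names the failing step, if any


def run_journey(journey: str, steps: list) -> JourneyResult:
    """Run steps in order; the first failure fails the whole journey."""
    context: dict = {}  # state shared between steps (ids, tokens, ...)
    for index, step in enumerate(steps, start=1):
        try:
            step.run(context)
        except AssertionError as exc:
            return JourneyResult(journey, False, f"step {index} ({step.name}): {exc}")
    return JourneyResult(journey, True)


# Hypothetical steps; a real journey would issue HTTP requests here.
def register(context: dict) -> None:
    context["user_id"] = 1


def check_balance(context: dict) -> None:
    balance = 0  # stand-in for the value an API response would carry
    assert balance == 100, f"expected balance 100, got {balance}"


result = run_journey(
    "new user funds account",
    [Step("register", register), Step("check balance", check_balance)],
)
print(result.passed)   # False
print(result.failure)  # step 2 (check balance): expected balance 100, got 0
```

The journey yields a single pass/fail result, and the failure string names the step and the mismatched value, which is the kind of feedback an agent can act on without guessing.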

Peter Steinberger, creator of OpenClaw, has described a similar approach — managing 5-10 parallel agents, closing the feedback loop so agents verify their own work, investing heavily in planning before implementation, and treating code reviews as architecture discussions. His core insight: "I don't think software engineering is dead with AI: in fact, quite the opposite." Agentic development demands more engineering discipline, not less.

Other frameworks approach similar problems from different angles. gstack provides a skill framework with resolver pipelines. superpowers provides base skills for brainstorming, planning, TDD, and code review. Both informed this work: they were the best available ideas at the time, applied in practice. rig runs in the other direction: it is a configurable guardrails framework built on the L2-L4 patterns from this repo, automating enforcement pipelines, tool routing, token-optimized scout agents, context eval, and session management. Install it with rig init.

The lineage: patterns were extracted from the reference project, organized into this library, then used to build rig — to serve as a working reference implementation and configurable toolkit for general use.

Reference Implementations

| System | What it implements | Language |
|---|---|---|
| rig | Guardrails framework: L2 enforcement pipeline, L3 tool routing + token-optimized scout agent, L4 context eval, skill chain with phase transitions, CI guardrails. Configurable via .harness.yaml. | TypeScript |
| gstack | L2 skill framework with resolver pipeline, preamble system | TypeScript |
| superpowers | L2 base skills (brainstorming, TDD, verification, review), automated worktree management | Markdown/JS |
| my-claw | L1 design rinsing reference: autonomous multi-agent system whose architecture evolved through three phases of cross-domain design rinsing (YouTube demo to architecture, academic talk + codebases to agent design, agentic-patterns + compound engineering to development approach) | Python |

The Pattern Pyramid

Patterns are organized into five levels, each building on the previous. The levels suggest an adoption order — earlier levels provide the foundation that later levels depend on — but teams should start where their gaps are and adopt what helps.

[Figure: The Agentic Patterns Pyramid]

Level Overviews

L0: Foundation — Structure your codebase so an AI with zero prior context can navigate, understand, and contribute. Deep modules, progressive disclosure, conceptual file organization, CLAUDE.md as project constitution, unit tests as contract, documentation as system map, and aggressive cleanup. The "can a new starter figure this out?" test.

L1: Closed Loop Design and Verification — The level where agents stop guessing and start designing. Context harvesting gathers targeted evidence before implementation. Design rinsing extends this beyond the current project — extracting distilled architectural understanding from external sources (codebases, transcripts, articles) and translating it into design decisions for the target project. Stack tests validate design intent end-to-end through the full application stack — real dependencies (no mocks in stack tests), no partial integration, no ambiguous results. Mocks are appropriate in unit tests for isolation. Full-loop assertion layering catches regressions at primary, secondary, and tertiary levels.
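A rough Python sketch of what assertion layering could look like. This overview does not define the three levels, so the reading used here is an assumption: primary is the response the caller sees, secondary is the persisted state, and tertiary is downstream side effects. `FakeStack` is a hypothetical in-memory stand-in so the sketch runs without Docker; in a real stack test these checks would go against the live API, database, and cache.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStack:
    """In-memory stand-in for the API, database, and cache of a real stack."""
    orders: dict = field(default_factory=dict)  # plays the database
    cache: dict = field(default_factory=dict)   # plays the cache
    events: list = field(default_factory=list)  # plays an event log

    def create_order(self, user_id: int, amount: int) -> dict:
        order_id = len(self.orders) + 1
        self.orders[order_id] = {"user_id": user_id, "amount": amount}
        key = f"user:{user_id}:open_orders"
        self.cache[key] = self.cache.get(key, 0) + 1
        self.events.append(("order_created", order_id))
        return {"status": 201, "order_id": order_id}


def assert_create_order_full_loop(stack: FakeStack) -> None:
    response = stack.create_order(user_id=7, amount=50)

    # Primary (assumed meaning): the response the caller sees.
    assert response["status"] == 201, f"primary: unexpected status {response['status']}"

    # Secondary (assumed meaning): the state that was persisted.
    row = stack.orders.get(response["order_id"])
    assert row is not None, "secondary: order row was not persisted"
    assert row["amount"] == 50, f"secondary: stored amount {row['amount']} != 50"

    # Tertiary (assumed meaning): downstream side effects.
    assert stack.cache.get("user:7:open_orders") == 1, "tertiary: cache not updated"
    assert ("order_created", response["order_id"]) in stack.events, "tertiary: event missing"


assert_create_order_full_loop(FakeStack())
```

Prefixing each assertion message with its layer means a regression that leaves the response intact but skips the write or the side effect still fails, and the message says which layer broke.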

L2: Behavioral Guardrails — Rules written in prose are suggestions. Skills and hooks are enforcement. Overlay skills on top of base agent capabilities, chain them into a complete development lifecycle, and automate discipline through the tool layer.
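As an illustration of the difference between a written rule and an enforced one, the following Python sketch checks a shell command against a small deny-list before it runs. The rule set, `Verdict`, and `check_command` are all hypothetical, and the sketch deliberately leaves out how a verdict is wired into any particular agent's hook mechanism, since that differs by tool.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    allowed: bool
    reason: Optional[str]  # explanation fed back to the agent when blocked


# Hypothetical rules: each pairs a pattern with the reason shown on a block.
BLOCKED = [
    (re.compile(r"\bgit\s+commit\b.*--no-verify"),
     "commits must run the pre-commit checks"),
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)(\s|$)"),
     "force pushes are not allowed"),
]


def check_command(command: str) -> Verdict:
    """Return a blocking verdict for the first rule the command violates."""
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return Verdict(False, reason)
    return Verdict(True, None)


print(check_command("git commit --no-verify -m wip"))
# Verdict(allowed=False, reason='commits must run the pre-commit checks')
print(check_command("git status"))
# Verdict(allowed=True, reason=None)
```

Because the check runs in the tool layer, the agent cannot skip it the way it can skip a paragraph in a wiki, and the reason string tells it what to do differently.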

L3: Optimization — Agent efficiency is quality, not just speed. Smart routing redirects shell commands to specialized tools (60-80% token reduction). Intent classification, environment-aware routing, and the Scout Pattern (from the WISC context engineering framework: Write, Isolate, Select, Compress) turn exploration into structured context.
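One way to picture smart routing with intent classification is the Python sketch below. It is an assumption-laden illustration, not rig's or the guardrails example's implementation: the `INTENTS` mapping, the tool names (`read_file`, `search_text`, `list_files`), and the `route` function are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    intent: str  # coarse classification of what the command is for
    tool: str    # hypothetical tool the agent would call instead
    args: tuple


# Hypothetical mapping from a command's first word to (intent, tool).
INTENTS = {
    "cat": ("read", "read_file"),
    "head": ("read", "read_file"),
    "grep": ("search", "search_text"),
    "rg": ("search", "search_text"),
    "find": ("list", "list_files"),
    "ls": ("list", "list_files"),
}

SHELL_METACHARACTERS = "|;&<>`$"


def route(command: str) -> Route:
    """Classify a shell command and pick a specialized tool when one fits."""
    fallback = Route("other", "shell", (command,))
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return fallback  # pipelines, redirects, substitutions stay in the shell
    try:
        parts = shlex.split(command)
    except ValueError:  # for example, unbalanced quotes
        return fallback
    if not parts or parts[0] not in INTENTS:
        return fallback
    intent, tool = INTENTS[parts[0]]
    return Route(intent, tool, tuple(parts[1:]))


print(route("grep -rn TODO src"))
# Route(intent='search', tool='search_text', args=('-rn', 'TODO', 'src'))
print(route("cat app.log | tail -5").tool)  # shell
```

The conservative fallback is a design choice: anything the classifier cannot confidently map is passed through unchanged, so routing can only narrow output, never change what a command does.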

L4: Standards & Measurement — Evidence-based claims, spec drift detection, the new starter audit, development metrics, CI guardrails, and context eval for routing effectiveness. The maturity layer that verifies L0-L3 are holding and measures their impact over time.
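Spec drift detection can be sketched as a set comparison. The overview does not say what is compared, so this Python example assumes the simplest case: a set of endpoints declared in a spec against a set of endpoints the application actually exposes. `DriftReport`, `detect_drift`, and the endpoint strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    missing: frozenset       # in the spec, not in the implementation
    undocumented: frozenset  # in the implementation, not in the spec

    @property
    def in_sync(self) -> bool:
        return not self.missing and not self.undocumented


def detect_drift(spec_endpoints: set, implemented_endpoints: set) -> DriftReport:
    """Compare what the spec promises with what the code exposes."""
    spec = frozenset(spec_endpoints)
    impl = frozenset(implemented_endpoints)
    return DriftReport(missing=spec - impl, undocumented=impl - spec)


spec = {"GET /orders", "POST /orders", "DELETE /orders/{id}"}
implemented = {"GET /orders", "POST /orders", "POST /orders/{id}/cancel"}

report = detect_drift(spec, implemented)
print(sorted(report.missing))       # ['DELETE /orders/{id}']
print(sorted(report.undocumented))  # ['POST /orders/{id}/cancel']
print(report.in_sync)               # False
```

Run as a CI step, a check like this turns "the docs are probably up to date" into a claim backed by evidence.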

Getting Started

  1. New to agentic development? Start with L0: Foundation. The structural changes there are the highest-impact, lowest-effort starting point.
  2. Already using AI coding tools? Jump to L1: Closed Loop Design and Verification to understand how context harvesting and closed-loop verification improve agent outcomes.
  3. Building team practices? L2: Behavioral Guardrails and L4: Standards & Measurement together establish the discipline layer.
  4. Looking for adoption paths? See the Adoption Guide for suggested approaches — not a rigid plan, but a set of paths teams have found useful.

Audience

  • Solo developers and small teams using Claude Code, Cursor, or similar tools — adopt patterns incrementally starting at L0
  • Team leads and architects exploring agentic development practices for their organization
  • Anyone curious about what "agentic-friendly" software engineering looks like in practice

What's in This Repo

docs/               # Pattern documentation (L0-L4) and guides
examples/           # Working code examples (TypeScript + Python)
  stack-test/
    typescript/     # Jest-based stack tests + Playwright browser tests
                    #   Demonstrates: API-level AND browser-driven verification
                    #   Stack: Node.js + PostgreSQL + Redis in Docker
    python/         # pytest-based stack tests (API-level only)
                    #   Demonstrates: API-level verification, Python conventions
                    #   Stack: Python + PostgreSQL + Redis in Docker
  guardrails/       # L3 token optimization middleware (TypeScript)
                    #   Intent classification, smart routing, environment detection
  project-structure/ # Before/after directory layouts
docs/cross-cutting/ # Anti-patterns, adoption guide, FAQ, glossary
docs/references/    # Case studies (trading bot + my-claw) and further reading

The TypeScript and Python stack-test examples cover the same core pattern (Docker-based stack tests with full-loop assertions) but differ in scope: the TypeScript example extends to browser-based testing with Playwright, while the Python example demonstrates API-level testing with pytest conventions. Both are self-contained and runnable independently.

Contributing

This is a living pattern library. Contributions welcome:

  • New patterns that extend or challenge the existing framework
  • Real-world examples from different domains (the current examples lean toward ecommerce and trading)
  • Corrections when a pattern doesn't match your experience — document the exception
  • Translations of concepts to other frameworks and tools

Background and Further Reading
