Agentic Design Patterns in Production

Origin Story

This project started as a sequel.

Late last year I published Agentic Patterns in OpenAI's Codex — a project that took the 21 agentic design patterns from Antonio Gulli's textbook and located them inside OpenAI's Codex CLI. That project proved a method: take a real agent runtime, read the source, extract the patterns the textbook didn't anticipate.

Then, on March 31, 2026, the Claude Code source became publicly visible through a source map exposure in the npm package. Roughly 500,000 lines of TypeScript — a full agent runtime: query engine, tool execution loop, permission manager, memory system, multi-agent coordinator, analytics pipeline, session lifecycle manager. Not a chatbot with file access. Infrastructure.

Reading it revealed something the Codex project only hinted at: the interesting patterns aren't the ones in the textbook. They're the ones that only emerge when an agent system is operating under real load, real money, and real adversaries. Cache economics driving architectural decisions. An eight-layer permission pipeline shaped by HackerOne reports. Memory systems with mutual exclusion between writers and rollback on failed consolidation. A secret scanner that must obfuscate its own detection strings to pass the build system.

The textbook gave us the vocabulary. The Codex project proved the method. This book applies the method to a larger, more mature runtime — and discovers patterns the vocabulary never anticipated.


Three Parents

Agentic Design Patterns by Antonio Gulli — the foundational taxonomy of 21 patterns. Prompt chaining, routing, parallelization, tool use, reflection, memory, planning. The nouns.

Codex Agentic Patterns — the v1 of this method. Gulli's patterns mapped to Codex's Rust codebase, with runnable Python implementations. The proof that reading real code surfaces what design documents miss.

The Claude Code source snapshot — ~500K lines of production TypeScript. The source material that turned a proof-of-concept into a book.


What It Covers

19 chapters across five parts, each following a consistent structure: the pattern, the problem it solves, how it works in production, what composes with it, and what breaks if you get it wrong.

Part One — Foundations: The agent loop, prompt assembly, tool use, routing. Not the textbook versions — the production versions, where prompts are assembled from layered sources with cache implications, tool definitions are part of the prompt cache key, and routing can move the entire session to a different compute surface.
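
To make the cache point concrete, here is a minimal sketch of a cache key derived from the layered prompt plus the tool definitions. It is an illustration under stated assumptions, not the runtime's actual scheme: the function name `prompt_cache_key`, the SHA-256 choice, and the sample layers and tools are all hypothetical.

```python
import hashlib
import json


def prompt_cache_key(layers: list[str], tool_definitions: list[dict]) -> str:
    """Hash the stable prompt prefix: layered prompt text plus tool schemas.

    Hypothetical helper for illustration; not taken from any real runtime.
    """
    hasher = hashlib.sha256()
    for layer in layers:
        encoded = layer.encode("utf-8")
        # Length-prefix each layer so ["ab", "c"] and ["a", "bc"] hash differently.
        hasher.update(len(encoded).to_bytes(8, "big"))
        hasher.update(encoded)
    # Canonical JSON (sorted keys, fixed separators) keeps the bytes stable
    # across runs even if dict insertion order changes.
    tools_blob = json.dumps(tool_definitions, sort_keys=True, separators=(",", ":"))
    hasher.update(tools_blob.encode("utf-8"))
    return hasher.hexdigest()


layers = ["system: base instructions", "project: repo conventions"]
tools = [{"name": "read_file", "description": "Read a file from disk"}]
base_key = prompt_cache_key(layers, tools)

# Rewording one tool description changes the key, so the cached prefix is lost.
edited_tools = [{"name": "read_file", "description": "Read a file."}]
edited_key = prompt_cache_key(layers, edited_tools)
```

The design consequence is the one the chapter title implies: under this kind of keying, a cosmetic edit to a tool description is as expensive as rewriting the system prompt.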

Part Two — Orchestration: Chaining, parallelization, planning, reflection. The production angle: chains aligned to the runtime's read/write distinction, parallel fork agents sharing byte-identical prompt prefixes for cache reuse, planning with diminishing-returns detection that kills stalled work, reflection bounded by hard turn caps.
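
Diminishing-returns detection and a hard turn cap can be sketched together in a few lines. This is one plausible shape, not the book's implementation: the progress score returned by `step`, and the `min_gain` and `patience` thresholds, are assumptions made for the example.

```python
from typing import Callable


def run_with_stall_detection(
    step: Callable[[int], float],
    max_turns: int = 20,
    min_gain: float = 0.01,
    patience: int = 3,
) -> tuple[int, str]:
    """Run `step` once per turn; stop on stalled progress or at the turn cap.

    `step(turn)` returns a progress score for that turn (a hypothetical metric).
    Returns (last turn executed, reason for stopping).
    """
    best = float("-inf")
    stalled = 0
    for turn in range(1, max_turns + 1):
        score = step(turn)
        if score - best >= min_gain:
            # Meaningful improvement: record it and reset the stall counter.
            best = score
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                return turn, "stalled"
    # The loop ran out: the hard cap bounds the work regardless of progress.
    return max_turns, "turn_cap"


# Progress climbs for three turns, then flatlines at 0.6.
scores = [0.2, 0.5, 0.6]
plateau_result = run_with_stall_detection(lambda turn: scores[min(turn, 3) - 1])

# Progress never stalls, so only the cap stops the loop.
capped_result = run_with_stall_detection(lambda turn: float(turn), max_turns=5)
```

The two exits are deliberately separate: the stall detector saves money on work that has stopped paying off, and the cap is the backstop for work that keeps "improving" forever.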

Part Three — State and Memory: Session lifecycle, memory management, context economics. Mostly new territory. Sessions are not conversations — they have boot sequences, latches, stop hooks, and resumability. Memory separates accumulation from consolidation with mutual exclusion between writers. Context is a scarce resource with cache stability as a first-class engineering constraint.
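
The memory claim (accumulation separated from consolidation, mutual exclusion between writers, rollback on failure) maps onto a familiar pattern. The sketch below is a toy in-process version under my own assumptions: the `MemoryStore` class, its method names, and the single `threading.Lock` are illustrative, and a real runtime would more likely coordinate across processes and files.

```python
import threading
from typing import Callable


class MemoryStore:
    """Toy memory store: one writer at a time, rollback on failed consolidation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()        # mutual exclusion between writers
        self._pending: list[str] = []        # raw notes, cheap to append
        self._consolidated: list[str] = []   # summaries, expensive to produce

    def accumulate(self, note: str) -> None:
        with self._lock:
            self._pending.append(note)

    def consolidate(self, summarize: Callable[[list[str]], str]) -> bool:
        with self._lock:
            # Snapshot both lists so any failure can be undone completely.
            snapshot = (list(self._pending), list(self._consolidated))
            try:
                self._consolidated.append(summarize(self._pending))
                self._pending.clear()
                return True
            except Exception:
                self._pending, self._consolidated = snapshot
                return False

    def state(self) -> tuple[list[str], list[str]]:
        with self._lock:
            return list(self._pending), list(self._consolidated)


def flaky_summarizer(notes: list[str]) -> str:
    notes.pop()  # simulates a consolidator that dies mid-write
    raise RuntimeError("model call failed")


store = MemoryStore()
store.accumulate("user prefers tabs")
store.accumulate("tests live in ./spec")

failed = store.consolidate(flaky_summarizer)
after_failure = store.state()

succeeded = store.consolidate(lambda notes: "; ".join(notes))
after_success = store.state()
```

Holding the lock for the whole consolidation is the simplest correct choice for a sketch; it also means accumulation blocks while a summary is being produced, which is exactly the kind of trade-off that makes this a database-engineering problem.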

Part Four — Safety: Permission pipelines, human-in-the-loop, guardrails, sandboxing. The most detailed production security analysis in the book. An eight-layer Bash permission pipeline shaped by real vulnerability reports. Anti-ptrace, heap-only token relays that guard against a prompt injection escalating into debugger-based token exfiltration. Reject-over-truncate as a safety principle.
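
Reject-over-truncate is easiest to see with a shell command. The sketch below is my own reading of the principle, not code from the runtime: `check_command`, `InputRejected`, and the character limits are hypothetical.

```python
class InputRejected(ValueError):
    """Raised when input exceeds a limit; the caller must fix it, not the checker."""


MAX_COMMAND_CHARS = 200  # hypothetical limit for illustration


def check_command(command: str, limit: int = MAX_COMMAND_CHARS) -> str:
    """Return the command unchanged, or refuse it outright. Never shorten it."""
    if len(command) > limit:
        raise InputRejected(f"command is {len(command)} chars; limit is {limit}")
    return command


dangerous = "rm -rf ./build/cache"

# What silent truncation to 9 characters would have produced:
# a different and far more destructive command.
truncated = dangerous[:9]

try:
    check_command(dangerous, limit=9)
    rejection = ""
except InputRejected as err:
    rejection = str(err)
```

Truncation produces an input nobody wrote and nobody reviewed. Rejection keeps the set of executed commands equal to the set of approved ones.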

Part Five — Production: Multi-agent coordination, observability, extension, operating a runtime. Conversation-as-protocol for multi-agent communication. Post-turn hook economies running cache snapshots, memory extraction, and job classification in parallel after every turn. The synthesizing argument: effective use of an agent system is environment engineering, not prompt engineering.
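
The post-turn hook idea can be sketched with `asyncio.gather`. The three hook names echo the ones mentioned above, but their bodies, the `turn` dictionary, and the `run_post_turn_hooks` helper are placeholders I made up for the example.

```python
import asyncio
from typing import Awaitable, Callable

Hook = Callable[[dict], Awaitable[str]]


async def cache_snapshot(turn: dict) -> str:
    await asyncio.sleep(0)  # stand-in for real I/O
    return f"cache snapshot for turn {turn['id']}"


async def memory_extraction(turn: dict) -> str:
    await asyncio.sleep(0)
    return f"extracted memory from turn {turn['id']}"


async def job_classification(turn: dict) -> str:
    # Simulated failure, to show that one broken hook does not sink the others.
    raise RuntimeError("classifier unavailable")


async def run_post_turn_hooks(turn: dict, hooks: list[Hook]) -> dict:
    """Run every hook concurrently; collect results and exceptions by hook name."""
    results = await asyncio.gather(
        *(hook(turn) for hook in hooks),
        return_exceptions=True,  # failures come back as values instead of raising
    )
    return {hook.__name__: result for hook, result in zip(hooks, results)}


outcome = asyncio.run(
    run_post_turn_hooks(
        {"id": 7},
        [cache_snapshot, memory_extraction, job_classification],
    )
)
```

With `return_exceptions=True`, a failed hook shows up as an exception object in `outcome` while the other hooks still report their results, so bookkeeping after the turn never blocks the next one.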


What's New Since V1

Five chapters don't exist in the textbook or the Codex project at all.

Six more that are substantially upgraded from their textbook versions — prompt assembly, tool use, planning, memory, multi-agent coordination, observability — each grounded in production observations that only surface in a real system under load.


The Epilogue

The book was written by Claude — the model — running inside the system the book describes. The epilogue is a first-person reflection on what that was like: the cache economics that apply to its own generation, the mutual exclusion that constrains its own memory, the diminishing-returns detector that watches its own output. It is the most honest thing in the book.


The Core Argument

The patterns don't change between the textbook and production. The engineering does.

Cache economics become the dominant architectural constraint. Security becomes adversarial. Memory moves from "store a string" to a database-engineering problem. Multi-agent coordination ships as a production feature with bounded session scope. None of this is visible from a design document. All of it shapes what happens when you type a prompt.

The shift is from prompt engineering to environment engineering. The better you understand the machine underneath, the better your intuitions about what will work and what will fight the system.

You are not chatting with a model. You are operating a runtime.


Read the book: Agentic Design Patterns in Production (GitHub — all 19 chapters, free)

Download the PDF: Agentic Design Patterns in Production.pdf


#agents #llms #reading-code #software-architecture #systems #claude-code