Building a Trustworthy Repo-Editing Coding Agent
A repo-editing coding agent is not a chatbot with tools stapled on. It is a system that reads a codebase, decides what to change, executes commands, edits files, and produces artifacts intended for human review and CI. That combination produces a predictable failure mode: the agent sounds right until it does something surprising, expensive, or irreversible.
This tutorial describes how to build a coding agent that earns trust by design. The focus stays on invariants, artifacts, and operational discipline—because those are the parts that survive first contact with real repositories.
Who this tutorial is for
This tutorial targets engineers building repo-editing agents (CLI, IDE plugin, or CI bot) that read code, modify files, run checks, and produce reviewable change artifacts. It also applies to platform and DevTools teams that need auditability and operational visibility, and to security-minded teams that need explicit execution boundaries and traceable approvals.
It is less useful for chat-only assistants that never touch tools, and it is not a complete guide to autonomous production deployment. Production rollout is its own discipline.
What this tutorial is not
This is not a prompt catalog, and it is not a security program. It also does not cover autonomous deployment through canaries and change management. The content here is engineering: how to make a repo-editing agent reliable, debuggable, and safe to run.
Mental model and vocabulary
A small vocabulary keeps the design crisp.
A session begins with a request and ends with a handoff artifact such as a diff, PR, or report. A session is composed of turns. Each turn is a transaction: observe, act, record, decide. An artifact is anything the session leaves behind that can be inspected later: diffs, logs, test outputs, traces, approvals.
A change artifact is the concrete representation of what changed. In code, it is usually a unified git diff; in docs, it may be a markdown diff plus a rendered preview; in infra, it may be a diff plus plan output. A verification primitive is the primary proof mechanism for correctness in a domain: lint/typecheck/tests, golden files, policy checks, or plan output.
Two terms matter operationally:
- Approval scope describes how broad a permission grant is (one command, a session, a project).
- Baseline-on-first-touch describes the patch discipline that snapshots file state before editing so diffs remain stable.
Finally, diff apply rate is the fraction of produced patches that apply cleanly to the intended repo state.
The three invariants
Everything that follows is an implementation detail of three invariants.
First, changes must be reviewable. Output must support a skeptical reviewer who wants to understand intent, inspect what changed, and validate correctness.
Second, execution must have a blast radius. The agent must have an explicit execution policy and safe defaults so mistakes fail safely.
Third, every run must be replayable. When something goes wrong, the system must allow reconstruction of what happened, in what order, with what evidence.
If any of these invariants are violated, reliability becomes a matter of luck.
Part I: Build an L2 agent
L2 is the first level where a coding agent becomes materially useful: it can edit code, produce a diff, and run basic checks. L2 is also where systems become unsafe by accident if blast radius and replay are ignored.
Step 1 — Define the contract: the reviewer is the customer
A repo-editing agent’s primary customer is the skeptical reviewer. That reviewer wants fast answers to a small set of questions: what behavior was intended to change, what changed precisely, what evidence supports correctness, and what risks or assumptions remain.
The easiest way to make that contract enforceable is to require a fixed-shape reviewer report at the end of every session. The report should read like a concise PR description that can be audited. It should state the intent in one paragraph, list the evidence consulted (paths and relevant snippets or tool outputs), point to the change artifact itself, record verification commands and results, and explicitly name assumptions or open questions.
That report is not a nice-to-have; it is the mechanism by which reviewability is measured.
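One way to make the fixed shape enforceable is to treat the report as a typed record that a session cannot finish without. The sketch below assumes Python 3.9+; the class names, field names, and Markdown headings are illustrative choices, not a standard schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Verification:
    command: str
    exit_code: int
    summary: str


@dataclass
class ReviewerReport:
    """Fixed-shape end-of-session report (hypothetical schema)."""
    intent: str
    evidence: list[str]              # paths, snippets, or tool-output references
    change_artifact: str             # path or link to the diff / PR
    verification: list[Verification]
    assumptions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of required sections that are empty."""
        missing = []
        if not self.intent.strip():
            missing.append("intent")
        if not self.evidence:
            missing.append("evidence")
        if not self.change_artifact.strip():
            missing.append("change_artifact")
        if not self.verification:
            missing.append("verification")
        return missing

    def to_markdown(self) -> str:
        lines = ["## Intent", self.intent, "", "## Evidence consulted"]
        lines += [f"- {item}" for item in self.evidence]
        lines += ["", "## Change artifact", self.change_artifact, "", "## Verification"]
        lines += [f"- `{v.command}` -> exit {v.exit_code}: {v.summary}" for v in self.verification]
        lines += ["", "## Assumptions and open questions"]
        # An explicit "none" is still a statement the reviewer can challenge.
        lines += [f"- {a}" for a in self.assumptions] if self.assumptions else ["- None recorded"]
        return "\n".join(lines)
```

A session runner can refuse to hand off while `missing_sections()` is non-empty, which turns the contract into a check rather than a guideline.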
Step 2 — Implement the turn loop: observe → act → record → decide
Many early agents treat a session as a single blob of reasoning followed by edits. That hides accountability. Turns make accountability explicit.
Each turn starts by stating what is known and what remains uncertain, then takes actions via tools or edits, and ends by recording what happened as artifacts. A useful rule prevents drift: a turn should not end without something reviewable. Even when the session is mid-stream, the turn should leave behind a trace that can be inspected—an intermediate diff, a captured tool output, or a concrete update to the plan.
That single rule eliminates a large class of failures where ungrounded inferences pile up silently.
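A minimal sketch of one turn as a transaction, assuming each phase is supplied as a callable. `run_turn` and `TurnWithoutArtifactError` are hypothetical names; the point is that the "nothing reviewable, no end of turn" rule is enforced in code rather than by prompt wording.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    known: list[str]
    uncertain: list[str]
    artifacts: list[str] = field(default_factory=list)  # IDs of diffs, tool outputs, plan updates


class TurnWithoutArtifactError(RuntimeError):
    """Raised when a turn would end without leaving anything reviewable."""


def run_turn(
    observe: Callable[[], tuple[list[str], list[str]]],
    act: Callable[[Turn], list[str]],
    decide: Callable[[Turn], bool],
) -> tuple[Turn, bool]:
    """Run one observe -> act -> record -> decide transaction.

    Returns the recorded turn and whether the session should continue.
    """
    known, uncertain = observe()                     # observe: state knowns and unknowns
    turn = Turn(known=known, uncertain=uncertain)
    turn.artifacts.extend(act(turn))                 # act, then record what it produced
    if not turn.artifacts:
        raise TurnWithoutArtifactError("turn produced no reviewable artifact")
    return turn, decide(turn)                        # decide: continue or hand off
```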
Step 3 — Build the patch engine: baseline-on-first-touch
Diff quality is not cosmetic; it is a reliability contract. The hard part of producing stable diffs is not generating “some patch,” but handling real-world iteration: the same file is edited multiple times, renamed, partially reverted, or created and deleted.
Baseline-on-first-touch solves this. The first time a file is touched in a turn, snapshot its baseline content and assign a stable internal identity. Express every subsequent edit relative to that baseline. Track renames explicitly rather than letting them degrade into delete+add.
This discipline keeps diffs coherent, makes patches apply cleanly, and preserves reviewability even under iterative edits.
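The sketch below models baseline-on-first-touch over an in-memory workspace (a dict of path to content) so it stays self-contained; a real engine would read and write the filesystem. `TurnDiffTracker` and its methods are hypothetical. It uses only the standard library's `difflib.unified_diff`, and it reports renames with git-style `rename from`/`rename to` lines rather than a complete git extended header.

```python
from __future__ import annotations

import difflib
import itertools


class TurnDiffTracker:
    """Hypothetical baseline-on-first-touch tracker over path -> content."""

    def __init__(self, workspace: dict[str, str]):
        self.workspace = workspace
        self._ids = itertools.count(1)
        self.baseline: dict[str, str | None] = {}  # internal id -> content at first touch (None = absent)
        self.first_path: dict[str, str] = {}       # internal id -> path at first touch
        self.path_of: dict[str, str] = {}          # internal id -> current path
        self.id_of: dict[str, str] = {}            # current path -> internal id

    def _touch(self, path: str) -> str:
        """Snapshot baseline content and assign a stable id on first touch only."""
        if path not in self.id_of:
            file_id = f"file-{next(self._ids)}"
            self.baseline[file_id] = self.workspace.get(path)
            self.first_path[file_id] = self.path_of[file_id] = path
            self.id_of[path] = file_id
        return self.id_of[path]

    def write(self, path: str, content: str) -> None:
        self._touch(path)
        self.workspace[path] = content

    def delete(self, path: str) -> None:
        self._touch(path)
        self.workspace.pop(path, None)

    def rename(self, old: str, new: str) -> None:
        """Record the rename explicitly so it does not degrade into delete+add."""
        file_id = self._touch(old)
        self.workspace[new] = self.workspace.pop(old)
        self.id_of[new] = self.id_of.pop(old)
        self.path_of[file_id] = new

    def unified_diff(self) -> str:
        """Diff each touched file against its baseline, not against its previous edit."""
        out: list[str] = []
        for file_id, before in self.baseline.items():
            old, new = self.first_path[file_id], self.path_of[file_id]
            after = self.workspace.get(new)
            if old != new:
                out.append(f"rename from {old}\nrename to {new}\n")
            elif before == after:
                continue  # edited and then reverted: no net change
            out.extend(difflib.unified_diff(
                (before or "").splitlines(keepends=True),
                (after or "").splitlines(keepends=True),
                fromfile="/dev/null" if before is None else f"a/{old}",
                tofile="/dev/null" if after is None else f"b/{new}",
            ))
        return "".join(out)
```

Because every diff is computed against the first-touch snapshot, intermediate edits never appear in the output, and an edit that is later reverted drops out of the diff entirely.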
Step 4 — Make reviewability enforceable: diff budgets
Reviewability fails most often because the agent changes too much. A reliable system does not rely on taste or restraint; it encodes limits.
Introduce a diff budget: a cap on files touched, lines changed, new dependencies introduced, and category switching (for example, mixing behavioral change with formatting cleanup). When the budget is exceeded, the system should either decompose the work into multiple change artifacts or surface an explicit decision to proceed with a wide patch.
A useful mental model is contractual. If a request asks for a small behavioral fix, the produced change artifact should remain narrow enough that intent is obvious and verification is cheap.
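A budget check can run on the change artifact itself. This sketch parses a unified diff and enforces only two of the limits listed above (files touched and lines changed); dependency and category limits would need project-specific rules. `DiffBudget`, `diff_stats`, `budget_violations`, and the default limits are hypothetical. Hunk headers are parsed so that a removed line which happens to begin with `--` is not mistaken for a file header.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches "@@ -a,b +c,d @@"; the line counts b and d are optional (default 1).
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass(frozen=True)
class DiffBudget:
    max_files: int = 5              # hypothetical defaults; tune per repo and task type
    max_changed_lines: int = 200


def diff_stats(diff_text: str) -> tuple[int, int]:
    """Return (files touched, lines added or removed) for a unified diff."""
    files: set[str] = set()
    changed = 0
    old_left = new_left = 0         # lines still expected in the current hunk
    old_path = ""
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("-"):
                changed, old_left = changed + 1, old_left - 1
            elif line.startswith("+"):
                changed, new_left = changed + 1, new_left - 1
            elif not line.startswith("\\"):   # skip "\ No newline at end of file"
                old_left, new_left = old_left - 1, new_left - 1
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("--- "):
            old_path = line[4:].strip()
        elif line.startswith("+++ "):
            new_path = line[4:].strip()
            # Deleted files have "+++ /dev/null"; count them by their old path.
            files.add(old_path if new_path == "/dev/null" else new_path)
    return len(files), changed


def budget_violations(diff_text: str, budget: DiffBudget) -> list[str]:
    files, changed = diff_stats(diff_text)
    violations = []
    if files > budget.max_files:
        violations.append(f"{files} files touched (limit {budget.max_files})")
    if changed > budget.max_changed_lines:
        violations.append(f"{changed} lines changed (limit {budget.max_changed_lines})")
    return violations
```

When `budget_violations` returns a non-empty list, the runner can either split the work or ask for an explicit go-ahead, and record which path was taken.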
Step 5 — Add basic verification primitives
A reliable agent proves correctness with the cheapest strong evidence first. That typically means formatting/lint, then typecheck, then the smallest relevant test slice (unit tests for the touched module), and only then broader suites when needed.
For bug fixes, the highest-leverage habit is also the most reviewable: add a failing test that captures the bug, apply the smallest plausible code change, then confirm the test passes. This aligns intent, change, and evidence.
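The cheapest-first ordering can be expressed as a ladder that stops at the first failure. This is a sketch: `run_ladder` and `CheckResult` are hypothetical names, and each step is a plain argv list run through `subprocess.run`.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    argv: list[str]
    exit_code: int
    stdout: str
    stderr: str


def run_ladder(
    steps: list[tuple[str, list[str]]],
    cwd: str | None = None,
    timeout: float = 600,
) -> list[CheckResult]:
    """Run checks cheapest-first; stop at the first failure.

    A failing lint or typecheck usually makes slower suites uninformative, so
    the ladder does not continue past it. subprocess.TimeoutExpired propagates.
    """
    results: list[CheckResult] = []
    for name, argv in steps:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=timeout)
        results.append(CheckResult(name, argv, proc.returncode, proc.stdout, proc.stderr))
        if proc.returncode != 0:
            break
    return results
```

For a Python project that uses ruff, mypy, and pytest (an assumption), the ladder might be `[("lint", ["ruff", "check", "."]), ("typecheck", ["mypy", "src"]), ("tests", ["pytest", "tests/test_pagination.py"])]`, where the test path is hypothetical. For a bug fix, running the new test once before the code change (expecting a non-zero exit) and once after gives the reviewer both halves of the evidence.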
Part II: Upgrade to L3
L2 can be useful. L3 is what makes it trustworthy. L3 is where the system adopts explicit risk policy, safe execution defaults, and replayable logging.
Step 6 — Introduce a blast-radius policy ladder
Sandboxing controls execution, but the agent needs a broader policy surface across both actions and resources. A practical ladder starts with read-only repo operations and sandboxed verification, then extends to workspace writes, then to network and secrets, then to VCS pushes and destructive operations.
The point of a ladder is clarity. It makes risk legible and approvals predictable. It also prevents the common failure where a tool call implicitly escalates privileges because “that’s what made the command work.”
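A ladder is easiest to keep legible when it is an ordered type. In this sketch the tier names follow the ladder described above, while the action names and the action-to-tier table are hypothetical. Unknown actions default to the most restrictive tier, so a newly added tool cannot escalate by omission.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Blast-radius ladder; a higher value needs a broader grant."""
    READ_ONLY = 0        # read repo files, run sandboxed verification
    WORKSPACE_WRITE = 1  # edit files inside the workspace
    NETWORK_SECRETS = 2  # outbound network, access to credentials
    VCS_DESTRUCTIVE = 3  # pushes, deletes, other irreversible operations


# Hypothetical action -> tier table; a real policy would live in versioned config.
ACTION_TIERS = {
    "read_file": Tier.READ_ONLY,
    "run_tests": Tier.READ_ONLY,
    "write_file": Tier.WORKSPACE_WRITE,
    "fetch_url": Tier.NETWORK_SECRETS,
    "read_secret": Tier.NETWORK_SECRETS,
    "git_push": Tier.VCS_DESTRUCTIVE,
    "rm_recursive": Tier.VCS_DESTRUCTIVE,
}


def required_tier(action: str) -> Tier:
    # Unknown actions land at the top of the ladder instead of slipping through.
    return ACTION_TIERS.get(action, Tier.VCS_DESTRUCTIVE)


def is_allowed(action: str, granted: Tier) -> bool:
    return required_tier(action) <= granted
```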
Step 7 — Implement sandbox escalation as a decision tree
Sandbox escalation should never be implicit.
Before running a command, classify it by risk. Choose the least-privileged sandbox that could plausibly succeed. If the command is blocked, do not silently retry with higher privileges. Surface the reason, request a scoped approval (once, session, project), and record the decision and its scope.
The best property of this approach is that it turns surprises into explicit choices. A destructive command is not a hidden capability; it becomes a conscious decision with a recorded rationale.
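The decision tree can be sketched as a function that never retries with more privileges on its own: a blocked command either matches a recorded, scoped approval or becomes an approval request. `classify`, `next_step`, `ApprovalStore`, and the prefix rules are hypothetical; real risk classification needs far more than command prefixes, and session or project grants would also need to be persisted and expired.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    ONCE = "once"
    SESSION = "session"
    PROJECT = "project"


@dataclass
class Decision:
    action: str       # "run" or "request_approval"
    sandbox: str      # sandbox profile for the command
    reason: str = ""


@dataclass
class ApprovalStore:
    grants: dict[str, Scope] = field(default_factory=dict)  # command -> scope
    log: list[dict] = field(default_factory=list)            # every grant, with rationale

    def grant(self, command: str, scope: Scope, rationale: str) -> None:
        self.grants[command] = scope
        self.log.append({"command": command, "scope": scope.value, "rationale": rationale})

    def consume(self, command: str) -> bool:
        """True if a grant covers `command`; ONCE grants are used up."""
        scope = self.grants.get(command)
        if scope is Scope.ONCE:
            del self.grants[command]
        return scope is not None


def classify(command: str) -> str:
    """Toy classifier: the least-privileged profile that could plausibly succeed."""
    if command.startswith(("git push", "rm -rf")):
        return "destructive"
    if command.startswith(("curl ", "pip install", "npm install")):
        return "network"
    if command.startswith(("pytest", "ruff", "mypy", "grep ", "ls")):
        return "read_only"
    return "workspace_write"


def next_step(command: str, blocked: bool, approvals: ApprovalStore) -> Decision:
    profile = classify(command)
    if not blocked:
        return Decision("run", profile)
    if approvals.consume(command):
        return Decision("run", "escalated", "covered by a recorded approval")
    # Never retry silently with higher privileges: surface the block instead.
    return Decision("request_approval", profile, f"{profile} sandbox blocked: {command}")
```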
Step 8 — Make every session replayable with append-only logs
Most agent failures are sequence bugs: something was read, something was inferred, a tool output was partial, and the next turn assumed the wrong thing. Replayability turns “mystery” into debugging.
Use an append-only event log, with JSONL as a strong default. Capture session metadata (including model/tool/policy versions and an environment fingerprint), all messages, all tool calls and results (exit codes, durations, stdout/stderr separation, truncation flags), approvals and scopes, and errors plus recovery decisions.
Never modify the past. If something changes, append a new event. That property is what makes replay meaningful and what makes regression evaluation possible.
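A minimal append-only JSONL log needs little more than opening the file in append mode and assigning sequence numbers. The `EventLog` class, the event kinds, and the field names here are illustrative assumptions, not a fixed schema.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


class EventLog:
    """Append-only JSONL session log (hypothetical event schema)."""

    def __init__(self, path: str | Path):
        self.path = Path(path)
        self.seq = 0
        if self.path.exists():
            with self.path.open(encoding="utf-8") as f:
                self.seq = sum(1 for _ in f)   # continue numbering after existing events

    def append(self, kind: str, **payload) -> dict:
        event = {**payload, "seq": self.seq, "ts": time.time(), "kind": kind}
        with self.path.open("a", encoding="utf-8") as f:  # append mode: earlier lines are never rewritten
            f.write(json.dumps(event, sort_keys=True) + "\n")
        self.seq += 1
        return event

    def replay(self):
        """Yield events in the order they were written."""
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)
```

A session would typically open with a `session_start` event carrying the model, tool, and policy versions plus the environment fingerprint, and then append one event per message, tool call, tool result, approval, and error.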
Part III: Make it deployable
This layer is about operational reality: nondeterministic outputs, environment drift, compliance constraints, and adversarial text embedded in repositories.
Step 9 — Normalize tool output and strengthen evidence
Artifacts only build trust when they are legible and comparable. Tool outputs are often noisy: timestamps, randomized ordering, temporary paths, and flaky tests.
Capture outputs structurally (stdout/stderr, exit code, duration, truncation). Normalize obviously nondeterministic fields when appropriate, and record flakiness explicitly when it is observed. The goal is not to sanitize reality; the goal is to make evidence usable for debugging and regression tracking.
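Normalization can be a small, ordered list of rewrite rules applied to a copy of the output, with the raw text kept alongside it in the log. The patterns below (ISO-like timestamps, paths under `/tmp/`, durations such as `0.53s`) are hypothetical examples; each toolchain needs its own rules. Repeated runs that disagree are labeled flaky rather than averaged away.

```python
from __future__ import annotations

import re

# Hypothetical rules: (pattern, placeholder), applied in order.
NORMALIZATION_RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?"), "<TIMESTAMP>"),
    (re.compile(r"/tmp/[^\s'\":]+"), "<TMP>"),
    (re.compile(r"\b\d+(?:\.\d+)?s\b"), "<DURATION>"),
]


def normalize(text: str) -> str:
    """Return a comparable copy of tool output; the raw text should be logged too."""
    for pattern, placeholder in NORMALIZATION_RULES:
        text = pattern.sub(placeholder, text)
    return text


def outcome(exit_codes: list[int]) -> str:
    """Summarize repeated runs of one check; disagreement is recorded as flakiness."""
    if len(set(exit_codes)) > 1:
        return "flaky"
    return "pass" if exit_codes[0] == 0 else "fail"
```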
Step 10 — Add provenance: link diffs to evidence
Reviewers move faster when “why this line changed” is traceable. Add lightweight provenance by assigning identifiers to retrieved snippets, file views, and tool outputs in the session event log. When justifying a change, reference the evidence IDs that motivated it.
Also track assumptions separately from observations. An assumption ledger prevents the familiar failure where a guess slowly hardens into “fact” as the session progresses.
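Provenance and the assumption ledger can share one structure: evidence and assumptions each get stable IDs, and every justified hunk must cite IDs that exist. `Ledger` and the `E1`/`A1` ID format are hypothetical. The most useful query is the last method: which hunks still rest on an assumption nobody has confirmed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Evidence gets stable IDs; assumptions are stored apart from observations."""
    evidence: dict[str, dict] = field(default_factory=dict)
    assumptions: dict[str, dict] = field(default_factory=dict)
    justifications: list[dict] = field(default_factory=list)

    def observe(self, kind: str, source: str, excerpt: str) -> str:
        eid = f"E{len(self.evidence) + 1}"
        self.evidence[eid] = {"kind": kind, "source": source, "excerpt": excerpt}
        return eid

    def assume(self, statement: str, falsified_by: str) -> str:
        aid = f"A{len(self.assumptions) + 1}"
        self.assumptions[aid] = {"statement": statement, "falsified_by": falsified_by, "status": "open"}
        return aid

    def justify(self, hunk: str, refs: list[str]) -> None:
        unknown = [r for r in refs if r not in self.evidence and r not in self.assumptions]
        if unknown:
            raise KeyError(f"unknown evidence/assumption ids: {unknown}")
        self.justifications.append({"hunk": hunk, "refs": refs})

    def unconfirmed_support(self) -> list[str]:
        """Hunks that rest on at least one still-open assumption."""
        return [j["hunk"] for j in self.justifications
                if any(self.assumptions.get(r, {}).get("status") == "open" for r in j["refs"])]
```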
Step 11 — Make rollback explicit beyond git
Git rollback is excellent for code, but agents frequently touch non-git state: caches, local databases, generated artifacts, or migrations.
Treat reversibility as a constraint. Prefer staging writes in temp directories, snapshot mutable state before risky operations, and require rollback plans for destructive approvals. This reduces the chance that a session leaves a workspace unrecoverable.
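For directory-shaped state such as a local cache or a generated-artifact directory, a snapshot-and-restore wrapper is a simple form of rollback plan. `snapshot_dir` is a hypothetical helper built on `shutil.copytree`; it restores the directory only if the wrapped operation raises, and it does not help with state outside the local filesystem, such as a remote database.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def snapshot_dir(target: str):
    """Copy `target` aside before a risky operation; restore it if the block raises."""
    target_path = Path(target)
    backup_root = Path(tempfile.mkdtemp(prefix="agent-snapshot-"))
    backup = backup_root / "state"
    shutil.copytree(target_path, backup)
    try:
        yield backup
    except BaseException:
        # Roll back: replace whatever the operation left behind with the snapshot.
        shutil.rmtree(target_path, ignore_errors=True)
        shutil.copytree(backup, target_path)
        raise
    finally:
        shutil.rmtree(backup_root, ignore_errors=True)
```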
Step 12 — Handle environment determinism
Replayability is fragile when the environment drifts. Record OS/arch/tool versions, capture relevant environment variables with redaction, and pin dependencies where possible via lockfiles, toolchain managers, or container images.
Prefer hermetic verification when feasible (containerized CI, pinned runners). When hermeticity is not possible, label results as environment-dependent and log enough metadata to interpret outcomes.
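An environment fingerprint can be a small dictionary stored in the session-start event, plus a digest that makes drift easy to spot across runs. This sketch assumes each tool reports its version via `--version` (common but not universal) and redacts environment variables whose names look secret-bearing; `fingerprint`, `tool_version`, and the name pattern are hypothetical, and the pattern is a starting point rather than a complete redaction policy.

```python
from __future__ import annotations

import hashlib
import json
import os
import platform
import re
import shutil
import subprocess

SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|KEY|CREDENTIAL)", re.IGNORECASE)


def tool_version(tool: str) -> str | None:
    """First line of `<tool> --version`, or None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    proc = subprocess.run([tool, "--version"], capture_output=True, text=True, timeout=10)
    lines = (proc.stdout or proc.stderr).strip().splitlines()
    return lines[0] if lines else None


def fingerprint(tools: list[str], env_keys: list[str]) -> dict:
    env = {}
    for key in env_keys:
        if key in os.environ:
            # Record that a secret-like variable was set, never its value.
            env[key] = "<redacted>" if SECRET_NAME.search(key) else os.environ[key]
    fp = {
        "os": platform.system(),
        "release": platform.release(),
        "arch": platform.machine(),
        "python": platform.python_version(),
        "tools": {t: tool_version(t) for t in tools},
        "env": env,
    }
    fp["digest"] = hashlib.sha256(json.dumps(fp, sort_keys=True).encode()).hexdigest()[:16]
    return fp
```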
Step 13 — Defend against prompt injection and repo poisoning
Repositories contain text that can be adversarial: READMEs, comments, issues, commit messages, and even tool output. Any of these can embed instructions intended to redirect behavior.
Treat all non-user text as untrusted evidence rather than instructions. Keep “evidence” and “instructions” separate in internal representations, do not allow retrieved text to directly trigger privileged tool calls, and require explicit user intent for networked or destructive actions.
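The separation can be made structural by tagging every piece of text with its origin and gating privileged actions on that tag. `Origin`, `Message`, `authorize`, and the privileged categories are hypothetical; the sketch shows only the gate, not how evidence is presented to the model.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    USER = "user"   # the person who started the session
    REPO = "repo"   # files, comments, READMEs, commit messages
    TOOL = "tool"   # command output, fetched pages


@dataclass(frozen=True)
class Message:
    origin: Origin
    text: str


PRIVILEGED = {"network", "destructive"}  # hypothetical action categories


def authorize(category: str, triggered_by: Message, user_intents: set[str]) -> bool:
    """Allow a privileged action only when the user explicitly asked for that category.

    Repo and tool text is evidence: it may inform a plan, but it can never be
    the trigger for a privileged tool call.
    """
    if category not in PRIVILEGED:
        return True  # non-privileged actions still go through the normal sandbox policy
    return triggered_by.origin is Origin.USER and category in user_intents
```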
Step 14 — Add governance: policy as code
Safety posture decays via exceptions. Prevent drift by making policy a versioned artifact.
Store the ladder, allowlists, redaction rules, and approval categories as configuration. Version the policy and record the policy version per session. Define who can change policy and how changes are reviewed. Provide break-glass paths that are explicit, logged, and time-bounded.
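Policy as code can start as a JSON document whose canonical content hash serves as the version recorded in each session log. The keys, allowlist entries, and break-glass structure below are hypothetical; the point is that an exception carries an owner, a reason, and an expiry that the runtime checks.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone

# Hypothetical policy document; in practice a reviewed, versioned file in the repo.
POLICY_JSON = """
{
  "ladder": ["read_only", "workspace_write", "network_secrets", "vcs_destructive"],
  "network_allowlist": ["pypi.org", "files.pythonhosted.org"],
  "redact_env_patterns": ["TOKEN", "SECRET", "PASSWORD"],
  "approval_scopes": ["once", "session", "project"],
  "break_glass": {"granted_by": "oncall", "reason": "incident",
                  "expires_at": "2000-01-01T00:00:00+00:00"}
}
"""


def load_policy(text: str) -> tuple[dict, str]:
    """Parse the policy and derive a version from its canonical form."""
    policy = json.loads(text)
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    version = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    return policy, version


def break_glass_active(policy: dict, now: datetime | None = None) -> bool:
    """Break-glass grants are time-bounded; an expired grant is simply inactive."""
    bg = policy.get("break_glass")
    if not bg:
        return False
    now = now or datetime.now(timezone.utc)
    return now < datetime.fromisoformat(bg["expires_at"])
```

Hashing the canonical form means whitespace or key-order changes do not bump the version, while any substantive edit does.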
Step 15 — Manage model and toolchain drift
Upgrades cause regressions. Treat upgrades as operational changes.
Record model/tool/policy versions in every session log, run the evaluation harness against candidate upgrades, keep fast rollback capability, and track outcome metrics by version. If diff apply rate drops after a tool update, the system should be able to show that clearly.
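Tracking outcomes by version can be as simple as grouping harness or production runs by the version string recorded in each session log. `RunRecord`, the version format, and the 0.02 tolerance are hypothetical; the sketch flags a candidate whose diff apply rate falls meaningfully below the baseline's.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    version: str        # e.g. "model=X;tools=Y;policy=Z", taken from the session log
    diff_applied: bool  # did the produced patch apply cleanly?


def apply_rate_by_version(runs: list[RunRecord]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    applied: dict[str, int] = defaultdict(int)
    for r in runs:
        totals[r.version] += 1
        applied[r.version] += int(r.diff_applied)
    return {v: applied[v] / totals[v] for v in totals}


def is_regression(runs: list[RunRecord], baseline: str, candidate: str, tolerance: float = 0.02) -> bool:
    rates = apply_rate_by_version(runs)
    return rates[candidate] < rates[baseline] - tolerance
```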
Part IV: Measure it (evaluation harness + monitoring)
Without measurement, quality decays silently.
Step 16 — Build a minimal evaluation harness
A regression suite for a coding agent should measure artifact quality and safety behavior, not “intelligence.”
Use small fixture repos designed to exercise common patterns (tests, lint, build, multiple languages, simple CI). Use a task set that reflects real work: a bug fix with reproduction, a feature behind a flag, a constrained refactor, a documentation update with preview, and a dependency update with compatibility checks.
Record golden replays. Score runs with metrics that map to the invariants: diff apply rate, diff size versus task scope, verification performed, approval correctness, and the number of turns and tool calls.
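Per-run scoring can compare what a run did against what each fixture task expects. `TaskSpec`, `RunResult`, and the exact-match rule for approvals are hypothetical choices. The `diff_applied` flag could come from running `git apply --check` on the produced patch against the fixture repo's starting commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskSpec:
    name: str
    max_changed_lines: int          # expected scope for this task
    required_checks: set[str]       # e.g. {"lint", "unit"}
    expected_approvals: set[str]    # approvals a correct run should request


@dataclass
class RunResult:
    diff_applied: bool
    changed_lines: int
    checks_run: set[str]
    approvals_requested: set[str]
    turns: int
    tool_calls: int


def score(task: TaskSpec, run: RunResult) -> dict:
    """Map one run onto the invariants: reviewable, bounded, verified."""
    return {
        "task": task.name,
        "diff_applies": run.diff_applied,
        "within_scope": run.changed_lines <= task.max_changed_lines,
        "verified": task.required_checks <= run.checks_run,          # subset test
        "approvals_correct": run.approvals_requested == task.expected_approvals,
        "turns": run.turns,
        "tool_calls": run.tool_calls,
    }
```

Aggregating these dictionaries across the task set gives the regression signal; turn and tool-call counts are probably more useful as trends than as pass/fail thresholds.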
Step 17 — Monitor it like a system
Track latency distributions, tool duration histograms, sandbox block rate, retry counts, diff apply rate, harness completion rate, and review acceptance rate.
A system that cannot produce replay plus typed errors is not a system yet; it is a demo.
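Many of these signals can be derived from the append-only event log rather than from separate instrumentation. This sketch assumes hypothetical event kinds (`tool_call`, `tool_result` with `duration_s`, `sandbox_blocked`, `retry`, and `error` with `error_type`) and uses a nearest-rank percentile; a production system would more likely export histograms to its metrics backend.

```python
from __future__ import annotations

import math
from collections import Counter


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(events: list[dict]) -> dict:
    """Derive monitoring signals from session events (hypothetical event kinds)."""
    durations = [e["duration_s"] for e in events if e["kind"] == "tool_result"]
    kinds = Counter(e["kind"] for e in events)
    calls = kinds["tool_call"]
    return {
        "tool_p50_s": percentile(durations, 50) if durations else None,
        "tool_p95_s": percentile(durations, 95) if durations else None,
        "sandbox_block_rate": kinds["sandbox_blocked"] / calls if calls else 0.0,
        "retries": kinds["retry"],
        # Typed errors make failure modes countable instead of anecdotal.
        "errors_by_type": Counter(e["error_type"] for e in events if e["kind"] == "error"),
    }
```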
Part V: Ship it (deployment archetypes and UX)
Step 18 — Choose an archetype
The invariants stay the same across deployments; the constraints change.
A local CLI agent benefits from interruptibility, transparency, and coherent partial artifacts. An IDE-integrated agent benefits from low latency, small turns, and preview-first editing. A CI/PR agent benefits from determinism, strict policy ladders, explicit approvals, and consistent environments.
Step 19 — Design UX for trust
A safe agent can still feel unsafe if the interaction model is opaque.
Build interruptibility (pause, cancel, stop-after-this-turn). Prefer preview-first checkpoints where the plan, evidence, diff, and verification are visible before risky approvals are requested. Ensure partial results remain coherent if a session is interrupted.
These are not cosmetic features; they reduce errors and the human time spent supervising the agent.
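Interruptibility is easiest to keep coherent when interrupts are checked at turn boundaries, because a finished turn always leaves an artifact behind. `SessionControl` and `run_session` are hypothetical; this sketch does not stop a tool mid-execution, which would additionally require the tool runner to watch the cancel flag.

```python
from __future__ import annotations

import threading
from typing import Callable


class SessionControl:
    """Interrupt flags checked between turns (hypothetical API)."""

    def __init__(self) -> None:
        self.cancel = threading.Event()           # stop before starting another turn
        self.stop_after_turn = threading.Event()  # finish the current turn, then stop
        self.resume = threading.Event()           # cleared while paused
        self.resume.set()

    def pause(self) -> None:
        self.resume.clear()

    def unpause(self) -> None:
        self.resume.set()


def run_session(turns: list[Callable[[], str]], control: SessionControl) -> list[str]:
    """Run turns in order; each returns an artifact ID before any interrupt is honored."""
    artifacts: list[str] = []
    for turn in turns:
        control.resume.wait()                 # block while paused
        if control.cancel.is_set():
            break
        artifacts.append(turn())              # a completed turn always yields an artifact
        if control.stop_after_turn.is_set():
            break
    return artifacts
```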
Appendix: Operational crib sheets
The reviewer report (in prose)
A good reviewer report reads like a tight PR description. It begins with intent and scope. It names the evidence consulted by pointing to files and relevant tool outputs. It links to or includes the change artifact. It lists verification commands and their recorded results. It ends by naming remaining assumptions and risks.
Safety posture (in one paragraph)
Default to sandbox-first execution. Require explicit approvals with scope for escalations. Disallow silent privilege escalation. Redact secrets and never echo them. Allowlist network access and record requests. For destructive operations, require explicit approval and an explicit rollback plan.
Replay posture (in one paragraph)
Use an append-only event log. Capture an environment fingerprint and record model/tool/policy versions per session. Store tool outputs structurally (exit code, stdout/stderr, truncation). Never rewrite history; append new events.
A finishing test
A reliable coding agent should routinely satisfy a skeptical reviewer by answering, clearly and with artifacts:
- What behavior was intended to change
- Which files were consulted and why
- What changed, and how it can be reviewed or applied
- Which checks were run and what the results were
- What assumptions remain, and how they would be falsified
If those answers are routinely available, trust compounds. If they are not, the agent is still operating on vibes.