Appearance
debug
debug
standalonewriteseither
Use this when: you're stuck on a reproducible bug
Problem it solves — Piling patches on a bug you don't understand hides the cause. This roots it out in five disciplined phases — investigate, contrast working against broken, form a falsifiable hypothesis, fix behind a failing test, sweep for variants — and stops to question the architecture after three failed attempts.
Used in workflows: Debug & Learn
Debug
Forces a bug to prove its cause before you touch it; rigour scales to what the bug reveals — never an up-front guess (gnarliness is discovered, not predicted).
Two modes, one engine
- Interactive (
/debug #42) — the maintainer is present; the fix diff is surface-then-confirm as ever. - Autonomous (driven by
exterminate) — the per-diff confirm relocates to the merge gate; escalations become Status flips + structured "stuck" comments. The per-bug logic is identical — fix-one and fix-many share one engine.
Phase 0 — Reproduce-first (the router)
The first mandatory move for any bug: make it fail on demand — a failing Pest test where possible; for visual bugs a broken-state screenshot + the cheapest meaningful browser/UI-contract assertion.
- Can't reproduce within a bounded budget → stop:
Needs Info+ the specific missing fact. Never fix blind. - "Is this even a bug?" — ambiguous correct-behaviour (no spec, could be intended) is an escalation (
Ready for Human), never a guess. - Route by what the repro revealed +
codegraph_impacton the change site:- Fast lane — one obvious assertion and a narrow site (one file, few callers): fix directly behind that red test; phases 1–3 collapse; the proof gates still apply in full.
- Presentation lane — cosmetic/layout/copy: proof is a passing browser/UI-contract assertion + before/after screenshots (hard precondition). Ambiguous → behaviour (fail closed — "move the nav" can break a Wayfinder route).
- Deep lane — repro resisted, cause invisible, or wide blast radius: the five phases below. A fast-lane fix that isn't green on the first attempt falls through here — rigour is earned by resistance.
- Protected surfaces — the bug (or likely fix) touches outbound clients (Todoist/Google/push),
resources/views/prompts/**, model$hidden/$fillable/casts(),CredentialStore,app/Ai/Tools/**, auth/routes, or migrations → in autonomous mode always human-gated: investigate, then escalate with findings.
The five phases (deep lane)
1 — Investigate
Gather evidence before theorising. Read the actual error + stack, not your memory of it. Trace the real code path (codegraph callers/callees). Note what you know vs what you assume. Heavy read-only gathering can fork to an Explore subagent so the trace noise stays out of context.
2 — Contrast working vs broken
Find a case that works and diff it against the broken one — a passing test, a sibling path, the last green commit (git bisect against the phase-0 repro if cheap — the regressing diff sizes the fix and names the culprit). The bug lives in the delta; narrow it until it's small enough to reason about.
3 — Hypothesis (this is the skill)
State a single, falsifiable cause: "X is null because Y runs before Z." Design the cheapest experiment that would disprove it (a log line, a probe); run it; if it doesn't confirm, discard and form the next. Never fix on an unconfirmed guess — a cause you can't state in one sentence is still phase 1.
4 — Fix behind a failing test
Write the test that fails because of the bug first (red proves the cause). Make it green with the smallest change at the root, not the symptom; re-run the suite (php artisan test --compact).
5 — Variant sweep + guards
The patched bug is one instance of a class. Search the same pattern at every other call site before closing (codegraph_callers/codegraph_impact, not a grep loop) — a partial fix is as bad as no fix. Where the class has a cheap permanent guard (a PHPStan rule, arch test, lint), propose it — make the class unrepresentable. For security-class bugs, pull the class from docs/security/threat-model.md and scan every boundary that shares it.
Two companion guards before calling the fix done: an over-restrictive-patch check (the fix closes the exact hole without blocking adjacent legitimate behaviour — confirm the happy path still passes), and for security fixes a re-attack of the patched site (re-run the same vector against the fixed location — ~85 % of patches don't survive re-attack; mandatory, not optional; the sweep finds the class elsewhere, re-attack re-checks here).
Proof gates — every lane, no exceptions
- Red genuinely failed for the bug's reason — capture the pre-fix red run; a test that would have passed anyway proves nothing.
- The repro becomes the regression test — committed, pinning the bug shut.
- Mutation kills the mutant on the changed line(s) (
pest --mutatescoped to the diff) — the test exercises the fix, not just the line. - Symptom-gone, not just CI-green — re-run the original reported repro.
- PR body = RCA report-card — repro · root cause (one falsifiable sentence) · fix + why root-not-symptom · proof (red→green / screenshots) · confidence + riskiest assumption ·
Closes #NN.
Escalation ladder (either mode, either lane)
Stop and surface — interactive: ask; autonomous: Ready for Human + a structured stuck-comment (tried · diagnosis · candidate fixes · the one decision needed) — when any fires:
- 3 strikes — three failed fixes at one layer: the model is wrong; question the architecture, don't patch harder. Count test-authoring failures separately from fix failures — independent difficulties.
- A genuine decision-class fork — product/UX, data-model, ambiguous intended behaviour, anything security-relevant. Mechanical hardness never escalates; real choices always do.
- Can't reproduce (phase 0) →
Needs Info+ the specific missing fact. - Scope-creep tripwire — a fix reaching beyond the bug's blast radius (refactor, unrelated defect en route): stop, file the extra as its own issue, confirm before widening. The diff stays bug-sized.
- Protected surface (autonomous mode) — always.
After the fix
Investigation is free; the fix is a code change — interactive mode confirms the diff; autonomous mode's confirm is exterminate's merge gate (this engine never merges). Offer find-learnings when the root cause was non-obvious. Suspected flaky/infra failure? Re-run to confirm, file/link a tracking issue, quarantine with a linked TODO — never paper over flakiness.