debug

standalone · writes · either

Use this when: you're stuck on a reproducible bug

Problem it solves — Piling patches on a bug you don't understand hides the cause. This roots it out in five disciplined phases — investigate, contrast working against broken, form a falsifiable hypothesis, fix behind a failing test, sweep for variants — and stops to question the architecture after three failed attempts.

Used in workflows: Debug & Learn

Debug

Forces a bug to prove its cause before you touch it. Rigour scales to what the bug reveals, never to an up-front guess: gnarliness is discovered, not predicted.

Two modes, one engine

Interactive (/debug #42) — the maintainer is present; the fix diff is surfaced first and applied only once confirmed, as ever.
Autonomous (driven by exterminate) — the per-diff confirm relocates to the merge gate; escalations become Status flips + structured "stuck" comments. The per-bug logic is identical — fix-one and fix-many share one engine.

Phase 0 — Reproduce-first (the router)

The first mandatory move for any bug: make it fail on demand — a failing Pest test where possible; for visual bugs a broken-state screenshot + the cheapest meaningful browser/UI-contract assertion.

Can't reproduce within a bounded budget → stop: Needs Info + the specific missing fact. Never fix blind.
"Is this even a bug?" — ambiguous correct-behaviour (no spec, could be intended) is an escalation (Ready for Human), never a guess.
Route by what the repro revealed + codegraph_impact on the change site:
- Fast lane — one obvious assertion and a narrow site (one file, few callers): fix directly behind that red test; phases 1–3 collapse; the proof gates still apply in full.
- Presentation lane — cosmetic/layout/copy: proof is a passing browser/UI-contract assertion + before/after screenshots (hard precondition). If it is unclear whether a bug is purely presentational, treat it as a behaviour bug (fail closed: "move the nav" can break a Wayfinder route).
- Deep lane — repro resisted, cause invisible, or wide blast radius: the five phases below. A fast-lane fix that isn't green on the first attempt falls through here — rigour is earned by resistance.
Protected surfaces — the bug (or likely fix) touches outbound clients (Todoist/Google/push), resources/views/prompts/**, model $hidden/$fillable/casts(), CredentialStore, app/Ai/Tools/**, auth/routes, or migrations → in autonomous mode always human-gated: investigate, then escalate with findings.
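
The routing rules above can be sketched as a small decision function. This is a minimal illustration, not the skill's implementation: the `Repro` fields, the `Lane` names, and the `max_callers` threshold standing in for "few callers" are all assumptions made for the example, and the protected-surface check is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lane(Enum):
    FAST = "fast"
    PRESENTATION = "presentation"
    DEEP = "deep"
    NEEDS_INFO = "needs-info"
    READY_FOR_HUMAN = "ready-for-human"


@dataclass
class Repro:
    reproduced: bool               # failed on demand within the budget
    intended_unclear: bool         # no spec; the behaviour could be intended
    cosmetic_only: Optional[bool]  # None means "can't tell presentation from behaviour"
    obvious_assertion: bool        # one obvious assertion pins the bug
    files_touched: int             # hypothetical summary of the impact analysis
    caller_count: int              # hypothetical summary of the impact analysis


def route(r: Repro, max_callers: int = 3) -> Lane:
    """Pick a lane from what the repro revealed (max_callers is an assumed cut-off)."""
    if not r.reproduced:
        return Lane.NEEDS_INFO          # never fix blind
    if r.intended_unclear:
        return Lane.READY_FOR_HUMAN     # "is this even a bug?" is an escalation
    if r.cosmetic_only is True:
        return Lane.PRESENTATION
    # An ambiguous (None) answer falls through to the behaviour lanes: fail closed.
    if r.obvious_assertion and r.files_touched == 1 and r.caller_count <= max_callers:
        return Lane.FAST
    return Lane.DEEP
```

Note that the sketch only picks the starting lane; a fast-lane fix that is not green on the first attempt would still fall through to the deep lane afterwards.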

The five phases (deep lane)

1 — Investigate

Gather evidence before theorising. Read the actual error + stack, not your memory of it. Trace the real code path (codegraph callers/callees). Note what you know vs what you assume. Heavy read-only gathering can fork to an Explore subagent so the trace noise stays out of context.

2 — Contrast working vs broken

Find a case that works and diff it against the broken one — a passing test, a sibling path, the last green commit (git bisect against the phase-0 repro if cheap — the regressing diff sizes the fix and names the culprit). The bug lives in the delta; narrow it until it's small enough to reason about.
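
To make the "narrow the delta" idea concrete, here is a rough sketch of the binary search that `git bisect` performs over history. The function name `first_bad` and the `repro_fails` callback are hypothetical; in practice the callback would check out a commit and run the phase-0 repro against it.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], repro_fails: Callable[[str], bool]) -> str:
    """Return the earliest commit at which the repro fails.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the failure persists once introduced.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if repro_fails(commits[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return commits[bad]
```

Each step halves the candidate range, so even a long history needs only a handful of repro runs, which is why a cheap, deterministic repro matters so much here.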

3 — Hypothesis (this is the skill)

State a single, falsifiable cause: "X is null because Y runs before Z." Design the cheapest experiment that would disprove it (a log line, a probe); run it; if it doesn't confirm, discard and form the next. Never fix on an unconfirmed guess — a cause you can't state in one sentence is still phase 1.

4 — Fix behind a failing test

Write the test that fails because of the bug first (red proves the cause). Make it green with the smallest change at the root, not the symptom; re-run the suite (php artisan test --compact).

5 — Variant sweep + guards

The patched bug is one instance of a class. Search the same pattern at every other call site before closing (codegraph_callers/codegraph_impact, not a grep loop) — a partial fix is as bad as no fix. Where the class has a cheap permanent guard (a PHPStan rule, arch test, lint), propose it — make the class unrepresentable. For security-class bugs, pull the class from docs/security/threat-model.md and scan every boundary that shares it.

Two companion guards before calling the fix done. First, an over-restrictive-patch check: the fix should close the exact hole without blocking adjacent legitimate behaviour, so confirm the happy path still passes. Second, for security fixes, a re-attack of the patched site: re-run the same vector against the fixed location (~85% of patches don't survive re-attack). The re-attack is mandatory, not optional; the sweep finds the class elsewhere, while the re-attack re-checks here.

Proof gates — every lane, no exceptions

Red genuinely failed for the bug's reason — capture the pre-fix red run; a test that would have passed anyway proves nothing.
The repro becomes the regression test — committed, pinning the bug shut.
Mutation kills the mutant on the changed line(s) (pest --mutate scoped to the diff) — the test exercises the fix, not just the line.
Symptom-gone, not just CI-green — re-run the original reported repro.
PR body = RCA report-card — repro · root cause (one falsifiable sentence) · fix + why root-not-symptom · proof (red→green / screenshots) · confidence + riskiest assumption · Closes #NN.
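
As one possible shape for the RCA report-card, the sketch below renders a PR body from the fields listed above and refuses to render when a required field is empty. The `ReportCard` class, the `render` function, and the exact line layout are illustrative assumptions, not a format the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class ReportCard:
    repro: str                # how to make the bug fail on demand
    root_cause: str           # one falsifiable sentence
    fix: str                  # the change, and why it is root rather than symptom
    proof: str                # red-to-green run or before/after screenshots
    confidence: str
    riskiest_assumption: str
    issue: int                # the issue this PR closes


def render(card: ReportCard) -> str:
    """Build the PR body; raise if repro, root cause, fix, or proof is missing."""
    if not all([card.repro, card.root_cause, card.fix, card.proof]):
        raise ValueError("report-card is incomplete: repro, root cause, fix and proof are required")
    return "\n".join([
        f"Repro: {card.repro}",
        f"Root cause: {card.root_cause}",
        f"Fix: {card.fix}",
        f"Proof: {card.proof}",
        f"Confidence: {card.confidence} (riskiest assumption: {card.riskiest_assumption})",
        f"Closes #{card.issue}",
    ])
```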

Escalation ladder (either mode, either lane)

Stop and surface — interactive: ask; autonomous: Ready for Human + a structured stuck-comment (tried · diagnosis · candidate fixes · the one decision needed) — when any fires:

3 strikes — three failed fixes at one layer: the model is wrong; question the architecture, don't patch harder. Count test-authoring failures separately from fix failures — independent difficulties.
A genuine decision-class fork — product/UX, data-model, ambiguous intended behaviour, anything security-relevant. Mechanical hardness never escalates; real choices always do.
Can't reproduce (phase 0) → Needs Info + the specific missing fact.
Scope-creep tripwire — a fix reaching beyond the bug's blast radius (refactor, unrelated defect en route): stop, file the extra as its own issue, confirm before widening. The diff stays bug-sized.
Protected surface (autonomous mode) — always.
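
The three-strikes rule, with test-authoring failures tallied separately from fix failures, might be tracked along these lines. The `Strikes` class and the `kind`/`layer` labels are hypothetical. The source does not say whether test-authoring failures carry their own limit, so this sketch only counts them and escalates on failed fixes.

```python
from collections import Counter

STRIKE_LIMIT = 3  # "three failed fixes at one layer"


class Strikes:
    """Tally failed attempts per (kind, layer) so the two difficulties never mix."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, kind: str, layer: str) -> bool:
        """Record a failed attempt; return True when it is time to escalate.

        kind is "fix" or "test"; layer is a free-form label for where the
        attempt was made. Both labels are illustrative assumptions.
        """
        self._counts[(kind, layer)] += 1
        return kind == "fix" and self._counts[(kind, layer)] >= STRIKE_LIMIT

    def count(self, kind: str, layer: str) -> int:
        return self._counts[(kind, layer)]
```

Keying the tally on the layer as well as the kind means that moving to a different layer starts a fresh count, which matches the idea that three failures at one layer signal a wrong model of that layer.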

After the fix

Investigation is free; the fix is a code change — interactive mode confirms the diff; autonomous mode's confirm is exterminate's merge gate (this engine never merges). Offer find-learnings when the root cause was non-obvious. Suspected flaky/infra failure? Re-run to confirm, file/link a tracking issue, quarantine with a linked TODO — never paper over flakiness.
