Skip to content

check-reasoning

check-reasoning

checkreadhands-off

Use this when: a plan or diff needs red-teaming before you commit

Problem it solves — You approve your own work too easily. This removes author-bias with an independence gate — a reviewer that sees what you did and what the project says is right, but not why you did it — so it bites instead of rubber-stamping.

Used in workflows: Idea → shipped · QA PR & Learn · Health Check

Check reasoning (devil's-advocate pass)

Anyone approves their own work too easily — me, or you. This skill removes the author bias: the verdict comes from a reviewer that never saw why you did it, only what you did and what the project says is right. The independence gate is the skill — review the artifact in the same breath you wrote it and you'll defend it; dispatch a blind reviewer and it bites.

The independence gate (required)

When the artifact under review was produced in this conversation, do not judge it yourself. Dispatch the blind-reviewer agent (Agent tool, subagent_type: blind-reviewer; fall back to general-purpose/Explore if that agent isn't present) and give it only:

  • the artifact — the plan text, or the diff (git diff <base>...);
  • the standards — CLAUDE.md, .claude/rules/*, docs/adr/*, CONTEXT.md;
  • the criteria below — plus, where the change's blast radius warrants, the attack personas from the adversarial lens deck (docs/agents/lens-deck-adversarial.md: the attacker · the maintainer-in-six-months · the on-call-at-3am · the cost-conscious · the misuse case · the future change), to widen the attack surface past the fixed criteria.

Withhold your reasoning, your justifications, and your confidence. The reviewer forms its own verdict from artifact + standards alone, then reports back. You relay its findings — you do not overrule them to defend your work.

Orchestration (opt-in, big blast radius). When several personas apply, dispatch one blind reviewer per persona in parallel (same artifact + standards, one persona each) per docs/agents/orchestration.md, and merge their verdict tables per docs/agents/deterministic-merge.md — a FAIL from any reviewer stands; never average verdicts away. One persona, or an everyday change, stays a single reviewer.

Two modes

Plan critique — run BEFORE issues are filed

Red-team the plan while it's still cheap to change. Per criterion, PASS/FAIL with the specific weakness:

  • Problem fit — does this solve the stated problem, or an adjacent easier one?
  • Scope — smallest thing that works, or speculative generality? (premature abstraction, gold-plating)
  • Decomposition — are the slices independently shippable, or secretly coupled?
  • Data model / migrations — reversible? backfill-safe? deploy-window aware?
  • Standards — does it respect ADRs (e.g. DB-driven scheduling, import ledgers), the DTO layer, the Todoist priority mapping?
  • Failure & idempotency — what happens on API error / partial run / re-run? (Tempo's sources must be idempotent)
  • Testability — can each slice be proved with mocked externals, happy and sad path?
  • Unknowns — any decision the plan hand-waves that will stall execution?

Altitude calibration — don't let the standards become a status-quo veto. The independence gate removes author bias but can introduce status-quo bias: a reviewer scoring against the current rulebook will defend it. Here the standards are the floor, not the ceiling. A plan that deliberately spends more, or relaxes a convention, for non-linear upside is a trade-off to surface — not an automatic FAIL. Distinguish gold-plating (cost, no payoff) from a deliberate ahead-of-the-curve bet (cost, exponential payoff), and don't demand proven pain for the latter — judge it on expected value. (The diff critique below keeps rule-conformance strict; this leniency is for the plan/approach altitude only.)

A FAIL is a gate: fix the plan or consciously accept the risk before build-sprint touches it.

Diff critique — run BEFORE committing consequential code

Per criterion, PASS/FAIL with file:line evidence and a concrete fix:

  • Correctness — logic errors, edge cases, off-by-ones, inverted conditions.
  • State & concurrency — races, ordering, partial writes, queue/async assumptions.
  • Security$hidden on encrypted attrs, validated() not $request->all(), no leaked credentials (see .claude/rules/security.md).
  • Data integrity — N+1s, migration safety, no Todoist data duplicated in the DB.
  • Standards — DTOs over arrays, FormRequest validation, Wayfinder for FE→BE, generic App naming, the priority/label conventions in fyi-tempo-domain.
  • Error handling — graceful API failure, no silent skips (jobs notify + fail, never swallow).
  • Simplicity — could this be smaller? dead code, needless layers, what-comments.

Output

A binary table — criterion · PASS/FAIL · evidence (file:line) · suggested fix. No percentage scores; a thing either holds or it doesn't. End with the single gating question: ship, or fix first?

Where it sits

  • It is not qa-tests (that finds missing tests via ZOMBIES/CORRECT) — run both before a serious PR; they catch different things.
  • It is not qa-code (does the diff obey our documented rules + architecture) — that's a conformance pass; this red-teams whether the decision/approach is sound. Conforms answers "does it follow the rules", reasoning answers "is it the right call".
  • It is not /code-review (reuse/simplify cleanup) — this is adversarial correctness + standards.
  • It does not lower or touch the composer ci:check gate; it's the human-judgment layer in front of it.

How this gets triggered

Hooks nudge you at the two boundaries: after a green composer ci:check (before commit → diff critique) and before issue creation (gh issue create / issue_write → plan critique). The nudge only reminds; you decide whether the change is consequential enough to gate. For everyday changes, skip it.