Appearance
check-reasoning
check-reasoning
checkreadhands-off
Use this when: a plan or diff needs red-teaming before you commit
Problem it solves — You approve your own work too easily. This removes author-bias with an independence gate — a reviewer that sees what you did and what the project says is right, but not why you did it — so it bites instead of rubber-stamping.
Used in workflows: Idea → shipped · QA PR & Learn · Health Check
Check reasoning (devil's-advocate pass)
Anyone approves their own work too easily — me, or you. This skill removes the author bias: the verdict comes from a reviewer that never saw why you did it, only what you did and what the project says is right. The independence gate is the skill — review the artifact in the same breath you wrote it and you'll defend it; dispatch a blind reviewer and it bites.
The independence gate (required)
When the artifact under review was produced in this conversation, do not judge it yourself. Dispatch the blind-reviewer agent (Agent tool, subagent_type: blind-reviewer; fall back to general-purpose/Explore if that agent isn't present) and give it only:
- the artifact — the plan text, or the diff (
git diff <base>...); - the standards —
CLAUDE.md,.claude/rules/*,docs/adr/*,CONTEXT.md; - the criteria below — plus, where the change's blast radius warrants, the attack personas from the adversarial lens deck (
docs/agents/lens-deck-adversarial.md: the attacker · the maintainer-in-six-months · the on-call-at-3am · the cost-conscious · the misuse case · the future change), to widen the attack surface past the fixed criteria.
Withhold your reasoning, your justifications, and your confidence. The reviewer forms its own verdict from artifact + standards alone, then reports back. You relay its findings — you do not overrule them to defend your work.
Orchestration (opt-in, big blast radius). When several personas apply, dispatch one blind reviewer per persona in parallel (same artifact + standards, one persona each) per docs/agents/orchestration.md, and merge their verdict tables per docs/agents/deterministic-merge.md — a FAIL from any reviewer stands; never average verdicts away. One persona, or an everyday change, stays a single reviewer.
Two modes
Plan critique — run BEFORE issues are filed
Red-team the plan while it's still cheap to change. Per criterion, PASS/FAIL with the specific weakness:
- Problem fit — does this solve the stated problem, or an adjacent easier one?
- Scope — smallest thing that works, or speculative generality? (premature abstraction, gold-plating)
- Decomposition — are the slices independently shippable, or secretly coupled?
- Data model / migrations — reversible? backfill-safe? deploy-window aware?
- Standards — does it respect ADRs (e.g. DB-driven scheduling, import ledgers), the DTO layer, the Todoist priority mapping?
- Failure & idempotency — what happens on API error / partial run / re-run? (Tempo's sources must be idempotent)
- Testability — can each slice be proved with mocked externals, happy and sad path?
- Unknowns — any decision the plan hand-waves that will stall execution?
Altitude calibration — don't let the standards become a status-quo veto. The independence gate removes author bias but can introduce status-quo bias: a reviewer scoring against the current rulebook will defend it. Here the standards are the floor, not the ceiling. A plan that deliberately spends more, or relaxes a convention, for non-linear upside is a trade-off to surface — not an automatic FAIL. Distinguish gold-plating (cost, no payoff) from a deliberate ahead-of-the-curve bet (cost, exponential payoff), and don't demand proven pain for the latter — judge it on expected value. (The diff critique below keeps rule-conformance strict; this leniency is for the plan/approach altitude only.)
A FAIL is a gate: fix the plan or consciously accept the risk before build-sprint touches it.
Diff critique — run BEFORE committing consequential code
Per criterion, PASS/FAIL with file:line evidence and a concrete fix:
- Correctness — logic errors, edge cases, off-by-ones, inverted conditions.
- State & concurrency — races, ordering, partial writes, queue/async assumptions.
- Security —
$hiddenon encrypted attrs,validated()not$request->all(), no leaked credentials (see.claude/rules/security.md). - Data integrity — N+1s, migration safety, no Todoist data duplicated in the DB.
- Standards — DTOs over arrays, FormRequest validation, Wayfinder for FE→BE, generic
Appnaming, the priority/label conventions infyi-tempo-domain. - Error handling — graceful API failure, no silent skips (jobs notify + fail, never swallow).
- Simplicity — could this be smaller? dead code, needless layers, what-comments.
Output
A binary table — criterion · PASS/FAIL · evidence (file:line) · suggested fix. No percentage scores; a thing either holds or it doesn't. End with the single gating question: ship, or fix first?
Where it sits
- It is not
qa-tests(that finds missing tests via ZOMBIES/CORRECT) — run both before a serious PR; they catch different things. - It is not
qa-code(does the diff obey our documented rules + architecture) — that's a conformance pass; this red-teams whether the decision/approach is sound. Conforms answers "does it follow the rules", reasoning answers "is it the right call". - It is not
/code-review(reuse/simplify cleanup) — this is adversarial correctness + standards. - It does not lower or touch the
composer ci:checkgate; it's the human-judgment layer in front of it.
How this gets triggered
Hooks nudge you at the two boundaries: after a green composer ci:check (before commit → diff critique) and before issue creation (gh issue create / issue_write → plan critique). The nudge only reminds; you decide whether the change is consequential enough to gate. For everyday changes, skip it.