Appearance
qa-harness
qa-harness
qareadhands-off
Use this when: you want the whole harness audited as a system
Problem it solves — A skill set that grew organically can stop cohering. This audits the harness as a system — all five pillars and the seams between them — asking whether the whole still hangs together, not whether one change is clean.
Used in workflows: Harness health Check
QA harness (the system, all five pillars)
check-everything sweeps a change. This sweeps the harness itself — the authored layer on top of Claude Code's runtime. Different question, slower cadence: run it when the skill set / rules / agents have grown enough that the system might no longer hang together. Context-heavy — prefer a fresh session.
The five pillars — audit each, then their seams
- Context — always-on footprint lean and cache-stable? (
optimise-context) Does each doc earn its place? Anything volatile in the stable base? Fresh Session Test lens — could a cold agent, given only the repo (and the trackers it points to), answer the five orientation questions (what is this · how to run · how to verify · what's in progress · what's next)? A "no" on any is a context gap — the inverse of the leanness check. - Tools — the skill set: every skill on-taxonomy, contract declared, no near-duplicates, no dead skills, orchestrators current? (
qa-skills) MCP tools risk-tiered? - Memory —
docs/learnings/index within its cap, entries still true to the code, a decay path for stale ones? (find-learnings --refreshowns graduate/archive.) - Guardrails — tiered surface-then-confirm honoured (reads free, writes gated)? the CI gate / DoD intact? circuit-breakers where loops can run away? Confirm-policy lens (the judgment
harness:checkdeliberately leaves here): does any skill describe an outward write — edits a canonical doc, files an issue, saves to memory, pushes — without a confirm gate? (grill-with-docswas exactly this.) - Specialists — agents earn their isolation (independence / clean context, not throughput)? read-only where they should be? no orphaned-worktree risk? orchestrators synthesise agent findings, not relay them? frontmatter to convention — explicit
model:and rolecolorper thebuild-claude-skillrubrics, skills a body names preloaded viaskills:?
Then the seams — this is the skill: a rule that contradicts a skill; two skills that overlap; a skill that references an agent that isn't phase-gated; the taxonomy doc vs the actual roster; a definition that assumes a runtime capability that doesn't exist (verify against the platform docs or a live probe — the ideator agent promised a subagent fan-out the runtime forbids, and degraded silently). The system-level bugs live between pillars, not inside one; spend the judgment here, not on re-walking the per-pillar checklists.
How it runs
Start with the deterministic floor: run php artisan harness:check. It asserts the mechanical invariants (name==dir, no dangling skill refs, valid verb prefix, agent read-only tools, skill/agent shape) and fails on drift — so the boring violations are caught for free before you spend LLM attention. --fix repairs the safe ones. Then you do the judgment the command can't.
Fork the pillar gathers. Dispatch one read-only agent per pillar in a single parallel batch (each brief: that pillar's questions above + the standards to judge against), returning status + evidence + recommended move — the audit is read-only throughout, so the expensive reading happens in disposable context windows, not this thread (the same context-separation reason worktree-builder isolates). Then synthesise the five returns into one report in-thread — never just relay them. A pillar that needs the L1 skill's full machinery (qa-skills, optimise-context, qa-philosophy) still routes through it; fixes route back through the owning skill's confirm gate.
Benchmark against the harness-engineering spine: permission before capability · rollback before autonomy · verification before delivery · context budgets before long dialogue · lifecycle before multi-agent · institutions before proficiency.
Output
A five-pillar health report: per pillar — status (sound / drifting / gap) · the evidence · the recommended move (and which skill owns the fix). End with the top 3 cross-pillar risks. It's a valid result to report "coheres" — say so.
Where it sits
- L2 — operates on the harness;
qa-skills/optimise-context/qa-philosophyare L1 (one pillar each);qa-code/debugare L0 (one artifact). - Not
check-everything— operational sweep of a change vs architectural sweep of the system. Different axis, different cadence. - Pairs with
find-harness-improvements(which looks outward for what to add); this looks inward at whether what's here holds.