qa-harness

qa-harness

qareadhands-off

Use this when: you want the whole harness audited as a system

Problem it solves — A skill set that grew organically can stop cohering. This audits the harness as a system — all five pillars and the seams between them — asking whether the whole still hangs together, not whether one change is clean.

Used in workflows: Harness health Check

QA harness (the system, all five pillars)

check-everything sweeps a change. This sweeps the harness itself — the authored layer on top of Claude Code's runtime. Different question, slower cadence: run it when the skill set / rules / agents have grown enough that the system might no longer hang together. Context-heavy — prefer a fresh session.

The five pillars — audit each, then their seams

Context — always-on footprint lean and cache-stable? (optimise-context) Does each doc earn its place? Anything volatile in the stable base? Fresh Session Test lens — could a cold agent, given only the repo (and the trackers it points to), answer the five orientation questions (what is this · how to run · how to verify · what's in progress · what's next)? A "no" on any is a context gap — the inverse of the leanness check.
Tools — the skill set: every skill on-taxonomy, contract declared, no near-duplicates, no dead skills, orchestrators current? (qa-skills) MCP tools risk-tiered?
Memory — docs/learnings/ index within its cap, entries still true to the code, a decay path for stale ones? (find-learnings --refresh owns graduate/archive.)
Guardrails — tiered surface-then-confirm honoured (reads free, writes gated)? the CI gate / DoD intact? circuit-breakers where loops can run away? Confirm-policy lens (the judgment harness:check deliberately leaves here): does any skill describe an outward write — edits a canonical doc, files an issue, saves to memory, pushes — without a confirm gate? (grill-with-docs was exactly this.)
Specialists — agents earn their isolation (independence / clean context, not throughput)? read-only where they should be? no orphaned-worktree risk? orchestrators synthesise agent findings, not relay them? frontmatter to convention — explicit model: and role color per the build-claude-skill rubrics, skills a body names preloaded via skills:?

Then the seams — this is the skill: a rule that contradicts a skill; two skills that overlap; a skill that references an agent that isn't phase-gated; the taxonomy doc vs the actual roster; a definition that assumes a runtime capability that doesn't exist (verify against the platform docs or a live probe — the ideator agent promised a subagent fan-out the runtime forbids, and degraded silently). The system-level bugs live between pillars, not inside one; spend the judgment here, not on re-walking the per-pillar checklists.

How it runs

Start with the deterministic floor: run php artisan harness:check. It asserts the mechanical invariants (name==dir, no dangling skill refs, valid verb prefix, agent read-only tools, skill/agent shape) and fails on drift — so the boring violations are caught for free before you spend LLM attention. --fix repairs the safe ones. Then you do the judgment the command can't.

Fork the pillar gathers. Dispatch one read-only agent per pillar in a single parallel batch (each brief: that pillar's questions above + the standards to judge against), returning status + evidence + recommended move — the audit is read-only throughout, so the expensive reading happens in disposable context windows, not this thread (the same context-separation reason worktree-builder isolates). Then synthesise the five returns into one report in-thread — never just relay them. A pillar that needs the L1 skill's full machinery (qa-skills, optimise-context, qa-philosophy) still routes through it; fixes route back through the owning skill's confirm gate.

Benchmark against the harness-engineering spine: permission before capability · rollback before autonomy · verification before delivery · context budgets before long dialogue · lifecycle before multi-agent · institutions before proficiency.

Output

A five-pillar health report: per pillar — status (sound / drifting / gap) · the evidence · the recommended move (and which skill owns the fix). End with the top 3 cross-pillar risks. It's a valid result to report "coheres" — say so.

Where it sits

L2 — operates on the harness; qa-skills/optimise-context/qa-philosophy are L1 (one pillar each); qa-code/debug are L0 (one artifact).
Not check-everything — operational sweep of a change vs architectural sweep of the system. Different axis, different cadence.
Pairs with find-harness-improvements (which looks outward for what to add); this looks inward at whether what's here holds.

qa-harness ​

QA harness (the system, all five pillars) ​

The five pillars — audit each, then their seams ​

How it runs ​