Skip to content

qa-skills

qa-skills

qareadhands-off

Use this when: you want the skill set audited for dupes / dead skills

Problem it solves — The skill catalog accretes near-duplicates and dead skills. This audits the set — every skill on-taxonomy, contract declared, no overlaps, nothing unused — and routes prune/improve candidates through vet.

Used in workflows: Harness health Check

QA skills (audit the set)

The skill set rots the same way a backlog does — overlaps creep in, packs stop fitting, gaps open where you keep doing a thing by hand. This audits the set (not one skill — that's build-claude-skill), across five lenses, and acts only on what you pick. A skill you never reach for is debt; curated beats maximal (we cut 30 → 19 once already for exactly this reason).

First: classify — owned vs managed

grep -l 'author:' .claude/skills/*/SKILL.md. Skills with metadata.author (e.g. laravel) are Boost-managedboost:update owns them, so they are read-only here: you may note "we don't use this" but never prune, rename or merge them. Everything author-less is owned and in scope. Tag each owned skill by the verb taxonomy (the table in build-claude-skill: six workhorse verbs + incidental prefixes + the allow-listed standalones).

Then: the deterministic floor. Run php artisan harness:check before any LLM judgment — it asserts the mechanical invariants (name==dir, verb prefix, line cap, dangling refs) and --fix repairs the safe ones. The lenses below spend attention only on what the command can't decide.

The five lenses — this is the skill

Listing skills is setup; the value is the judgement in each bucket. Gather evidence per finding.

  1. Prune — an owned skill that is:
    • redundant — overlaps another's job (and which one wins);
    • poor-fit — off-stack / off-mission (how the design packs were);
    • stale — references removed code, a renamed skill, or a dead convention;
    • unused debt — you never reach for it.
  2. Improve / merge — hold each to docs/agents/skill-quality.md + build-claude-skill (the mechanical name/prefix/cap checks are the floor's job — judge what it can't):
    • description collision — two skills whose descriptions auto-invoke on the same phrase. This is the #1 quality bug: it makes activation a coin-flip. Disambiguate the descriptions or merge the skills.
    • weak trigger — missing Use when… cues or NOT cases; description + when_to_use over the 1,536-char budget (clips in the listing — silent routing loss).
    • frontmatter drift — no explicit model: (the build-claude-skill rubric); prose doing a field's job (paths for path-scoped loading, argument-hint for args) — field list in build-claude-skill/references/frontmatter.md.
    • merge two thin overlapping skills into one; split one bloated skill that does two jobs.
    • duplicated contract — the same rule / recipe / contract inlined in ≥2 skills (prose that will drift). Lift it into one on-demand docs/agents/*.md and point each skill at it + its own deltas — the orchestration.md / deterministic-merge.md precedent (docs/learnings/2026-06-11-extract-shared-contract-to-one-referenced-doc.md).
    • judgement in a mechanical merge — a fan-out skill (ideate, an orchestrating sweep) that merges its helpers' output by model judgement instead of mechanically. The merge must be flatten / strip-attribution / verbatim-dedup only; ranking & selection are a separate, blind step (docs/agents/deterministic-merge.md).
  3. Incorporate (external scan) — dispatch scanner to sweep current catalogs (awesome-claude-skills, hesreallyhim/awesome-claude-code, Obra/superpowers, the official Anthropic skills, directories like claudeskills.info) — it returns the shortlist and keeps the scrape noise out of this thread. For each candidate ask: does it fill a real Tempo gap? If yes, propose it standalone or merged into an existing skill. Prefer an owned adaptation to convention over vendoring a copy — a verbatim external skill drifts from its source and can collide with boost:update; lift the idea, write it the Tempo way (see build-claude-skill). Most catalog skills won't fit — say so rather than padding the set.
  4. Gaps — what do you do manually and repeatedly with no skill? Cross-check the workflow loop and CLAUDE.md Phase-1 scope. Propose new owned skills (creation routes through build-claude-skill).
  5. Orchestration — audit the sweep orchestrators (check-everything and qa-everything), not just the individual skills:
    • Category — is each check- skill in the right bucket (Work / Thinking-Planning / Brain / Housekeeping), and is anything missing from either sweep that should be in it?
    • Run order is dependency-aware, so the smell to catch is a step running before the thing whose output it should act on: a context/skill trim before the audit that adds or tweaks what it trims; a conformance or test pass before the fixes it's meant to judge; an optimiser before the code it optimises lands. Walk each order and ask of each step "does everything this depends on run earlier?" — flag any inversion and name the correct slot. (build-claude-skill places a new skill once, at creation; this re-checks the whole order as the set evolves.)

Workflow — surface, then act

  1. Classify (owned vs managed; by taxonomy).
  2. Run the five lenses; for each finding capture what + why + evidence (the overlapping skill, the colliding phrase, the catalog link, the manual task that recurs, the out-of-order step). Fork the heavy reading: lenses 1–2 mean digesting every owned SKILL.md — dispatch that to one read-only agent (brief: the two lenses' criteria + the quality bar) and keep only its findings list here; lens 3 already forks to scanner. Lenses 4–5 stay in-thread — gaps needs this session's workflow memory, and the orchestrator files are two short reads.
  3. Present the five buckets — Prune / Improve+Merge / Incorporate / Gaps / Orchestration — and AskUserQuestion (multi-select) which to action. Never delete, merge, or add unprompted — pruning a skill is destructive, adding one spends context budget. It's fine to conclude the set is tight — say so.
  4. Apply only the confirmed: prune (git rm -r the dir + update every cross-reference, the way a rename does), improve (edit in place), incorporate/add (via build-claude-skill).
  5. Report what changed, and what was left with the reason.

Where it sits

  • Not build-claude-skill — that authors or renames one skill to convention; this audits the whole set and decides which skills should exist. They hand off: this one finds the work, that one does the creation.
  • Not find-untracked-work (app feature/backlog gaps → issues). This is the tooling layer.
  • Pairs with qa-philosophy: that keeps the foundational docs lean and current; this keeps the skills lean and current.

How this gets triggered

Invoke directly — "audit my skills", "can I prune anything", "what skills am I missing". Description-driven, not hook-fired; a sensible cadence is every so often as the set grows, or whenever a session adds two or three skills (like this one). Context-heavy — prefer a fresh session (the external scan + five-lens pass can overflow a long session's window).