find-harness-improvements
find · read · hands-off
Use this when: you want ideas to improve the whole harness
Problem it solves — You can't improve a harness you only ever look at from the inside. This researches the broader practice — books, repos, articles — and proposes improvements across all five pillars, each run through vet before anything is adopted.
Used in workflows: Grow the harness
Find harness improvements (L2 research)
The harness should keep learning from how others build harnesses. This goes looking — outward and across every pillar — for what we're missing, then funnels it through the same judge-and-shape pipeline as everything else.
Contract — divergent discovery over the whole field · forks all gathering (scanner for the field, claude-code-guide for the platform), judges in-thread via vet · read-only: the artifact is a vetted candidate list; building routes through brainstorm / to-issues. Finding nothing worth adopting is a valid result — say so and stop.
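The artifact this contract describes can be pictured as a small data structure. What follows is a minimal sketch, not the harness's actual format: the names `Candidate`, `Verdict`, and `render_report` are hypothetical, and the field set is an assumption. It only illustrates two points from the contract: the output is a read-only list of vetted candidates, and an empty list is reported as a valid result rather than padded.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The three outcomes vet can return for a candidate."""
    ADOPT = "adopt"
    ADAPT = "adapt"
    SKIP = "skip"


@dataclass(frozen=True)  # frozen: the artifact is read-only once produced
class Candidate:
    idea: str
    source: str     # where the idea was found (book, repo, article)
    pillar: str     # the pillar it maps to (names are left to the harness)
    verdict: Verdict
    risk: str
    placement: str  # e.g. "rule", "lens", "doc"


def render_report(candidates):
    """Render the vetted candidate list; say so plainly if nothing survived."""
    keepers = [c for c in candidates if c.verdict is not Verdict.SKIP]
    if not keepers:
        return "Nothing worth adopting this run."
    return "\n".join(
        f"- [{c.verdict.value}] {c.idea} ({c.source}) -> {c.placement}; risk: {c.risk}"
        for c in keepers
    )
```

Keeping skipped ideas out of the report, and returning an explicit sentence when nothing remains, mirrors the "say so and stop" rule instead of treating an empty run as a failure.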
The flow
- Frame the hunt: pick the divergence angles from the discovery lens deck (docs/agents/lens-deck-discovery.md → its find-harness-improvements domain deck: the five pillars · cost · learning-value · state-of-the-art · removal · demonstrability), or run an open brief ("what's the current best practice?"). More lenses = more breadth + cost. The whole of this session (scanning obra/superpowers, the agent catalogs, the Harness Engineering book) was a manual run of this skill.
- Mine our own learnings first (the inward feed): review docs/learnings/ for recurring harness footguns / drift. Each recurring one is a candidate new harness:check check (if it's mechanically decidable) or a qa-harness lens (if it needs judgment). That's how the eval set self-extends from experience (the find-skills dangling-ref trap, the squash-merge trap, write-without-confirm each become a check). Cheaper and higher-signal than the outward scan, so do it before reaching for scanner.
- Scan the platform surface: dispatch claude-code-guide against the current Claude Code docs/changelog for capabilities the harness doesn't use yet: new skill/agent frontmatter fields, runtime features, changed limits. Each unused field is a candidate convention (the skills: preload, paths, color, the description budget all arrived this way). Verify every capability claim against the docs or a live probe before adopting. An announcement is not a capability, and timing matters: nested subagents were announced before they shipped (absent from docs + a live test then), but landed for real in Claude Code v2.1.172 (now documented + confirmed by a live probe). Re-check rather than trusting the hype or a stale "it doesn't exist".
- Gather in isolation: dispatch the scanner agent (subagent_type: scanner) to read the field's sources (books, repos, long articles) and return a pillar-mapped digest with anchors, not raw dumps. Keeps the research noise out of the main thread.
- Vet each idea (this is the skill): run candidates through vet. Worth adopting for our five pillars? adopt / adapt / skip + risk, then placement (rule | skill+verb+altitude | lens | agent | doc | harness:check check if mechanically decidable). Most ideas should lose: curated beats maximal, even for good ideas; an unvetted adoption is how a harness bloats.
- Shape the keepers: hand survivors to brainstorm (pin down the design) and to-issues (file the issues). This skill finds and judges; it doesn't build. When a sourced idea is later adopted, record it in docs/agents/sources.md (the source · where it landed · how tweaked) so the inspiration is traceable.
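The inward feed in the flow above routes each recurring learning to a check when it is mechanically decidable and to a lens when it needs judgment. That rule can be sketched as a small function. This is an illustration under stated assumptions, not the harness's implementation: it assumes each entry in docs/learnings/ has already been reduced to a (label, mechanically_decidable) pair, that "recurring" means seen at least twice, and the name `route_recurring` is hypothetical.

```python
from collections import Counter


def route_recurring(learnings, min_count=2):
    """Map each recurring footgun label to the kind of candidate it suggests.

    learnings: iterable of (label, mechanically_decidable) pairs.
    min_count: assumed threshold for calling a footgun "recurring".
    """
    learnings = list(learnings)
    counts = Counter(label for label, _ in learnings)

    # A footgun counts as mechanically decidable only if every sighting was.
    decidable = {}
    for label, is_decidable in learnings:
        decidable[label] = decidable.get(label, True) and is_decidable

    return {
        label: "harness:check check" if decidable[label] else "qa-harness lens"
        for label, seen in counts.items()
        if seen >= min_count
    }
```

For example, two sightings of the squash-merge trap (decidable) route to a harness:check check, while a one-off entry is dropped as non-recurring. The output is only a candidate list; each entry would still go through vet like any other idea.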
Where it sits
- L2: operates on the whole harness, like qa-harness. The pair: qa-harness looks inward (does what we have cohere?); this looks outward (what should we add?).
- Not qa-skills: that's divergent discovery over skill catalogs; this is divergent over the field of harness engineering (all pillars, any source).
- Not source-a-skill: that's convergent (one known need). Everything here ends at vet, same as the rest of the discovery family.