Harness glossary

The terms we use consistently when talking about the harness — the authored Claude Code layer in .claude/ (and its supporting docs). One place to pin what each word means so we don't drift. This is about the tooling, not the Tempo app domain — for app/domain terms (Todoist, Task Helper, the plan…) see CONTEXT.md.

The shape

North star

A harness doesn't make the model smarter; it wraps the model in a closed loop: the model acts, the system executes the action and measures the real result, and that ground truth feeds back so the next step self-corrects. Leverage comes from feedback and grounding, not raw IQ — everything below serves that loop.
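A minimal sketch of that loop, assuming a toy setting where the "model" guesses a hidden number and the "system" reports whether each guess was too low or too high. The names `closed_loop`, `make_actor` and `make_environment` are illustrative only, not part of the harness.

```python
from typing import Callable, Optional, Tuple


def closed_loop(
    act: Callable[[Optional[str]], int],
    execute: Callable[[int], Tuple[bool, str]],
    max_steps: int = 10,
) -> Tuple[Optional[int], int]:
    """Run act -> execute -> measure until the measured result says done."""
    feedback: Optional[str] = None
    for step in range(1, max_steps + 1):
        action = act(feedback)            # the model acts
        done, feedback = execute(action)  # the system executes and measures
        if done:
            return action, step
    return None, max_steps


def make_actor(lo: int, hi: int) -> Callable[[Optional[str]], int]:
    """Toy stand-in for the model: bisects using only the fed-back result."""
    state = {"lo": lo, "hi": hi, "last": 0}

    def act(feedback: Optional[str]) -> int:
        if feedback == "too low":
            state["lo"] = state["last"] + 1
        elif feedback == "too high":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return act


def make_environment(target: int) -> Callable[[int], Tuple[bool, str]]:
    """Toy stand-in for the system: runs the guess and reports ground truth."""

    def execute(guess: int) -> Tuple[bool, str]:
        if guess == target:
            return True, "ok"
        if guess < target:
            return False, "too low"
        return False, "too high"

    return execute


result, steps = closed_loop(make_actor(0, 100), make_environment(37))
print(result, steps)
```

The actor never sees the target; it converges only because each measured result is fed back into the next step.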

Debug the harness first

Corollary of the north star: the model is ~10% of the agent, the harness ~90%, so most agent failures are configuration failures, not model defects — a missing tool, a loose rule, a forgotten guardrail, context bloat. When the agent misbehaves, suspect the harness (the five pillars) before blaming or swapping the model. qa-harness / find-harness-improvements are the tools for it.

Harness

The authored layer on top of Claude Code's runtime: context + tools + memory + guardrails + specialists. We shape this layer; we don't rebuild the runtime (the query loop, permission engine, compaction) underneath it.

The five pillars

The five things the harness is made of: context, tools, memory, guardrails, specialists. Used as the axes for qa-harness.

Spine

The order things must come in; each is a dependency, not a preference:

permission before capability — don't grant power before the guardrail that contains it;
rollback before autonomy — only let it run unattended once mistakes are cheaply undoable;
verification before delivery — measure the result before it ships (the closed loop's gate);
context budgets before long dialogue — bound what loads before conversations sprawl;
lifecycle before multi-agent — get one agent's start→clean-exit right before coordinating many;
institutions before proficiency — rules and memory that outlast a session beat one-off cleverness.

Altitude (L0 / L1 / L2)

How wide a thing operates. L0 = one artifact (a diff, a bug). L1 = a set within one pillar (the skill set, the test suite, the always-on docs). L2 = the whole harness across all pillars (qa-harness, find-harness-improvements).

Skills

Skill

An invocable SKILL.md under .claude/skills/<name>/. The directory name is the command (no alias field).

Verb taxonomy

The prefix that names what a skill does. Six workhorse verbs: build- (build the thing it names — scaffold or implement; covers executors build-sprint / build-epic, while issue creation is to-issues), qa- (audit a concrete artifact), optimise- (faster/leaner), find- (divergent discovery), source- (convergent acquisition), check- (meta: reasoning + orchestration). Plus incidental prefixes be- / tidy- / fyi-, and standalone single-word skills (vet, debug, brainstorm, wrap-up).

Contract

The four things every skill declares: task class (the verb), allowed tools, direct vs fork (runs in-thread or dispatches a sub-agent), and the verifiable artifact (what's true once it's done).
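As a rough illustration, the four declarations could be modelled as one record. This is a sketch under assumptions: `SkillContract`, its field names and the example tool names are hypothetical and do not describe how SKILL.md actually stores them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SkillContract:
    task_class: str                 # the verb prefix, e.g. "qa-"
    allowed_tools: Tuple[str, ...]  # tools the skill may call
    mode: str                       # "direct" (in-thread) or "fork" (sub-agent)
    artifact: str                   # what is verifiably true once it's done

    def __post_init__(self) -> None:
        if self.mode not in ("direct", "fork"):
            raise ValueError(f"mode must be 'direct' or 'fork', got {self.mode!r}")
        if not self.artifact.strip():
            raise ValueError("a contract needs a verifiable artifact")


# Hypothetical example values, for illustration only.
example = SkillContract(
    task_class="qa-",
    allowed_tools=("Read", "Grep"),
    mode="direct",
    artifact="a findings list with file:line and severity",
)
```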

Artifact

The concrete thing a skill produces or operates on, and that you can check afterwards: a diff, a plan, a spec, a test, a doc, a filed issue. This is the distinction between auditing a concrete artifact (qa-) and operating on the system as a whole (qa-harness).

The heart

The one step that actually carries a skill, marked "This is the skill." Where to spend disproportionate effort. (Reference/fyi- skills are deliberately flat instead.)

Lens

A sub-check folded into a skill rather than spun out as its own skill (e.g. qa-code's security lens, optimise-context's removal-test lens). The lighter alternative to a new skill.

Orchestrator

A skill that sequences other skills (check-everything, qa-everything, qa-harness). It owns no fixes itself — every change stays behind a sub-skill's confirm.

Synthesise (not relay)

An orchestrator (or an agent's caller) digests sub-findings into a conclusion; it doesn't dump them raw.

`(formerly …)`

The rename trail kept at the start of a description so old names still resolve.

Owned vs Boost-managed

Owned skills are hand-authored (no metadata.author), ours to edit/rename. Boost-managed ones have metadata.author and are regenerated by boost:update — never hand-edit or rename them.

Guardrails & behaviour

Surface-then-confirm

Propose candidates, let the user pick, never act unprompted. The default for anything that writes.

Tiered (surface-then-confirm)

Gated by irreversibility: pure reads run free (no confirm tax); writes / deletes / filing issues / pushes always confirm.

Onion-ordered guardrails

When a skill stacks several guards, compose them in order like an onion: the cheapest / most-general check outermost, the irreversible-action approval innermost. Same gating-by-irreversibility as tiered surface-then-confirm, made explicit as a composition order — the cheap, general guard rejects before the expensive or consequential one runs. Sharpens tiered confirm; doesn't replace it.
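A sketch of the composition order, assuming three hypothetical guards (`path-in-repo`, `lint-passes`, `user-approves`) listed from outermost to innermost. The guard names and the request shape are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]


@dataclass(frozen=True)
class Guard:
    name: str
    check: Callable[[Request], bool]


def run_onion(guards: List[Guard], request: Request) -> Tuple[bool, List[str]]:
    """Run guards outermost-first and stop at the first rejection."""
    ran: List[str] = []
    for guard in guards:
        ran.append(guard.name)
        if not guard.check(request):
            return False, ran
    return True, ran


# Cheapest / most general first, irreversible-action approval last.
ONION = [
    Guard("path-in-repo", lambda r: str(r["path"]).startswith("src/")),
    Guard("lint-passes", lambda r: bool(r["lint_ok"])),
    Guard("user-approves", lambda r: bool(r["approved"])),
]

allowed, ran = run_onion(
    ONION, {"path": "/etc/passwd", "lint_ok": True, "approved": True}
)
print(allowed, ran)
```

Here the outer path check rejects the request, so the lint run and the approval prompt never happen.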

Definition of Done (DoD)

A change is done only when composer ci:check is green; never lower a threshold, never merge "done with known issues" (file them instead).

Set the bar at the eval, not the demo

One green run proves capability once; reliability needs an eval (rubric-scored, repeatable). A test covers the deterministic part (input → output); an eval covers model output — both the result and the trajectory (the tool calls / reasoning path). Tempo evals the harness (harness:check + the pairwise-judge gate, docs/learnings/2026-06-04-pairwise-judge-eval-as-gate.md); product-output evals (e.g. the daily nudge) are a known gap.

Circuit-breaker

Stop repeating a thing that keeps failing (3 strikes), and step up to question the approach rather than retrying harder.
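A minimal sketch of the three-strikes rule, assuming a step that reports success as a boolean. `run_with_breaker` and `ApproachNeedsRethink` are hypothetical names.

```python
from typing import Callable, List


class ApproachNeedsRethink(Exception):
    """Raised when the same step has failed max_strikes times."""


def run_with_breaker(step: Callable[[], bool], max_strikes: int = 3) -> int:
    """Call step until it succeeds; return the attempt number that succeeded.

    After max_strikes failures, stop retrying and escalate instead.
    """
    for attempt in range(1, max_strikes + 1):
        if step():
            return attempt
    raise ApproachNeedsRethink(
        f"{max_strikes} strikes: question the approach instead of retrying harder"
    )


calls: List[int] = []


def always_fails() -> bool:
    calls.append(1)
    return False


try:
    run_with_breaker(always_fails)
    tripped = False
except ApproachNeedsRethink:
    tripped = True
```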

Behaviour mode (`be-`)

A standing instruction on how to work, on until switched off (be-complete, be-caveman, be-teacher, be-radical) — not a one-off task.

Removal test

To decide if an always-on doc section earns its place: delete it; if behaviour doesn't change structurally, it was decoration → cut or move on-demand.

Prefactor

"make the change easy, then make the easy change": reshape first to make the real change simple, but only as far as this change needs.

Discovery & judgement

vet

Judge an idea/skill/tool: first its worth (real gap? fit? risk?), then its placement. The universal front-door when you're unsure something deserves a skill.

Placement

Worth

The adopt / adapt / skip verdict, with the reason. vet does this without rubber-stamping.

Divergent vs convergent

find- is divergent ("what am I missing?"); source- is convergent ("get the best known X for a need I can already name"). Both end at vet.

Independence gate

A verdict from a reviewer that never saw the author's reasoning, so it can't bless the work (check-reasoning, the blind-reviewer agent). The defence against rubber-stamping (approving your own work too easily).

Rubber-stamp

Approving your own (or whoever-asked's) work too easily, without real scrutiny. What the independence gate and vet's refuse-to-rubber-stamp rule exist to prevent.

Hard gate

A point past which you may not proceed until a condition is met (e.g. brainstorm won't let building start until the spec is approved).

Specialists (agents)

Agent / sub-agent / specialist

A definition in .claude/agents/ dispatched for a sub-task. The reason to use one is isolation + independence, never throughput.

Hybrid pattern

The agent gathers/verifies read-only in its own context, returns findings; the main thread confirms and applies. (Exception: worktree-builder, whose writes are quarantined to a worktree and reviewed on integration.)

Phase-gated

A skill that can use an agent degrades to inline behaviour if that agent file isn't present, so no earlier-phase skill hard-depends on a later-phase agent.
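The degrade-to-inline check can be sketched as a file-existence test. This assumes agent definitions are stored as `<name>.md` under .claude/agents/; the file naming and the `choose_mode` helper are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def choose_mode(agent_name: str, repo_root: Path) -> str:
    """Return 'dispatch' if the agent file exists, else fall back to 'inline'."""
    # Assumed layout: .claude/agents/<name>.md
    agent_file = repo_root / ".claude" / "agents" / f"{agent_name}.md"
    return "dispatch" if agent_file.is_file() else "inline"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = choose_mode("blind-reviewer", root)  # agent not installed yet

    agents_dir = root / ".claude" / "agents"
    agents_dir.mkdir(parents=True)
    (agents_dir / "blind-reviewer.md").write_text("placeholder agent definition\n")
    after = choose_mode("blind-reviewer", root)   # agent now present

print(before, after)
```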

No orphans

An agent that aborts/crashes leaves its worktree clean and removable, never a half-applied tree.

Orchestration tax / model routing

Route heavy reasoning to a capable model and routine work to cheap workers (docs/agents/orchestration.md); the "tax" is the orchestrator's own review + integration overhead you pay so cheap workers stay safe — worth it for the ~50–75% worker-cost cut. Conductor vs orchestrator names the two working modes: conductor = real-time, in-thread, keystroke-level (exploring unknown code); orchestrator = async hand-off-and-review (well-specified work — migrations, test generation). Maps onto our interactive vs front-load → hands-off split.

Context & memory

Always-on (footprint)

Tokens paid on every session/AI call: the authored part of CLAUDE.md, .claude/rules/, every skill's description, CONTEXT.md/SOUL.md. The thing optimise-context trims.

On-demand

Loaded only when relevant (skill bodies, path-scoped rules) — cheap, so detail lives here.

Static ↔ dynamic placement (decide at add-time)

Every piece of context is either always-on (static — paid every turn) or on-demand (dynamic — paid only when loaded). The placement is a first-class decision made when you add the context, not only when optimise-context trims it later — context as code, justified in the PR. Default to dynamic; justify static. The progressive-disclosure ladder is that dynamic default in action: a skill's description (static, tiny) → its SKILL.md body (dynamic, on match) → its docs/agents/* heavy reference (dynamic, only when needed) — one agent carries dozens of skills but pays only for the one it fires. The six context types an agent carries — instructions, knowledge, examples (few-shot — the type we under-use), memory, tools, guardrails — each get this same static-or-dynamic call.
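The arithmetic behind the progressive-disclosure ladder, as a sketch. The token counts below are invented for illustration and `SkillCost` / `session_tokens` are hypothetical names; only the shape of the calculation follows from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SkillCost:
    name: str
    description_tokens: int  # static: paid every session
    body_tokens: int         # dynamic: paid only when the skill fires


def session_tokens(skills: List[SkillCost], fired: Iterable[str]) -> int:
    """Every description is always paid; a body is paid only if it fired."""
    fired_set = set(fired)
    static = sum(s.description_tokens for s in skills)
    dynamic = sum(s.body_tokens for s in skills if s.name in fired_set)
    return static + dynamic


# Made-up numbers: 40 skills, 30-token descriptions, 1,000-token bodies.
skills = [SkillCost(f"skill-{i}", 30, 1_000) for i in range(40)]

ladder = session_tokens(skills, fired=["skill-7"])
everything_loaded = session_tokens(skills, fired=[s.name for s in skills])
print(ladder, everything_loaded)
```

With these assumed numbers, firing one skill costs 2,200 tokens, against 41,200 if every body were always-on.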

Cache-stability

Keep the static always-on prefix constant; never interpolate volatile/per-task data into the base, or it busts the prompt cache every turn.
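A toy illustration, assuming the cache keys on an exact match of the leading characters of the prompt. Real cache behaviour depends on the provider; the prompt strings and the `prefix_key` helper are hypothetical.

```python
import hashlib

STATIC_PREFIX = "Harness rules (constant across tasks).\n"


def stable_prompt(task: str) -> str:
    """Volatile data goes after the constant base."""
    return STATIC_PREFIX + "Task: " + task


def volatile_prompt(task: str) -> str:
    """Anti-pattern: per-task data interpolated into the base."""
    return f"Harness rules for '{task}' (varies per task).\n" + "Task: " + task


def prefix_key(prompt: str, length: int = len(STATIC_PREFIX)) -> str:
    """Stand-in for a cache key computed over the first `length` characters."""
    return hashlib.sha256(prompt[:length].encode("utf-8")).hexdigest()


stable_hit = prefix_key(stable_prompt("fix bug A")) == prefix_key(
    stable_prompt("add feature B")
)
volatile_hit = prefix_key(volatile_prompt("fix bug A")) == prefix_key(
    volatile_prompt("add feature B")
)
print(stable_hit, volatile_hit)
```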

Learning

A durable, dated lesson in docs/learnings/ (a footgun, a non-obvious cause, an emerging convention). Captured by find-learnings; promoted to a rule by graduate-learnings (find-learnings --refresh).

Rule

An always-on convention in .claude/rules/ (e.g. code-quality.md). A graduated, standing version of a learning.

Backlog model

The four layers work is tracked in — all native GitHub primitives (see docs/brainstorms/2026-06-04-backlog-organisation-model.md):

Milestone

A major delivery stage; the top layer, made of many epics. A native GH Milestone. The set is an explicit, maintained order — currently Foundational → Launch → post-launch buckets — where the early stages are dated/scheduled and the post-launch buckets are deliberately undated ("do this after launch", split into a few buckets by urgency). Order follows the maintained list, never inferred from due/creation dates; "earliest milestone" = first in that sequence with open work.
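The "earliest milestone" rule can be sketched as a walk over the maintained list. The post-launch bucket names and the open-issue counts here are placeholders, not the real set.

```python
from typing import Dict, List, Optional

# Maintained order; the two post-launch bucket names are hypothetical.
MILESTONE_ORDER: List[str] = [
    "Foundational",
    "Launch",
    "Post-launch (urgent)",
    "Post-launch (later)",
]


def earliest_milestone(
    order: List[str], open_counts: Dict[str, int]
) -> Optional[str]:
    """First milestone in the maintained sequence that still has open work.

    Dates are deliberately ignored: order comes from the list alone.
    """
    for name in order:
        if open_counts.get(name, 0) > 0:
            return name
    return None


current = earliest_milestone(
    MILESTONE_ORDER,
    {"Foundational": 0, "Launch": 4, "Post-launch (later)": 9},
)
print(current)
```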

Epic

An evolving theme/workstream you keep adding to (the harness, the app, deployment) within a milestone; open-ended. No git branch — a tracking parent only; closed by build-epic on completion, never auto-closed by a trigger (ADR-0005). Native Issue Type Epic.

Sprint

A bounded push you ship then verify (manual testing); bounded by intent, not time — can be long. Ships on its own sprint/<slug> branch → its own PR to main, independently of sibling sprints; closes natively on that merge. Issue Type Sprint.

Task

One unit of work; closes when its work merges into the sprint branch (the close-sprint-tasks Action — native close only fires on main). Issue Type Task.

Membership / dependency / ordering

Membership is the native sub-issue tree (Epic ⊃ Sprint ⊃ Task); dependencies are native blocked-by; ordering lives on the Projects v2 board (never baked into a name).

Branch flow & completion

Task → sprint/<slug> branch → Sprint PR → main; the epic is a tracking parent, never a branch. Per-layer issue close is pinned in docs/adr/0005-sprint-epic-branch-flow-and-completion.md.

Area

An Area label (App / Harness / App + Harness, set by triage), orthogonal to the layers — filters all harness (or app) work across epics. (Formerly the native org Area Issue Field; moved to labels — see docs/agents/triage-labels.md.)

Readiness / Status

A Projects v2 Status field (Needs Triage → Needs Info → Ready for Agent → Ready for Human → In Progress → Done, + Wontfix), set by triage.

Vertical / horizontal move

vertical = promote/demote a layer (change Issue Type + parent + milestone); horizontal = re-home within a layer (reparent to a sibling, or change milestone). organise performs these.

Process

Shakedown

Running the freshly-built skills on the harness/repo itself as their first real test (the end-of-build self-audit).

Finding · severity

A qa- output line: the issue + file:line + critical / major / minor (critical/major gate a PR; minor can ride or defer).
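A sketch of the gating rule, assuming a finding is a small record. `Finding` and `blocks_pr` are hypothetical names, and the example findings are invented.

```python
from dataclasses import dataclass
from typing import List

SEVERITIES = ("critical", "major", "minor")
GATING = {"critical", "major"}  # these gate a PR; minor can ride or defer


@dataclass(frozen=True)
class Finding:
    issue: str
    location: str  # file:line
    severity: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def blocks_pr(findings: List[Finding]) -> bool:
    """A PR is gated if any finding is critical or major."""
    return any(f.severity in GATING for f in findings)


# Invented findings, for illustration only.
findings = [
    Finding("unused import", "app/example.py:3", "minor"),
    Finding("unvalidated input reaches a query", "app/example.py:41", "critical"),
]
print(blocks_pr(findings))
```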

Selector

How build-sprint picks its batch: an explicit issue list, a label, or a sprint; build-epic executes a whole epic (its sprints, sequentially).

Sweep

A pass over many things in order (e.g. check-everything over the in-scope checks), pausing per step.

Front-load → hands-off

The preferred build rhythm with agents: resolve all ambiguity / grilling / decisions and document them up front, then execute autonomously without stopping to ask. Enacted by build-sprint's PLAN→EXECUTE split and brainstorm's hard gate; to-issues lets you choose to front-load each issue's spec now or defer it to action-time.

QA Plan

A single live GitHub issue (always titled QA Plan, labelled plan) that orders the open PRs for a sprint/epic with per-PR state, files to watch, what Claude can action, and a merge recommendation. Built/refreshed from fresh GitHub data by build-qa-plan (never trusts the old body); the doable items are executed by build-from-qa-plan. It's a status board, not a snapshot.

Dev Plan

The work-in twin of the QA Plan: a single live GitHub issue (always titled Dev Plan, labelled plan) that surfaces what to work on next — the open issues in the earliest open milestone, grouped Epic → Sprint, ordered by readiness + dependencies, with a top health line counting orphans (issues with no home yet). Built/refreshed from fresh GitHub data by build-dev-plan; hands a chosen sprint to build-sprint/build-epic and the orphans to organise. Reports + flags only — never places.

Brainstorm (double diamond)

The brainstorm skill runs an idea through two diamonds — problems then solutions — across five gated stages. The stage names are pinned so the wording doesn't drift.

Scratchpad

The pre-divergent framing stage: free-dump, then converge on the goal frame before any problem-naming. Exits only on the mandatory Scratchpad→Discover gate.

Goal frame

Scratchpad's deliverable and the layer above the diamonds (Goal → Problems → Solutions): one solution-free primary goal + optional explicitly-ranked secondary goals + constraints + non-goals. Becomes the top of the convergence rubric — the measure every problem and solution is scored against — so it's the lens, not a solution. Mandatory gate; red-teamed (check-reasoning) before exit.

Discover (= Problem Exploration) — the first divergent stage: name every problem, unranked and unscoped. Widening the problem space; the gates narrow it later.
Define (= Problem Curation) — the first convergent stage: pick the problems this work solves and park the rest. Narrows by the convergence rubric + a check-reasoning red-team on the problem set.
Develop (= Solution Exploration) — the second divergent stage: generate many solution paths for the in-scope problems, unranked. Build-spikes (throwaway exploration) live here.
Deliver (= Solution Curation) — the second convergent stage: narrow to one solution by the rubric + the blind chooser, run the mandatory coverage gate, then hand off. A check-reasoning red-team lands on the solution here.

The plain names are the everyday handle (Exploration = the divergent widen, Curation = the convergent narrow); the canonical ones (Discover/Define/Develop/Deliver) are what the stage: frontmatter and the gate use. Banners show both (DEFINE · Problem Curation).

Gate token

The exact UPPERCASE word the maintainer types to open a gate (NEXT to advance a stage, BUILD to exit). Distinct from ordinary prose so consent is unambiguous; flips the matching frontmatter gates.* flag (which is the brainstorm's state — no second store).
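A sketch of exact-token matching. It assumes the token must be the whole message, and the flag names `advance` and `exit` are placeholders, since the real gates.* keys are not listed here.

```python
from typing import Dict, Optional

# Token -> hypothetical gates.* flag name.
GATE_TOKENS: Dict[str, str] = {"NEXT": "advance", "BUILD": "exit"}


def read_gate_token(message: str) -> Optional[str]:
    """Return the token only if the message is exactly that UPPERCASE word."""
    word = message.strip()
    return word if word in GATE_TOKENS else None


def apply_token(gates: Dict[str, bool], message: str) -> Dict[str, bool]:
    """Flip the matching flag; ordinary prose leaves the state untouched."""
    token = read_gate_token(message)
    if token is None:
        return gates
    updated = dict(gates)
    updated[GATE_TOKENS[token]] = True
    return updated


state = {"advance": False, "exit": False}
after_prose = apply_token(state, "what comes next?")
after_token = apply_token(state, "NEXT")
print(after_prose, after_token)
```

Lowercase or embedded mentions of the word do not count, which is what keeps consent unambiguous.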

Build-spike

A throwaway, maintainer-approved build to test feasibility only during Develop, isolated under the brainstorm's spikes/, reporting "works / doesn't" (never "therefore pick this"). Registered + disposed on close; effort spent scores nothing (the sunk-cost guard).

Reopen

The defined step-back move: when new info invalidates an earlier stage, ask the maintainer first, then return to the named stage and log the trigger.

Mini-diamond

A small, targeted re-diverge scoped to one uncovered gap (a problem whose only solution was rejected, or new info), rather than reopening the whole brainstorm. Triggered by the coverage gate or a Reopen.

Review surface

The standalone, JSON-driven HTML page (review-verdict.html / review-picker.html templates) a brainstorm fills and saves beside its doc, so a comprehensive divergent menu is reviewable off the chat wall; reactions export to review-state.json and fold back into the doc.
