The `brainstorm` skill

Shape a fuzzy idea into a build-ready spec — before a single line of code.

A vague idea turned straight into code becomes the wrong code, confidently. Brainstorm keeps an idea in thinking mode (two diamonds, problems first, then solutions) until building it is mechanical, and opens the build gate only when you say so.

NOTE

This page goes deeper than the skills reference entry — it walks the model end-to-end and shows how each piece is wired: the state machine, the blind helpers, the rubric, the gate hook, and the files left on disk.

Why it exists

When a half-formed "wouldn't it be nice if…" goes straight to implementation, the gaps get filled with assumptions. You end up with something that works but solves the wrong problem — and the cost of discovering that has moved all the way downstream into shipped code.

Brainstorm slows the front of the funnel down on purpose. It forces you to name every problem before proposing any solution, widens the option space with independent helpers so you don't anchor on your first idea, and proves every in-scope problem actually has an answer before anything is planned. The thinking is cheap; the wrong code is not.

The second, quieter failure it guards against is a biased answer. You anchor on the idea you've half-built; the agent mirrors your leanings and over-credits effort already spent. Left alone, a brainstorm just rubber-stamps a pre-formed conclusion. The fix is blindness — the load-bearing mechanism described below.

The governing principle

Claude and every helper recommend — they never decide. You make every final call: which problems are in scope, which solution wins, whether to step back a stage, whether to build a spike. The skill is a structured conversation, not an autopilot.

Goal → Problems → Solutions

Above the two diamonds sits one more layer: the goal. Before any problem is named, Scratchpad converges on the outcome the work is for — and that goal becomes the lens every later stage is measured against. A problem is only a problem relative to the goal; a solution is scored on how well it serves the goal, not just whether it's neat.

Naming the goal first is what stops a brainstorm anchoring on a half-built answer. #590 was filed as "protected days" — but a protected day is one solution; the real goal was "help me start a task I keep avoiding." Naming the goal first let other solutions surface (break-it-down, route-to-a-Helper) that a solution-titled brainstorm would have anchored away from. So the layering is Goal → Problems → Solutions, each evaluated against the layer above — and the goal is the top of the rubric.

The double diamond

The work moves through five gated stages, each broken into fine steps. The first diamond is about problems; the second is about solutions. Each diamond widens (Exploration — the divergent pass) before it narrows (Curation — the convergent pass).

NOTE

Each diamond stage carries both names: the canonical one the gate + frontmatter key off (Discover/Define/Develop/Deliver), and the plain one that says what's actually happening (Problem Exploration / Curation, Solution Exploration / Curation). Exploration ≈ diverge, Curation ≈ converge — the plainer pair the maintainer can talk in without losing the under-the-hood term.

Stage	Fine steps	What happens	Lands on
Scratchpad · Goal	`scratchpad` → `review`(goal)	Dump anything — problems, ideas, half-solutions, worries. Claude challenges it, then converges on the goal frame (a solution-free primary goal + constraints + non-goals) and red-teams it. Gated: no goal, no Discover.	`brainstorm.md` + the Goal section, `review-goal-verdict.json`
🔵 Discover · Problem Exploration	`effort` → `explore` → `review`(pool)	Name every problem in the space. Stay unranked; keep solutions and scope for later. The heart of diamond 1.	the problem pool, `review-problems-pool-verdict.json`
🟣 Define · Problem Curation	`curate` → `review`(set)	Pick the problems this work solves; park the rest; name the explicit non-goals. Apply the rubric; red-team the set.	`review-problems.html` + `-state.json`, `-set-verdict.json`
🟢 Develop · Solution Exploration	`effort` → `explore` → `review`(pool) → `viability`	Generate many solution paths, each tagged with the problem(s) it solves, then triage feasibility. The heart of diamond 2.	`review-solutions.html` + `-pool-verdict.json`, `viability-state.json`
🟠 Deliver · Solution Curation	`curate` → `review`(pick) → `coverage`	Rank by rubric + blind chooser, you pick one, red-team the pick, prove coverage, gate to `BUILD`, hand off.	`review-decisions.html` + `-state.json`, `review-pick-verdict.json`

Each diamond emits a banner as the first line of its reply (e.g. ━━ DISCOVER · Problem Exploration ━━ …) so the live mode stays visible. The crucial discipline: the mechanics of each fine step are re-read from its own file on entry — never run from memory (next section).

Run it from the files — re-read every step

The single biggest failure in practice was the agent skipping steps — the lens prompt, the review surface, the HTML export — and the maintainer rubber-stamping the skips. The diagnosis: the launch-time procedure is bespoke (not generic "double diamond" knowledge), and over a long brainstorm it gets buried and compressed out of context, so the agent reconstructs steps from memory and drops the awkward ones. The fix re-injects the procedure just-in-time:

On launch Claude reads playbook.md + decks.md and posts a short workflow brief — the steps, and exactly where you'll be asked to act (pick lenses, choose a review depth, review + export on a page, type the tokens) — so a skipped step is visible to you, not just to it.
On entering any fine step — first time or re-entry — Claude's first action is to Read that step's own stages/<step>.md file, then post its phase contract and quote a unique PHASE-TOKEN baked into that file (e.g. scratchpad-4B1E9A, viability-9D5E10) before doing the work. The token is proof-of-read: it can only be quoted by a turn that actually opened the file, so a step run from recall is visible by the missing/wrong token.
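
A minimal sketch of how the proof-of-read could be checked mechanically. It assumes the token has the shape of the examples above (a step name, a hyphen, six uppercase hex characters) and that the step file marks it with a `PHASE-TOKEN:` label. The label and both function names are hypothetical; the skill's real files may be laid out differently.

```python
import re
from typing import Optional

# Assumed token shape, inferred from the examples (scratchpad-4B1E9A).
# The "PHASE-TOKEN:" label is a hypothetical marker, not the skill's confirmed format.
TOKEN_RE = re.compile(r"PHASE-TOKEN:\s*([a-z]+-[0-9A-F]{6})\b")


def token_in_step_file(step_file_text: str) -> Optional[str]:
    """Return the token baked into a step file, or None if there is none."""
    match = TOKEN_RE.search(step_file_text)
    return match.group(1) if match else None


def reply_proves_read(step_file_text: str, reply: str) -> bool:
    """True only if the reply quotes the exact token found in the step file."""
    token = token_in_step_file(step_file_text)
    return token is not None and token in reply


step_file = "# viability\nPHASE-TOKEN: viability-9D5E10\n1. Apply the viability lens\n"
print(reply_proves_read(step_file, "Phase contract (viability-9D5E10): ..."))  # True
print(reply_proves_read(step_file, "Phase contract (viability-000000): ..."))  # False
```

A reply built from recall has no way to produce the right six characters, so the check fails on a missing token and on a wrong one alike.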

Why a per-step file, not one big skill body

A long SKILL.md read once at launch is exactly what compresses away. Splitting the mechanics into seven small force-read files — scratchpad · effort · explore · viability · curate · review · build — means each step's unskippable checklist re-enters context at the moment it's needed, every time, regardless of how long the conversation has run.

The goal frame

Scratchpad's deliverable is the goal frame — and the stage will not open to Discover without it (a mandatory gate, the same standing as the coverage gate; the goal gate is the front bracket of the work, coverage is the back). It is four things, written to the doc's Goal section:

A single primary goal — solution-free. The outcome the work is for, with no particular solution baked in. The test: if asking "why do we want that?" yields a more fundamental answer, you're not at the goal yet. A solution-titled goal silently anchors the whole brainstorm onto one answer (the #590 failure above).
Optional ranked secondary goals. Only real ones, listed explicitly below the primary. The primary always dominates the rubric; secondaries break ties — never a flat, co-equal set, which would blur the on-target test.
Constraints — the fixed conditions any solution must respect.
Non-goals — what the work deliberately won't do. They start here; Define extends them as problems get parked.

The goal is red-teamed before you diverge

The goal frame gets the same blind check-reasoning red-team as the problem set and the pick — and it's the highest-value of the three, because a wrong or solution-shaped goal poisons Discover, Develop and Deliver (they're all scored against it). Catching a goal-vs-goal contradiction or a missing lens here costs a sentence; catching it after build costs the build.

The heart: three blind passes

This is the skill. Three times, work is handed to a helper deliberately walled off from the context that would bias it. Because each is independent by construction, each can genuinely disagree instead of echoing the room.

#	Pass	Helper	What it's blind to
1	Blind divergence	`ideator` agents — one per lens	your leanings + the parked solutions — it only sees the problem + one lens
2	Blind convergence	`chooser` agent	authorship + spike-effort, stripped out — it ranks on fit-to-problem alone
3	Blind red-team	`check-reasoning` — the dedicated `review` step	the reasoning behind the goal/pool/set/pick — an independence gate, so it can't rubber-stamp

A helper that can see what you already prefer tends to agree with it. By withholding that, each blind pass adds genuinely independent signal rather than an echo — which is what makes "recommendations, not decisions" actually trustworthy: the recommendation comes from somewhere that couldn't just mirror you.
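
As a rough illustration of the second pass, here is one way the authorship and effort stripping could look before candidates reach the chooser. The field names (`author`, `spike_built`, `hours_spent`) are hypothetical, and the shuffle is an extra assumption added so list order cannot leak who wrote what.

```python
import random

# Hypothetical field names: which keys count as authorship or effort is an assumption.
BIASING_FIELDS = {"author", "spike_built", "hours_spent"}


def blind_for_chooser(candidates, seed=None):
    """Return copies of the candidates with biasing fields removed, in shuffled order."""
    stripped = [
        {key: value for key, value in cand.items() if key not in BIASING_FIELDS}
        for cand in candidates
    ]
    random.Random(seed).shuffle(stripped)  # order must not hint at authorship
    return stripped


pool = [
    {"id": "s1", "summary": "Break the task down", "solves": ["p1"],
     "author": "maintainer", "spike_built": True},
    {"id": "s2", "summary": "Route to a helper", "solves": ["p1", "p2"],
     "author": "ideator:bold"},
]
blind = blind_for_chooser(pool, seed=0)
print(sorted(blind[0].keys()))  # ['id', 'solves', 'summary']
```

The originals are left untouched, so the attribution is still available to you on the review surface; only the helper's view is narrowed.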

Review runs after every diverge and converge — not just at the end

The red-team is its own review fine step, and it fires five times: after the goal, the problem pool (Discover), the problem set (Define), the solution pool (Develop) and the pick (Deliver). Position picks the lens and the tool: after a diverge it's a completeness pass (a gap-finder — "what did we miss?"); after a converge (and the goal) it's a soundness pass (the blind check-reasoning reviewer). You pick how hard to red-team each time — Claude recommends a depth from the deck and waits for your pick; it never assumes one. Each review writes a review-*-verdict.json the gate checks for. This is the direct cure for "nothing prompts me before or after a step, so I just rubber-stamp."

Nothing a blind helper says is binding

Every verdict is yours to confirm or flip on the review surface before you type the gate token. The helpers keep the options honest; you still decide.

Diverge mechanics — Discover & Develop

The divergent discipline is simple to state and easy to break: widen well, stay unranked, let the gates narrow.

Stay unranked, unscored, unordered. Save "I recommend" / "most important" / "in scope" for the gates, where convergence belongs. If Claude (or you) feels the pull to narrow, it self-catches — names it ("converging early — re-widening") and keeps listing.
Park stray solutions. A solution voiced during a problem stage is captured verbatim to a Parked section and the talk returns to the problem. Crucially, the Parked list is kept out of the context handed to the ideation helpers — that exclusion is exactly what preserves their independent breadth.
The effort prompt opens every diverge. Before any divergence Claude posts the EFFORT prompt: pick the lenses (the breadth dial) and the external reach (how much prior art to pull). It recommends a default and waits — the diverge can't start until you've picked.
Lenses are the breadth dial, drawn from a position-specific deck. A lens is the angle a helper attacks from. Discover draws from the problem-lens deck (Stakeholders · Failure & edge cases · Second-order effects · Status-quo cost · Constraints that bite · No limits); Develop from the solution-lens deck (Quick win · Bold · From scratch · Borrow · Invert · No limits). You multi-select; each lens picked is one more independent helper, so more lenses = more breadth at more cost. The four decks (two generative, two adversarial) live in the skill's decks.md. Default/recommended for Develop: the two poles Quick win + Bold.
The blind fan-out. Claude runs ideate --pool-only, which dispatches one context-informed ideator plus one isolated ideator per chosen lens, in parallel, and returns one flat, unordered, unattributed pool. It uses --pool-only because convergence here belongs to this stage's rubric and gates, not ideate's own chooser. Claude writes its own candidate list first, then merges the pool, so its breadth and the helpers' stay independent.
Optional outside input. Pull in prior art at the depth the moment needs — a quick [WebSearch], a scanner fan-out, or a full deep-research report — merged with an (external) tag so its provenance stays clear to the chooser.
Stop rule. The gate can open once at least 3 distinct options exist, with a soft cap of 5 detailed + 5 sketches to keep cognitive load down.
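
The stop rule is simple enough to sketch. The numbers come from the rule above; the option shape (`title`, `kind`) and the idea of judging distinctness by a case-insensitive title comparison are assumptions made for illustration.

```python
MIN_DISTINCT = 3                         # the gate can open at 3 distinct options
SOFT_CAP = {"detailed": 5, "sketch": 5}  # soft caps: warn, never block


def stop_rule(options):
    """Return (gate_may_open, warnings) for a list of {'title', 'kind'} dicts.

    Distinctness is approximated by comparing normalised titles (an assumption).
    """
    distinct = {opt["title"].strip().lower() for opt in options}
    warnings = []
    for kind, cap in SOFT_CAP.items():
        count = sum(1 for opt in options if opt["kind"] == kind)
        if count > cap:
            warnings.append(f"{count} {kind} options, soft cap is {cap}")
    return len(distinct) >= MIN_DISTINCT, warnings


pool = [
    {"title": "Break it down", "kind": "detailed"},
    {"title": "Route to a helper", "kind": "detailed"},
    {"title": "break it down ", "kind": "sketch"},  # a duplicate, so not distinct
]
print(stop_rule(pool))  # (False, [])
```

Exceeding a cap only produces a warning here, which matches the "soft" wording: the cap protects cognitive load, it does not close the gate.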

Viability triage — Develop only

After the solution pool is complete and before the pick, the viability step triages feasibility. It exists to cure a specific hedge: agents that come back with "prove this works" on everything, pushing all the proof-work onto you. The rule is the opposite — the agent must commit a call per solution, not punt.

For each solution Claude applies the viability lens (decks.md): does it hit unproven tech · a dependency we don't control · unknown scale/performance · opaque integration · an untested assumption · an unspiked hard case? If none, it's confident.
Every solution lands a call — confident or uncertain: <reason> — written to viability-state.json beside the doc (the Develop gate won't advance without it). Never a blanket "prove this works."
Only the genuinely-uncertain ones earn a build-spike (your approval first). "Confident, no spike" is a valid, expected call — this forces the triage, not proof of everything.
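
A sketch of what "every solution lands a call" means as a check. The real layout of viability-state.json is not shown on this page, so the mapping of solution id to call string used here is an assumption, as are the function names.

```python
def check_viability_calls(solution_ids, calls):
    """Return a list of problems; an empty list means the triage is complete.

    `calls` maps a solution id to "confident" or "uncertain: <reason>".
    This shape is hypothetical, not the file's confirmed schema.
    """
    problems = []
    for sid in solution_ids:
        call = calls.get(sid, "").strip()
        if call == "confident":
            continue
        if call.startswith("uncertain:") and call[len("uncertain:"):].strip():
            continue  # uncertain, with a stated reason
        problems.append(f"{sid}: missing or invalid call {call!r}")
    return problems


def spike_candidates(calls):
    """Only the uncertain solutions are even eligible for a build-spike."""
    return sorted(sid for sid, call in calls.items() if call.startswith("uncertain:"))


calls = {
    "s1": "confident",
    "s2": "uncertain: depends on an API we don't control",
    "s3": "prove this works",  # the blanket hedge the step exists to stop
}
print(len(check_viability_calls(["s1", "s2", "s3", "s4"], calls)))  # 2
print(spike_candidates(calls))  # ['s2']
```

The blanket hedge and the untriaged solution both fail the check, while "confident" passes with no spike attached.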

Why this is a gate, not a vibe

The call is the heart of the step: it makes the agent decide, with a reason, instead of hedging. viability-state.json is the proof the triage happened — and it's one of the artifacts the Develop→Deliver gate checks for, so the pick can't proceed on un-triaged solutions.

Converge mechanics — Define & Deliver

Narrowing is done by a method, not freelanced — the same rubric a human reads and a blind helper scores against.

The rubric

Options are scored on five criteria in priority order, with two hard clauses that hold firm:

Goal-fit — does it serve the primary goal named at Scratchpad? The top filter: an option that doesn't move the goal is off-target however neat. Secondary goals break ties only.
Fit-to-problem — does it actually solve the named problem(s) it's tagged to?
Risk — blast radius and reversibility if it goes wrong.
Maintenance — ongoing cost to keep it working.
Simplicity — cognitive load to build and to re-read later.

The two hard clauses

Simplicity breaks ties only — a simpler option wins a genuine tie, but never justifies under-specifying the work.
Effort already spent scores nothing — a built spike earns no edge over an unbuilt idea. This is what the blind chooser's authorship/effort stripping makes real.
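
One way to read "priority order" is as a lexicographic comparison, where a lower criterion only matters when everything above it ties. The sketch below takes that reading, which is an interpretation rather than the skill's confirmed scoring method. The 0 to 5 scale and the scores themselves are invented for illustration.

```python
from typing import NamedTuple


class Score(NamedTuple):
    # Hypothetical 0-5 scale, higher is better on every axis
    # (so a high `risk` score means a SMALL blast radius).
    goal_fit: int
    fit_to_problem: int
    risk: int
    maintenance: int
    simplicity: int  # last field: only decides a genuine tie on the four above
    # Note what is absent: no field for effort already spent.


def rank(options):
    """Return option names best-first, comparing Score tuples field by field."""
    return sorted(options, key=lambda name: options[name], reverse=True)


options = {
    "break-it-down": Score(5, 4, 4, 3, 2),
    "protected-days": Score(3, 5, 5, 5, 5),
    "route-to-helper": Score(5, 4, 4, 3, 4),
}
print(rank(options))  # ['route-to-helper', 'break-it-down', 'protected-days']
```

The neat, simple option loses because it is weaker on goal-fit, and simplicity only separates the two options that tie on everything above it.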

The token gates

A gate opens only on an exact, case-sensitive UPPERCASE token you type — NEXT to advance a stage, BUILD to exit. On the token, the matching frontmatter gates.* flag flips in brainstorm.md.

The case rule is deliberate: it's the only thing that stops a casual "what's next?" in prose from tripping the gate, and the enforcement hook matches case-sensitively (grep -w, no -i). A near-miss (next, build it) is not the token — if you clearly mean to advance but didn't type it exactly, Claude asks you to confirm with the exact token rather than inferring. Determinism is the feature.
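
The matching rule can be approximated in a few lines. This sketch mirrors the whole-word, case-sensitive behaviour of `grep -w` without `-i`; it is not the hook's actual code, and whether the real hook also requires the token to stand alone in the message is not stated here.

```python
import re

TOKENS = {"NEXT": "advance", "BUILD": "exit"}


def gate_action(message):
    """Return 'advance', 'exit', or None for a user message.

    Whole-word and case-sensitive, approximating `grep -w` with no `-i`.
    """
    for token, action in TOKENS.items():
        if re.search(rf"\b{token}\b", message):
            return action
    return None


print(gate_action("NEXT"))          # advance
print(gate_action("what's next?"))  # None
print(gate_action("build it"))      # None
print(gate_action("BUILD"))         # exit
```

The near-misses return `None`, which is the cue to ask for the exact token rather than guess at intent.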

The artifact pre-flight

Before flipping a gate, every artifact the leaving stage owes must already exist beside the doc; the gate refuses to advance until they do. Each stage declares its required files in gate.conf (REQUIRE_<stage>), and the list is broader than just the exported review JSON: alongside that -state.json export it covers the per-step review-*-verdict.json files and the viability-state.json triage, so no fine step is skippable.

Leaving	Must exist first
Scratchpad	`review-goal-verdict.json`
Discover	`review-problems-pool-verdict.json`
Define	`review-problems-state.json`, `review-problems-set-verdict.json`
Develop	`review-solutions-pool-verdict.json`, `viability-state.json`
Deliver	`review-decisions-state.json`, `review-pick-verdict.json`

Claude writes the review page, but only your Export JSON produces the -state.json file — so its presence is proof the review actually happened. A missing file → the stage does not advance.
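
A sketch of the pre-flight, assuming the `REQUIRE_<stage>=file,file` lines are read as plain text and the stage is lowercased before the lookup. The function names are hypothetical and the real hook is a shell script, so treat this as an illustration of the rule rather than its implementation.

```python
import tempfile
from pathlib import Path

CONF = """\
# Per-stage artifacts (excerpt)
REQUIRE_define=review-problems-state.json,review-problems-set-verdict.json
REQUIRE_develop=review-solutions-pool-verdict.json,viability-state.json
"""


def required_artifacts(conf_text, stage):
    """Return the files a stage owes, matching the stage name in lowercase."""
    key = f"REQUIRE_{stage.strip().lower()}"
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


def missing_artifacts(doc_dir, conf_text, stage):
    """Return required files that do not yet exist beside the doc."""
    return [
        name for name in required_artifacts(conf_text, stage)
        if not (Path(doc_dir) / name).is_file()
    ]


with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "review-problems-state.json").write_text("{}")
    missing = missing_artifacts(folder, CONF, "Define")  # any casing works
print(missing)  # ['review-problems-set-verdict.json']
```

A non-empty result means the stage does not advance, whatever token was typed.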

The stage match is case-insensitive by design: a dormant bug taught us why

An earlier version listed the required stages capitalised (Define,Develop,Deliver) and matched them case-sensitively, while real docs write lowercase stage: define. The match never hit, so the gate silently allowed every real brainstorm through — the likely root cause of the long-standing "the agent keeps skipping the HTML review." It hid because the test fixtures also used capitalised stages, exercising a path real data never takes. The fix lowercases the stage before matching and the tests now use real-shape lowercase fixtures. Lesson, now a saved learning: "wired" ≠ "fires" — a gate is only proven by a live trigger, and a fixture that doesn't match real data shape can hide the exact bug it's meant to catch.

The review surface

Every diverge→converge hand-off is reviewed on a standalone HTML page — every time, regardless of how few items there are. The HTML+JSON surface is the easy path: it lowers cognitive load and lets you comment on specific items, which chat can't. "Only a handful of options, chat is fine" is explicitly wrong — going to chat (or to build) without the page is a skip, not a shortcut.

Two frozen, JSON-driven templates live in the skill's templates/:

review-verdict.html — per-item 👍 keep / 🤷 unsure / 👎 drop + comment, for triaging a list (the problem and solution reviews).
review-picker.html — per-item single-choice radio (recommended / alternatives / other) with a Problem + Solution context header, for choosing one option (the technical pick).

Each surface is built at its stage, not retroactively, and pre-filled with the recommended verdicts so you confirm or flip rather than starting cold. You review on phone or desktop → Export JSON → it saves beside the doc and the reactions fold back into brainstorm.md.

The coverage gate

Two completeness checks bracket the work so nothing slips through (and the goal gate is the third, at the very front):

Every problem named — Discover insists on mapping every problem before any solutioning, so the work isn't scoped around a half-seen problem.
Every problem answered — before Deliver hands off, Claude builds the problem → solution map and proves every in-scope problem has at least one adopted solution. Mandatory, every time.

An uncovered gap is never shipped silently

A problem whose only solution was rejected (or left on 'maybe') is an uncovered gap. It triggers either a targeted re-diverge — a mini second-diamond scoped to just that gap — or an explicit maintainer won't-fix. This catches the failure where rejecting a solution quietly orphans the problem it was the only answer to.
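
The coverage proof reduces to a set difference over the problem → solution map. The record shape below (`solves`, `verdict`) is an assumption for illustration; the rule it encodes is the one above, where only an adopted solution covers a problem.

```python
def uncovered_problems(in_scope, solutions):
    """Return in-scope problems with no adopted solution, in their original order.

    Each solution is a dict with 'solves' (problem ids) and
    'verdict' ('adopted' | 'rejected' | 'maybe'); the shape is hypothetical.
    """
    covered = set()
    for solution in solutions:
        if solution["verdict"] == "adopted":  # 'maybe' does not count as covered
            covered.update(solution["solves"])
    return [problem for problem in in_scope if problem not in covered]


in_scope = ["p1", "p2", "p3"]
solutions = [
    {"id": "s1", "solves": ["p1"], "verdict": "adopted"},
    {"id": "s2", "solves": ["p2"], "verdict": "rejected"},  # p2's only answer
    {"id": "s3", "solves": ["p1", "p3"], "verdict": "maybe"},
]
print(uncovered_problems(in_scope, solutions))  # ['p2', 'p3']
```

Each id in the result needs either a targeted re-diverge or an explicit won't-fix before hand-off.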

The record is the state

Every brainstorm is one self-contained folder. The doc's frontmatter is the single source of truth for which stage you're in, so any session can resume exactly where it stopped — and the review pages + JSON sit beside it as the durable record.

```text
docs/brainstorms/<date>-<slug>/
├── brainstorm.md                       # journey doc + frontmatter — the state
├── review-goal-verdict.json            # Scratchpad · goal red-team
├── review-problems-pool-verdict.json   # Discover · pool completeness review
├── review-problems.html                # Define · verdict surface
├── review-problems-state.json          # …your exported reactions
├── review-problems-set-verdict.json    # Define · set soundness review
├── review-solutions.html               # Develop · verdict surface
├── review-solutions-pool-verdict.json  # Develop · pool completeness review
├── viability-state.json                # Develop · per-solution feasibility calls
├── review-decisions.html               # Deliver · picker surface
├── review-decisions-state.json
├── review-pick-verdict.json            # Deliver · pick soundness review
└── spikes/                             # throwaway feasibility builds

.brainstorm-ledger/<slug>.marks         # hook-only marks (lens picks) — OUTSIDE the folder, unforgeable
```

The frontmatter that drives it:

```yaml
---
brainstorm: <slug>
session: <$CLAUDE_CODE_SESSION_ID>   # owning session — scopes the gate hook to this agent
stage: scratchpad   # scratchpad | discover | define | develop | deliver | done | trashed
goal:               # the one-line primary goal (the full frame lives in the doc's Goal section)
gates:              # each flips true only on your UPPERCASE token
  scratchpad_to_discover: false
  discover_to_define: false
  define_to_develop: false
  develop_to_deliver: false
  deliver_to_build: false
updated: 2026-06-09T16:35Z
---
```

On re-entry Claude reads stage first, then checks whether new info invalidates it (a Reopen is always your call). The session: stamp scopes the gate hook to this agent, so it never constrains a parallel code agent or a second brainstorm running elsewhere.

stage: is the coarse position (the diamond) for resume + the gate. The fine-step record lives in the artifacts beside the doc and the marks in the ledger (lenses_<phase>, the review-*-verdict.json files, viability-state.json) — that's the deterministic layer the gate and the eval.sh grader read to tell which fine step actually happened.
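
To make "the frontmatter is the state" concrete, here is a deliberately tiny reader for the frontmatter shape shown above. It is a sketch for this one layout, not a YAML parser, and `read_state` is a hypothetical name.

```python
def read_state(doc_text):
    """Return (stage, gates) from the block between the first two '---' lines.

    Handles only the flat keys plus the indented `gates:` map shown above.
    """
    lines = doc_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter")
    stage, gates, in_gates = None, {}, False
    for raw in lines[1:]:
        if raw.strip() == "---":
            break
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if in_gates and raw.startswith((" ", "\t")):
            gates[key] = value == "true"
        else:
            in_gates = key == "gates"
            if key == "stage":
                stage = value.lower()
    return stage, gates


DOC = """\
---
brainstorm: demo
stage: define   # scratchpad | discover | define | develop | deliver
gates:
  scratchpad_to_discover: true
  discover_to_define: true
  define_to_develop: false
updated: 2026-06-09T16:35Z
---
# Journey doc body
"""
stage, gates = read_state(DOC)
print(stage, [name for name, is_open in gates.items() if not is_open])
# define ['define_to_develop']
```

A resuming session needs nothing else to know where it stands: the stage it is in and the next gate still closed.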

Build-spikes (Develop only, optional)

Sometimes you can only tell whether an option works by building a rough version. A spike is allowed only when feasibility is genuinely uncertain and you approve — scoped first, run in an isolated worktree under spikes/, and reported as feasibility findings only.

A green spike is the highest-bias moment in the skill

A spike proves exactly one narrow thing: the slice you tested is buildable. It does not prove the approach solves the problem, is the right pick, or handles the hard cases. A passing spike is a cue to probe harder, not to switch off criticism — it's fed into convergence effort-stripped, like any other data point, and disposed of on close so it earns no edge.

Hand-off

On the BUILD token, Claude runs find-learnings first (so prior gotchas inform the build), then routes by size:

Many decisions → to-issues — creates the typed issue tree.
A single change → the matching build-* skill — carrying the spec doc + the exported review-*-state.json files as the brief.

Parked problems become Idea-typed GitHub issues, sub-issue-linked from their parent, so a good idea survives without cluttering the active backlog. A durable, architecturally significant decision may be promoted to an ADR — always surface-then-confirm before writing it.

Enforcement — behavioural first, hook second

The skill is fully functional from the prose + frontmatter alone — the discipline, gates, and coverage check all hold behaviourally with no hook. A generic gate.sh hook is the enforcement layer that makes them mechanical, and it has been verified to fire in this runtime (a real PreToolUse block test, not just a config read — the "wired ≠ fires" lesson). It reads the skill's own gate.conf:

```ini
DOC_GLOB=docs/brainstorms/*/brainstorm.md
EXIT_FIELD=deliver_to_build
ALLOW_PREFIX=docs/brainstorms/
ADVANCE_TOKEN=NEXT
EXIT_TOKEN=BUILD
BYPASS_VAR=GATE_BYPASS

# Per-stage artifacts that must exist before the stage's token flips it.
# Matched CASE-INSENSITIVELY against the lowercase `stage:` value.
REQUIRE_scratchpad=review-goal-verdict.json
REQUIRE_discover=review-problems-pool-verdict.json
REQUIRE_define=review-problems-state.json,review-problems-set-verdict.json
REQUIRE_develop=review-solutions-pool-verdict.json,viability-state.json
REQUIRE_deliver=review-decisions-state.json,review-pick-verdict.json

# Fan-out gate: the divergence dispatch is held until the lens pick is recorded
# in the hook-only ledger (outside ALLOW_PREFIX, so the agent can't forge it).
LEDGER_DIR=.brainstorm-ledger
FANOUT_SUBAGENT=ideator
PHASE_discover=problems
PHASE_develop=solutions
```

With the hook active it does three things:

Holds code-writes outside docs/brainstorms/ and any build-* dispatch until gates.deliver_to_build: true.
Refuses to flip a stage gate on the NEXT/BUILD token until every REQUIRE_<stage> artifact exists beside the doc — the per-step pre-flight above. The match is case-insensitive (against the lowercase stage:) so it actually fires on real docs.
Blocks the divergence fan-out — denies dispatching an ideator subagent until the maintainer's lens pick for that phase is recorded as lenses_<phase> in the hook-only ledger. The ledger lives in .brainstorm-ledger/ — deliberately outside ALLOW_PREFIX, so the agent's own Write/Edit there is already denied; only the UserPromptSubmit hook writes it. That's what makes the lens mark unforgeable: the agent can't fake having asked you.

Issue-creation is deliberately not held (so a bug can be filed mid-flow), and GATE_BYPASS=1 disables the whole hook for unrelated work. Where the hook genuinely can't run, enforcement degrades gracefully to the behavioural version — still strong, because the per-step re-read + PHASE-TOKEN discipline holds without it.
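
The first of those three behaviours, the write hold, can be sketched as a single decision. This is an illustration of the rule, not the contents of gate.sh: session scoping and the build-* dispatch hold are left out, and treating any path containing `..` as held is an extra safety assumption.

```python
from pathlib import PurePosixPath

ALLOW_PREFIX = "docs/brainstorms/"  # value taken from gate.conf


def write_is_held(path, deliver_to_build, bypass=False):
    """True if a write to `path` should be blocked while the build gate is closed."""
    if bypass or deliver_to_build:
        return False
    pure = PurePosixPath(path)
    if ".." in pure.parts:
        return True  # assumption: refuse paths that could climb out of the prefix
    return not pure.as_posix().startswith(ALLOW_PREFIX)


print(write_is_held("src/app.py", deliver_to_build=False))                     # True
print(write_is_held("docs/brainstorms/demo/brainstorm.md", False))             # False
print(write_is_held(".brainstorm-ledger/demo.marks", deliver_to_build=False))  # True
```

The ledger path is held for the same reason as any code path: it sits outside the allowed prefix, which is what keeps the lens marks out of the agent's reach.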

Grading a run — `eval.sh`

A run's process adherence is checkable without any agent or token cost: eval.sh reads the same gate.conf the hook does (one source of truth) and grades a docs/brainstorms/<date>-<slug>/ folder. Did every stage that was actually reached produce its required artifacts, its goal, and its lens marks? It's the free floor under quality: it proves the shape held (no step skipped), while whether the goal was good or the review thorough still needs human/LLM judgement on top. gate.test.sh covers the hook itself: 19 cases, including a casing-bug regression that fails if the case-insensitive match ever breaks again.

Under the hood — what it calls

Brainstorm orchestrates rather than doing it all itself, handing specialised work to other parts of the harness:

Name	Kind	Where it's used
`ideate`	skill	Fans out the divergence pool (one helper per chosen lens), `--pool-only`.
`ideator`	agent	The blind idea generator — one isolated instance per lens.
`chooser`	agent	Ranks candidates blind (authorship + effort stripped) at the converge gates.
`check-reasoning`	skill	Independent red-team in the `review` step: soundness passes on the goal, the set, and the pick; completeness passes on both pools.
`scanner` · `deep-research`	agent · skill	Optional outside prior art — fan-out shortlist, or full cited report.
`vet`	skill	Nudge when an option's worth or placement is in doubt.
`find-learnings`	skill	Runs before build so prior gotchas inform it.
`to-issues` · `build-*`	skill	The hand-off — many decisions vs a single change.

When to reach for it — and when not

Use brainstorm for a half-formed idea, a "wouldn't it be nice if…", or a vague feature you want shaped before planning. It is deliberately not the right tool when:

The task is already specified — just build it.
You have a plan and want it hardened — that's grill-me.
You're unsure the idea is worth doing at all — that's vet. (They pair: brainstorm → unsure it's worth it? → vet.)
You want to ideate over the whole harness — that's find-harness-improvements; brainstorm shapes one named idea end to end.

Source of truth: .claude/skills/brainstorm/ — SKILL.md (flow + always-on rules) · stages/*.md (the seven force-read fine-step files) · decks.md (the four lens decks + the viability lens) · playbook.md (cross-cutting mechanics) · rubric.md (the yardstick) · gate.conf (the hook's config) · eval.sh / gate.test.sh (the graders).
