Gates
A gate is a checkpoint: a point past which the agent may not proceed until a condition is met, and the condition must be proven, not asserted. Where Guardrails stop the agent acting dangerously, gates stop it advancing prematurely. The two pair up: guardrails are about "don't do harm"; gates are about "don't move on until it's earned".
- Quality gate: Not "done" because it works — done when the whole check suite is green. Thresholds are never lowered to pass.
- Token gate: Won't continue until you type an exact, case-sensitive word (e.g. NEXT). A casual "go" can't trip it.
- Independence gate: A reviewer that never saw your reasoning gives the verdict, so it can't rubber-stamp work just because it was argued well.
- Proof-of-work gate: Won't advance until the evidence exists — every problem has a solution, the review file is on disk, no critical findings open.
- Phase gate: Resolve every decision first (PLAN), then build hands-off. Coding can't begin until the spec is approved.
The throughline: every gate opens on evidence — green checks, an exact token you typed, an independent pass, an artifact on disk — never on the agent's own say-so that it's fine.
The quality gate — green before a PR
The canonical gate is the Definition of Done: code isn't done because it works, it's done when composer ci:check is green (PHPStan level 7, coverage ≥ 80%, lint/format/types, the deterministic security checks) and the mutation gate holds. "Done with known issues" is not a mergeable state. The thresholds are never lowered to pass — a surviving mutant means add a test, not relax the gate.
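As a rough illustration of the rule above, the gate can be thought of as a function that returns the reasons it is still closed. This is a minimal sketch, not the harness's implementation: the `CheckResults` fields and the `quality_gate` name are hypothetical, and treating "the mutation gate holds" as "zero surviving mutants" is an assumption. Only the 80% coverage threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text: coverage must be >= 80%. It is a constant,
# not a parameter, because the gate is never lowered to pass.
COVERAGE_THRESHOLD = 80.0


@dataclass(frozen=True)
class CheckResults:
    """Hypothetical summary of one pipeline run; field names are assumptions."""
    static_analysis_ok: bool  # e.g. PHPStan level 7 passed
    lint_ok: bool             # lint / format / types
    security_ok: bool         # deterministic security checks
    coverage_percent: float
    surviving_mutants: int    # assumption: the mutation gate holds at zero


def quality_gate(results: CheckResults) -> list[str]:
    """Return the blocking reasons; an empty list means the gate is open."""
    reasons = []
    if not results.static_analysis_ok:
        reasons.append("static analysis failed")
    if not results.lint_ok:
        reasons.append("lint/format/types failed")
    if not results.security_ok:
        reasons.append("security checks failed")
    if results.coverage_percent < COVERAGE_THRESHOLD:
        reasons.append(
            f"coverage {results.coverage_percent:g}% is below {COVERAGE_THRESHOLD:g}%"
        )
    if results.surviving_mutants > 0:
        reasons.append(f"{results.surviving_mutants} surviving mutant(s): add tests")
    return reasons


run = CheckResults(True, True, True, coverage_percent=78.0, surviving_mutants=0)
print(quality_gate(run))  # ['coverage 78% is below 80%']
```

Returning reasons rather than a bare boolean keeps the stop explainable: the agent can say exactly which check is red and what it will do about it.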
You
great, open the PR
Claude
Holding — composer ci:check is red: coverage is 78%, the gate needs ≥ 80%. One branch in WebhookController is untested. I'll add that test, get it green, then open the PR.
Token gates — human consent, made deterministic
Some transitions are reserved for you, and the harness makes that consent unforgeable. brainstorm advances only on an exact, case-sensitive UPPERCASE token — NEXT to move a stage, BUILD to exit to code. The case rule is the whole point: it's the only thing that stops a casual "what's next?" in prose from tripping the gate. A near-miss (next, build it) is not the token — if you clearly mean to advance but didn't type it, the agent asks you to confirm with the exact token rather than inferring. Behaviourally this holds from the prose alone; a generic gate.sh hook makes it mechanical where it can. Determinism is the feature: a gate you could nudge open isn't a gate.
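The exact-match rule is simple enough to sketch. The following is an illustration only and is not the contents of gate.sh: the `check_token` function, the `GateResult` states, and the decision to ignore surrounding whitespace are all assumptions. The near-miss check here is purely lexical; recognising intent in free prose is left to the agent.

```python
from enum import Enum

# Tokens named in the text. Matching is exact and case-sensitive.
GATE_TOKENS = {"NEXT", "BUILD"}


class GateResult(Enum):
    OPEN = "open"                # the exact token was typed
    ASK_CONFIRM = "ask_confirm"  # near-miss: ask for the exact token
    CLOSED = "closed"            # ordinary prose, gate stays shut


def check_token(message: str, expected: str) -> GateResult:
    """Open only when the whole message is exactly the expected token."""
    if expected not in GATE_TOKENS:
        raise ValueError(f"unknown gate token: {expected!r}")
    text = message.strip()  # assumption: surrounding whitespace is ignored
    if text == expected:
        return GateResult.OPEN
    # Lexical near-miss (an assumption): the token appears as a word in the
    # wrong case or inside a longer message. This is never taken as consent.
    if expected.lower() in text.lower().split():
        return GateResult.ASK_CONFIRM
    return GateResult.CLOSED


print(check_token("NEXT", "NEXT"))          # GateResult.OPEN
print(check_token("next", "NEXT"))          # GateResult.ASK_CONFIRM
print(check_token("build it", "BUILD"))     # GateResult.ASK_CONFIRM
print(check_token("what's next?", "NEXT"))  # GateResult.CLOSED
```

Note that a near-miss never opens the gate; the most it can do is prompt a request for the exact token.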
You
yep that's the one, let's go
Claude
That reads like a go — but the gate needs the exact token, on purpose, so a casual "go" can't trip it. Type NEXT to advance to Develop.
You
NEXT
Claude
Gate open. ━━ DEVELOP · diverging ━━ — generating solution paths now.
The independence gate — a pass that can't rubber-stamp
A reviewer who saw why you did something tends to agree with it. So the harness's sharpest gate withholds that: check-reasoning and the blind-reviewer agent return a verdict without ever seeing the author's reasoning — they judge the artifact against the standards, blind. They can't bless work just because it was confidently argued. This is the defence against rubber-stamping, and it's why "recommendations, not decisions" is trustworthy: the recommendation comes from somewhere that couldn't simply mirror you.
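One way to picture the blindness is structural: the reviewer's input type has no field that could carry the author's reasoning, so nothing needs to be redacted. This sketch is an assumption about shape, not the actual blind-reviewer interface. `Submission`, `ReviewPacket`, `blind`, and `toy_review` are hypothetical names, and the single review rule is a toy stand-in for judging against real standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """The author's full output. Field names are hypothetical."""
    diff: str
    standards: str
    author_reasoning: str  # must never reach the reviewer


@dataclass(frozen=True)
class ReviewPacket:
    """The reviewer's entire input. It has no reasoning field at all."""
    diff: str
    standards: str


def blind(submission: Submission) -> ReviewPacket:
    """Copy only the artifact and the standards into the packet."""
    return ReviewPacket(diff=submission.diff, standards=submission.standards)


def toy_review(packet: ReviewPacket) -> str:
    """Stand-in reviewer with one toy rule: a '==' in the diff is a FAIL."""
    return "FAIL" if "==" in packet.diff else "PASS"


submission = Submission(
    diff="if ($signature == $expected) { accept(); }",
    standards="B1: signature checks must be constant-time",
    author_reasoning="Timing attacks are impractical here, so == is fine.",
)
packet = blind(submission)
print(hasattr(packet, "author_reasoning"))  # False
print(toy_review(packet))                   # FAIL
```

However persuasive the author's argument is, it cannot influence the verdict because it is never part of what the reviewer receives.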
Claude
Before I commit this, I'm dispatching blind-reviewer — it sees the diff and the standards, but not my reasoning, so it can't just agree with me.
Claude
Verdict came back FAIL on boundary B1 — the webhook HMAC check isn't constant-time (WebhookController.php:42). I'm not arguing it out of the finding; fixing it before this ships.
Proof-of-work gates — the condition must exist on disk
Some gates check that a step actually happened rather than was claimed:
- The coverage gate — before brainstorm hands off, it proves every in-scope problem has an adopted solution. A problem whose only solution was rejected is surfaced as an uncovered gap, never shipped silently.
- The artifact pre-flight — a stage won't advance until its review export exists beside the doc. Only your export produces the file — so its presence is proof the review was really done, not skipped.
- The severity gate — a qa-finding carries critical / major / minor; critical and major gate a PR, minor can ride or defer.
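The three checks in the list above can each be sketched as a small predicate over evidence. Everything here is illustrative: the function names, the `solutions` mapping, the finding dictionaries, and the `.review.json` file naming are assumptions, not the harness's actual data model. Only the severity levels and the rules themselves come from the text.

```python
from pathlib import Path

# From the text: critical and major gate a PR; minor can ride or defer.
BLOCKING_SEVERITIES = {"critical", "major"}


def uncovered_problems(solutions: dict[str, list[str]]) -> list[str]:
    """Coverage gate: list the problems that have no adopted solution.

    `solutions` maps a problem name to the status of each candidate
    ("adopted" or "rejected"). This shape is an assumption.
    """
    return [p for p, statuses in solutions.items() if "adopted" not in statuses]


def review_export_exists(doc: Path) -> bool:
    """Artifact pre-flight: the review export must exist beside the doc.

    The '<name>.review.json' naming is hypothetical.
    """
    return doc.with_name(doc.stem + ".review.json").is_file()


def blocking_findings(findings: list[dict]) -> list[dict]:
    """Severity gate: keep only the findings that block a PR."""
    return [f for f in findings if f["severity"] in BLOCKING_SEVERITIES]


spec = {
    "duplicate webhooks": ["rejected"],
    "slow retries": ["rejected", "adopted"],
}
print(uncovered_problems(spec))  # ['duplicate webhooks']

findings = [
    {"id": "F1", "severity": "minor"},
    {"id": "F2", "severity": "major"},
]
print([f["id"] for f in blocking_findings(findings)])  # ['F2']
```

Each predicate reads state that exists independently of the agent's claims: a recorded decision, a file on disk, a logged finding.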
You
looks done, hand it off to build
Claude
Coverage gate stops me: the problem "duplicate webhooks" has no adopted solution — its only candidate was rejected, so it's an uncovered gap. I won't ship a half-covered spec. Re-diverge on just that gap, or mark it won't-fix?
Phase gates — front-load, then hands-off
The biggest gate is structural: resolve the decisions before you execute. build-sprint splits into an interactive PLAN sweep (front-load every ambiguity, red-teamed by the independence gate) and only then an autonomous EXECUTE loop. brainstorm's five diamond stages are the same idea: each gate opens on your token, so building can't begin until the spec is approved. This is the spine clause "verification before delivery", applied to thinking before building.
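The PLAN-then-EXECUTE split behaves like a small state machine with one guarded transition. The sketch below is a hypothetical model, not build-sprint's code: the `Sprint` class and its methods are assumed names, and requiring every open decision to be resolved before the spec can be approved is an interpretation of the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PLAN = "PLAN"
    EXECUTE = "EXECUTE"


@dataclass
class Sprint:
    """Hypothetical model of a sprint; all names are assumptions."""
    open_decisions: set[str] = field(default_factory=set)
    spec_approved: bool = False
    phase: Phase = Phase.PLAN

    def resolve(self, decision: str) -> None:
        """Mark one decision as settled during the PLAN sweep."""
        self.open_decisions.discard(decision)

    def approve_spec(self) -> None:
        """The spec can only be approved once nothing is left open."""
        if self.open_decisions:
            raise RuntimeError(f"open decisions: {sorted(self.open_decisions)}")
        self.spec_approved = True

    def start_execute(self) -> None:
        """Phase gate: EXECUTE is reachable only from an approved spec."""
        if not self.spec_approved:
            raise RuntimeError("still in PLAN: spec not approved")
        self.phase = Phase.EXECUTE


sprint = Sprint(open_decisions={"auth method", "retry policy", "idempotency key"})
try:
    sprint.start_execute()
except RuntimeError as err:
    print(err)  # still in PLAN: spec not approved

for decision in ("auth method", "retry policy", "idempotency key"):
    sprint.resolve(decision)
sprint.approve_spec()
sprint.start_execute()
print(sprint.phase)  # Phase.EXECUTE
```

Because the guard sits on the transition itself, a request to skip planning has nothing to bypass: there is no path to EXECUTE that does not pass through approval.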
You
skip the planning, just build it
Claude
We're still in PLAN — 3 decisions are open (auth method · retry policy · idempotency key). Let me pin those first; once the spec's approved, EXECUTE runs hands-off and won't keep stopping to ask. Want to settle them now?
The pattern: a gate is a deliberate stop with a checkable condition. Proven, not asserted — by a green pipeline, an exact token, a blind pass, or an artifact on disk. It's how the harness lets an agent run autonomously between gates without letting it skip the checkpoints that matter.