qa-tests

qa-tests

qareadhands-off

Use this when: you want missing test cases found before a PR

Problem it solves — Tests pass and still miss whole branches. This audits a change against the ZOMBIES and CORRECT lenses — happy and sad paths — and closes the gaps that surviving mutants reveal.

Used in workflows: QA PR & Learn · Health Check · Build the sprint

QA tests (coverage holes)

Run this before opening any PR. Goal: find gaps in logic, boundaries and failure states that the tests miss — then close them (add tests, fix bugs). Not just line coverage.

Workflow

Measure the gaps. This is the skill — run mutation as the ground truth; surviving mutants ARE the missed branches/boundaries: php artisan config:clear; vendor/bin/pest --mutate --parallel (via the PowerShell tool). Tally survivors by file:line (strip ANSI; Windows paths use \). Coverage --min=80 shows untested lines.
Account for EVERY survivor — leave none un-examined. A surviving mutant is unfinished work, not a number to clear: don't stop the moment the threshold goes green and walk away from the rest. Resolve each, one of two ways:
- Default — kill it. Add the test/assertion that pins that branch or boundary. This is almost always the answer; a survivor you "can't be bothered" to test is a real gap, not an equivalent.
- Last resort — ignore it, with a reason. Only when it's a genuine equivalent (behaviour identical either way — a defensive (string)/(int) cast, a glob()===false guard, RemoveEarlyReturn on an empty-collection guard where implode already yields '') and only after actually evaluating it, not for convenience. Either simplify the code so the mutant can't exist, or // @pest-mutate-ignore it (own line above the statement; doesn't work on if/foreach openers) with a one-line why. An un-reasoned ignore is rubber-stamping — the bar is high.
- flaky survivors (pest --parallel mis-attributes coverage; some flip run-to-run — see memory reference_mutation_dirty_unreliable) — don't chase; build margin instead.
The ≥90 / ≥80 thresholds are the floor (margin for genuine equivalents), not the target — the target is zero un-examined survivors. The bar is deliberately high (90), so the margin for legit @pest-mutate-ignore equivalents is thin: every ignore must carry a real why. Don't push to a global 100 to force this — that just incentivises lazy ignores; the discipline gets the rigour without the brittle gate.
Audit changed units against the lenses below — mutation can't flag absent error handling, so reason about failure paths too.
Add tests; fix bugs. When a mutation reveals a real bug, fix the code AND add the covering test. Never lower the 80/90 thresholds (memory feedback_threshold_policy).
Gate. composer ci:check green before the PR.

If mutation won't run (env won't build, or --parallel too flaky to trust — memory reference_mutation_dirty_unreliable): don't skip the review. Reason through the lenses by hand against the changed units, lean on --coverage for the line view, and say the mutation signal was unavailable.

Core paths

Happy path — expected use.
Sad path — invalid input, missing dependency, deliberate failure, API error (->throw).

ZOMBIES

Zero — empty/null (empty collection, json() returning null, empty body).
One — single item.
Many — collections, ordering.
Boundaries — extremes, off-by-one, the <= vs < edge.
Interfaces — external deps mocked (Http::fake, FakeTodoistClient, FakeAnthropicClient) and their error responses.
Exceptions — error handling, fallbacks (DTO fromLlm/fallback), thrown-and-caught.
Simple — the trivial scenario still holds.

CORRECT boundaries

Conformance — data shape/format (malformed LLM JSON, missing keys).
Ordering — sequence matters (job chains, sort order, positions).
Range — min/max limits.
Reference — external state the code assumes (a record exists, a token is valid).
Existence — null/absent/empty (missing property arm, deleted task).
Cardinality — exact counts (0/1/n, no duplicates, ledger prevents re-runs). Assert the count, not just existence: a created row/element/event check pairs assertDatabaseHas/assertSee/->has('x') with the exact count (assertDatabaseCount(…, 1), ->has('tasks', 1), toHaveCount(n)) — "contains the one I made" silently passes amid three I didn't.
Time — freeze the clock per-test in arrange (Carbon::setTestNow, memory feedback_deterministic_dates_in_tests); async/queue ordering, timeouts.

Fixture hygiene

Distinct IDs per object. When a test sets up more than one object (a task and a person and a comment), give each a different id — don't let them all default to 1. A shared id lets an id-keyed lookup/assertion match the wrong object type and hide a bug; distinct ids make a wrong-object match fail loudly. (One object referenced in two places — a plan row + its matching fake — should share its id.) See .claude/rules/testing.md + docs/learnings/2026-06-07-distinct-ids-per-object-in-tests.md.

Mutation can't catch a test that kills mutants while asserting nothing real — one that pins mock invocations instead of behaviour passes the gate yet tests the implementation, not the outcome. So check it by eye:

Assert outcome, not the mock. The primary assertion is the real effect (DB row, dispatched job, returned DTO, Todoist call result), not "the mock was called with X". Verifying interactions is fine as well as, never instead of.
Mock setup shouldn't dwarf the test. If the Http::fake/mock() arrange is most of the body, the unit is over-coupled to its collaborators — a brittleness smell.
No test-only methods on prod classes. A method that exists only so a test can reach inside is a seam in the wrong place.
>3 mocks in one unit = a design smell (too many collaborators), not just a long test.

Aligns with the outcome-assertion bar in .claude/rules/testing.md; these are complements to the mutation tally, catching what survivors can't.

Orchestration (opt-in)

When the gap-filling is large and disjoint — independent test files across many non-overlapping units — it may fan out across cheap-model workers per docs/agents/orchestration.md, then re-gate (mutation) in-thread. A handful of gaps, or gaps in shared files, stay in-thread.

Output

Report gaps by lens, what was added, bugs fixed, and anything left with the reason (e.g. declarative config not worth scaffolding). It's fine to find nothing — say so.

qa-tests ​

QA tests (coverage holes) ​

Workflow ​

Core paths ​

ZOMBIES ​

CORRECT boundaries ​

Fixture hygiene ​

Mock discipline — the mutation blind spot ​

Orchestration (opt-in) ​

Output ​