Appearance
Security
Security isn't a sixth pillar — it's a gate folded into Guardrails, a lens inside the per-change review, and a dedicated discovery sweep. This page is the why behind those decisions and the ethos that shapes them; the canonical, exhaustive reference is the threat model in the repo (docs/security/threat-model.md).
The posture that re-weights everything
Tempo is a single-user personal-automation app — one human, one seeded account, no registration route. That single fact removes whole classes of threat: no multi-tenant isolation, no cross-account leakage, no abuse-at-scale, no privilege hierarchy. What's left matters because of what the one account holds.
The crown jewel — long-lived, encrypted OAuth tokens for the user's entire Todoist + Google account. Compromise isn't "leak a task"; it's full read/write over the whole connected Google account (Gmail, Drive, Calendar) and Todoist. Everything else — the
ANTHROPIC_API_KEY, the session, the integrity of the task store — ranks below it.
How we decide severity — evidence before the label
A severity guess made without knowing what an attacker actually controls is just guesswork — the documented cause of both inflated and missed severity. So the threat model fixes the denominator (the trust boundaries, and the attacker-controlled inputs at each), and every security review writes three facts before it reaches for a label:
- Reachability — can an attacker actually reach this path?
- Attacker-control — what do they genuinely control there?
- Blast radius — the worst realistic outcome.
Only then the label: critical (a direct, attacker-controlled path to an asset), major (needs chaining or a specific condition), minor (defence-in-depth, no clear exploit today). Naming the consequence first stops the label from anchoring the analysis.
Security is a gate — split by decidability
Security blocks like any other gate, but it's split by what a machine can decide:
- Deterministic → blocks mechanically (like PHPStan). The CI security gate runs
composer audit(dependency CVE scan),gitleaks(secret scan), and a static lint (env()outside config, raw{!! !!}, debug helpers left in). A red check stops the merge. - Reasoned → human-gated. A critical/major finding from an LLM review is not a mechanical blocker — it's surfaced for a "ship, or fix first?" decision, because that severity call needs judgement, not a rule.
"Done with known security issues" is not a mergeable state — a real defect is fixed or captured as a tracked issue, never merged as a silent caveat.
The invariants we never cross
A handful of rules carry most of the protection:
$hiddenon everyencryptedattribute. The single most-guarded invariant — it stops a decrypted token ever riding into an Inertia prop, a JSON response, a log line, or a push notification. A new model or endpoint that returns a credential-bearing model without$hiddenre-opens the crown-jewel boundary.- The system prompt is not a security boundary. Authorization is enforced in code — tiered tool gating plus server-side checks — never by a prompt instruction like "don't call this destructively."
- Fence untrusted content. Every prompt that interpolates external data — task text, webhook payloads, Gmail/Calendar content — wraps it in the
@untrustedguard, so any instruction embedded in that content is spotlighted as data, not a command. (This is prompt-injection defence — OWASP LLM01.) - Always draft, never auto-send. External side-effects (mail, webhooks, mutating a third-party system) default to a draft / dry-run that a human approves before the irreversible action fires.
- The server check is the real boundary. UI affordances are cosmetic; every state-changing action authorizes server-side.
The trust boundaries
The threat model walks eight boundaries with STRIDE. In brief:
| Boundary | Dominant threat | Primary mitigation |
|---|---|---|
| Todoist webhook (public POST) | Spoofed / tampered payload | HMAC verify (fail-closed) + DTO hydration + idempotent ledger |
| Stored credentials | Token disclosure | encrypted cast + $hidden + masked-only UI |
| Anthropic I/O | Prompt injection; cost DoS | untrusted-content fencing; MaxTokens/MaxSteps |
| MCP tools | Over-powered LLM write | stdio-only today + tiered ask/gate |
v-html render | Stored XSS | DOMPurify sanitiser |
| App auth / session | Authorization bypass | auth/verified + server-side checks |
| Google integration | Token leak (full-account blast radius) + malicious ingested email/invite | Socialite + encrypted refresh + @untrusted fencing + read-only seam |
| Dependencies / secrets | CVE / leaked secret | composer audit + gitleaks + static lint |
The AI/MCP surface (Anthropic I/O, MCP tools, v-html, Google) maps cleanly onto the OWASP LLM Top 10 — the threat model carries the crosswalk, so a review confirms coverage instead of re-deriving it each time.
Noise control — the won't-flag corpus
Just as important as what we flag is what we don't. The threat model keeps a won't-flag corpus — classes that simply don't apply here (test/seeder code paths, multi-tenant isolation, role-escalation, abuse-at-scale) — that a review must consult before flagging anything. Relitigating these on every scan is the single biggest cause of abandoned security reviews. And each accepted residual (no webhook rate-limit yet, the key not in CI) is a tracked deferral scoped to go-live — not an unreviewed gap.
Where it lives
- Rule —
.claude/rules/security.md: the tactical checklist ($hidden,validated()overall(), Form Requests, noenv()at call sites, credentials in the DB never.env). - Reference —
docs/security/threat-model.md(the canonical denominator) +review-lens.md(the evidence-before-severity procedure). - Skills & agents — find-vulns (the discovery sweep), qa-code's security lens (per-diff conformance), and the blind-reviewer agent (independent verification — PASS / FAIL / CANNOT_VERIFY).
- The gate — the deterministic CI checks, bypassable only under an audited
security-overridelabel.