Error Code Reference
testing-os’ CLIs surface structured errors at the top-level seam via renderTopLevelError (packages/dogfood-swarm/lib/error-render.js). Every typed error carries:
- `code`: stable identifier (e.g. `ISOLATION_FAILED`)
- `message`: operator-facing prose
- `hint`: explicit next step (or a per-code derived hint when the error class did not set one)
- optional `cause` (`Caused by: …`), `runId`, `waveId`, `agentRunId`, `findingsAttempted`
CLI output shape:

    ERROR [<CODE>]: <message>
    Next: <hint>
    Caused by: <inner error message>
    Wave: <waveId>

Untyped errors keep the original `ERROR: <message>` single-line shape. A leading `ERROR [<CODE>]:` is the signal that one of the codes below is in play.
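The output shape above can be sketched as a small renderer. This is a minimal illustration, not the actual `renderTopLevelError`: the function name `renderErrorSketch`, the sample message, and the sample cause are hypothetical, and it assumes each optional line is printed only when its field is set.

```javascript
// Hypothetical sketch; the real renderTopLevelError may differ in detail.
function renderErrorSketch(err) {
  // Untyped errors (no code) keep the single-line shape.
  // Simplification: any error with a `code` property is treated as typed here.
  if (!err.code) return `ERROR: ${err.message}`;

  const lines = [`ERROR [${err.code}]: ${err.message}`];
  if (err.hint) lines.push(`Next: ${err.hint}`);
  if (err.cause) lines.push(`Caused by: ${err.cause.message}`);
  if (err.waveId) lines.push(`Wave: ${err.waveId}`);
  return lines.join('\n');
}

// Illustrative typed error; message and cause text are made up.
const typed = Object.assign(new Error('worktree creation failed'), {
  code: 'ISOLATION_FAILED',
  hint: 'run `git worktree list` to inspect existing worktrees, or re-dispatch without --isolate',
  cause: new Error('example git failure'),
});

console.log(renderErrorSketch(typed));
console.log(renderErrorSketch(new Error('something unexpected')));
```

The first call prints the three-line typed shape (no `Wave:` line, because `waveId` is unset); the second prints the single-line untyped shape.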
Severity tiers — fix order at a glance
| Severity | Visual cue | Meaning | Operator response |
|---|---|---|---|
| CRITICAL | :::danger callout (red ⊘) | Persistent state corrupted or contract broken; a record / index is wrong, not just absent | Stop ingesting, repair the underlying state, then resume |
| HIGH | :::caution callout (orange ⚠) | Operator action required before the system can make progress; one run lost | Diagnose using the hint, fix the upstream cause, re-dispatch |
| MEDIUM | :::note callout (blue ℹ) | Informational — a race or transient issue handled gracefully | Inspect the persisted state with the suggested CLI, then continue |
| LOW | :::tip callout (green ✓) | Caller bug surfaced as a state-machine reject; system state is consistent | Fix the caller; no recovery needed on the testing-os side |
Severity is encoded by the Starlight callout type at the top of each code below — color is paired with the icon and the bolded Severity: title, so a color-blind operator gets the same fix-order signal from the icon + word as a sighted operator gets from the hue. WCAG AA contrast ratios for each callout variant are asserted by scripts/check-severity-contrast.test.mjs.
RECORD_SCHEMA_INVALID
- Class: `RecordValidationError` (`packages/ingest/validate-record.js`)
- Trigger: A persisted record fails AJV validation against `dogfood-record.schema.json`. Surfaced from `validateRecord()` during ingest.
- Message shape: `persisted record failed schema validation: <path> <ajv message>; <path> <ajv message>; …`
- Hint: `inspect the failing record against packages/schemas/src/json/dogfood-record.schema.json and fix the invalid fields before re-ingesting`
- Operator action:
  - Open `packages/schemas/src/json/dogfood-record.schema.json` and locate each path from the message.
  - The error object also carries `errors[]` with `{ path, keyword, message }` for programmatic inspection.
  - Fix the upstream emitter (the source repo’s submission builder), not the schema. The schema is a contract.
  - Re-dispatch the source workflow to produce a clean record.
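For programmatic inspection, the `errors[]` array can be grouped by failing keyword so that repeated problems in the upstream emitter stand out. The sketch below assumes only the `{ path, keyword, message }` entry shape described above; the helper name `summarizeSchemaErrors` and the sample paths and messages are hypothetical.

```javascript
// Hypothetical helper: groups validation failures by schema keyword.
function summarizeSchemaErrors(err) {
  const byKeyword = {};
  for (const { path, keyword, message } of err.errors || []) {
    if (!byKeyword[keyword]) byKeyword[keyword] = [];
    byKeyword[keyword].push(`${path} ${message}`);
  }
  return byKeyword;
}

// Illustrative error object; field paths and messages are made up.
const sample = {
  code: 'RECORD_SCHEMA_INVALID',
  errors: [
    { path: '/example_count', keyword: 'type', message: 'must be integer' },
    { path: '/example_label', keyword: 'enum', message: 'must be equal to one of the allowed values' },
    { path: '/example_total', keyword: 'type', message: 'must be integer' },
  ],
};

console.log(summarizeSchemaErrors(sample));
```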
DUPLICATE_RUN_ID
- Class: `DuplicateRunIdError` (`packages/ingest/persist.js`)
- Trigger: `writeRecord` lost a TOCTOU race for the same canonical record path. Two concurrent writers tried to persist the same `run_id`; the first won.
- Message shape: `duplicate run_id: <run_id> — another writer won the race for <path>`
- Hint: ``a run with this id already exists — use a fresh run id or `swarm runs` to inspect the existing one``
- Carries: `runId`, `path`
- Operator action:
  - In ingest: this is informational. The first writer succeeded and the system is consistent. Re-running the source workflow with a fresh `run_id` produces a new record.
  - In swarm: `swarm runs` lists existing runs by id. Either re-dispatch with a fresh id or accept the existing record.
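One common way for a writer to detect that it lost a time-of-check/time-of-use race is an exclusive create, where the existence check and the write happen in a single call. The sketch below uses Node's `wx` file flag for that. It is an assumption about the general technique, not a description of `packages/ingest/persist.js`: the class `DuplicateRunIdSketch`, the function `writeRecordSketch`, and the `<runId>.json` file layout are all hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for the real DuplicateRunIdError.
class DuplicateRunIdSketch extends Error {
  constructor(runId, recordPath) {
    super(`duplicate run_id: ${runId} — another writer won the race for ${recordPath}`);
    this.code = 'DUPLICATE_RUN_ID';
    this.runId = runId;
    this.path = recordPath;
  }
}

function writeRecordSketch(dir, runId, record) {
  // Assumed layout: one JSON file per run id.
  const recordPath = path.join(dir, `${runId}.json`);
  try {
    // 'wx' creates the file exclusively and fails with EEXIST if it already
    // exists, so there is no gap between the check and the write.
    fs.writeFileSync(recordPath, JSON.stringify(record), { flag: 'wx' });
  } catch (e) {
    if (e.code === 'EEXIST') throw new DuplicateRunIdSketch(runId, recordPath);
    throw e;
  }
  return recordPath;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'records-'));
writeRecordSketch(dir, 'run-1', { ok: true });
try {
  writeRecordSketch(dir, 'run-1', { ok: false }); // second writer loses
} catch (e) {
  console.log(e.code, e.runId);
}
```

The first record stays untouched when the second write is rejected, which matches the "first writer won, system is consistent" behavior described above.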
ISOLATION_FAILED
- Class: `IsolationError` (`packages/dogfood-swarm/lib/errors.js`)
- Trigger: `--isolate` was requested on a `swarm dispatch` but `createWorktree()` failed. Pre-fix, dispatch silently fell back to running the agent in the main repo; isolation is now a contract: the only valid responses are “isolated” or “loud failure”.
- Message shape: the underlying worktree error wrapped with the explicit isolation context. Inspect `e.cause.message` for the git-level reason.
- Hint: ``run `git worktree list` to inspect existing worktrees, or re-dispatch without --isolate``
- Operator action:
  - `git worktree list` from the repo root to see what’s already attached.
  - `git worktree prune` to clean stale references; `git worktree remove <path>` to clear specific entries.
  - Re-dispatch with `--isolate`, or drop `--isolate` if isolation is not required for this run (accepting the shared-workspace risk).
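The "isolated or loud failure" contract can be sketched as a dispatch wrapper that never falls back to the main repo when worktree creation fails. Everything here is illustrative: `dispatchSketch`, `IsolationErrorSketch`, the injected `createWorktree` and `runAgent` callbacks, and the message wording are hypothetical and are not taken from `packages/dogfood-swarm`.

```javascript
// Hypothetical stand-in for the real IsolationError.
class IsolationErrorSketch extends Error {
  constructor(cause) {
    // Wrap the underlying error so the git-level reason stays on e.cause.
    super(`isolation requested but worktree creation failed: ${cause.message}`, { cause });
    this.code = 'ISOLATION_FAILED';
  }
}

function dispatchSketch({ isolate, repoRoot, createWorktree, runAgent }) {
  if (!isolate) return runAgent(repoRoot);

  let worktreePath;
  try {
    worktreePath = createWorktree();
  } catch (e) {
    // No silent fallback to repoRoot: fail loudly instead.
    throw new IsolationErrorSketch(e);
  }
  return runAgent(worktreePath);
}

try {
  dispatchSketch({
    isolate: true,
    repoRoot: '/repo',
    createWorktree: () => { throw new Error('example git failure'); },
    runAgent: (cwd) => `ran in ${cwd}`,
  });
} catch (e) {
  console.log(e.code, '|', e.cause.message);
}
```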
COLLECT_UPSERT_FAILED
- Class: `CollectUpsertError` (`packages/dogfood-swarm/lib/errors.js`)
- Trigger: `swarm collect`’s findings upsert transaction threw. Common underlying causes: SQLite `busy_timeout` exhaustion, fingerprint UNIQUE collision, prepared-statement crash. The artifact rows + `file_claims` + agent state transitions had already committed; the wave-status UPDATE had not.
- Message shape: structured wrapper with `e.cause.message` carrying the SQLite-level reason.
- Hint: ``wave <id> has artifacts persisted but findings missing — inspect with `swarm status`, then re-run `swarm collect` once the underlying SQLite issue is resolved (busy_timeout or fingerprint UNIQUE collision)``
- Carries: `waveId`, `findingsAttempted`, `cause`
- Operator action:
  - `swarm status` to confirm the wave is in a half-written state (artifacts present, findings missing).
  - Diagnose the underlying SQLite issue from `Caused by:`. `busy_timeout` usually means another process holds the DB; check for stuck `swarm` processes. UNIQUE collision usually means the fingerprint algorithm matched an existing row; check `swarms/control-plane.db` for the colliding finding.
  - Re-run `swarm collect` for the same wave once resolved. The outer wrapper is idempotent at the upsert level.
STATE_MACHINE_<KIND> — BLOCKED, TERMINAL, INVALID
- Class: `StateMachineRejectionError` (`packages/dogfood-swarm/lib/errors.js`)
- Trigger: `transitionAgent()` rejected a state-machine transition. The `kind` field discriminates why:
  - `STATE_MACHINE_BLOCKED`: the transition is legal in the abstract but blocked by a guard (e.g. dependencies not met, override required). Operator’s problem.
  - `STATE_MACHINE_TERMINAL`: the agent is in a terminal state (`complete`, `rejected`, etc.), so no transitions are allowed. Caller bug: something tried to advance an already-finished agent.
  - `STATE_MACHINE_INVALID`: the transition is missing from the `TRANSITIONS` table. Legitimately disallowed transition (e.g. `idle → complete` skipping `running`).
- Message shape: `Illegal transition <from> → <to>: <reason>` with explicit kind in `e.code`.
- Hint: `e.hint` is set per-kind by the throwing site (e.g. “use `swarm override` to force …” for BLOCKED, “this agent is already complete; check why the caller tried to re-advance it” for TERMINAL).
- Carries: `kind`, `from`, `to`, `agentRunId`, `allowedTransitions[]` (legal `to` set from the current `from`).
- Operator action:
  - BLOCKED: look at the `Next:` hint, which usually points at an override flag or a missing prerequisite.
  - TERMINAL: the agent is done; the bug is upstream. Inspect the caller for a re-advance loop.
  - INVALID: check `allowedTransitions[]` for what the state machine will accept from this `from`. Either reroute the call or, if the transition should be legal, file a finding to add the edge to `TRANSITIONS`.
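The three rejection kinds can be illustrated with a toy transition table. This is a sketch under stated assumptions: the table `TRANSITIONS_SKETCH`, the state names beyond those mentioned above, the check order, the reason strings, and the function `classifyTransition` are hypothetical, and the real `kind` values and `TRANSITIONS` table may differ.

```javascript
// Hypothetical transition table; the real TRANSITIONS table may differ.
const TRANSITIONS_SKETCH = {
  idle: ['running'],
  running: ['complete', 'rejected'],
  complete: [],
  rejected: [],
};
const TERMINAL_STATES = new Set(['complete', 'rejected']);

// guardReason: a non-empty string when a guard blocks an otherwise legal edge.
function classifyTransition(from, to, guardReason = null) {
  const allowedTransitions = TRANSITIONS_SKETCH[from] || [];
  let kind = null;
  let reason = null;

  if (TERMINAL_STATES.has(from)) {
    kind = 'TERMINAL';
    reason = `${from} is a terminal state`;
  } else if (!allowedTransitions.includes(to)) {
    kind = 'INVALID';
    reason = 'transition is not in the table';
  } else if (guardReason) {
    kind = 'BLOCKED';
    reason = guardReason;
  }

  if (!kind) return { ok: true };
  return {
    ok: false,
    code: `STATE_MACHINE_${kind}`,
    kind,
    from,
    to,
    allowedTransitions,
    message: `Illegal transition ${from} → ${to}: ${reason}`,
  };
}

console.log(classifyTransition('idle', 'complete').code);
console.log(classifyTransition('complete', 'running').code);
console.log(classifyTransition('idle', 'running', 'dependencies not met').code);
```

In this sketch, `allowedTransitions` for an INVALID rejection from `idle` is `['running']`, which is the information an operator would use to reroute the call.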
Cross-references
- Hard Gate B (Errors): structured shape (code/message/hint), exit codes for CLI, no raw stacks. See README threat model.
- The state machine these errors come out of: State Machines.
- Where rejected records land when ingest throws `RECORD_SCHEMA_INVALID` or `DUPLICATE_RUN_ID`: `records/_rejected/` (Beginner’s Guide → Investigating a failure).