
Intelligence Layer

The intelligence layer turns dogfood evidence into reusable lessons, patterns, recommendations, and doctrine that future projects can inherit.

This page documents the finding-review state machine (candidate → reviewed → accepted → invalidated). It is one of the four distinct status vocabularies in testing-os. For the record-classification vocabulary (accepted / rejected / portfolio buckets) see Architecture and Operating Guide. For the wave-finding classification (new / recurring / fixed / unverified) emitted by dogfood-swarm runs, see the State Machines reference. For the agent_run lifecycle (the fourth vocabulary — pending / dispatched / running / complete / failed / timed_out / invalid_output / ownership_violation / aborted_for_rewind), see the same reference. The four vocabularies share words but operate on different objects — read the glossary if you have not already.

record
→ finding
→ reviewed memory
→ pattern / doctrine
→ future guidance

Every step is evidence-bound, deterministic, and auditable. No LLM in the extraction or clustering loop.

The intelligence layer adds four contracts to testing-os’ original three (record, scenario, policy):

Finding: an evidence-bound lesson extracted from one or more dogfood runs.

  • Status lifecycle: candidate → reviewed → accepted → (invalidated)
  • Must reference at least one source record and one evidence item
  • Classification: issue_kind, root_cause_kind, remediation_kind, transfer_scope
  • Stored as YAML in findings/<org>/<repo>/

Pattern: a repeated lesson cluster backed by 2+ accepted findings.

  • Formed by dimension-based clustering (issue_kind + root_cause_kind)
  • False recurrence detection prevents duplicate-incident inflation
  • Strength levels: emerging, strong, portfolio_stable
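
The clustering rule above can be sketched roughly as follows. This is a minimal illustration, not the testing-os implementation: the finding field names (`id`, `status`, `source_record_ids`) are assumed, and the false-recurrence guard shown here (requiring at least two distinct source records) is a simplified stand-in for whatever detection the real code performs. Strength levels are left out because this page does not state their thresholds.

```javascript
// Sketch only: field names and the false-recurrence heuristic are assumptions.
function clusterFindings(findings) {
  const clusters = new Map();
  for (const f of findings) {
    if (f.status !== "accepted") continue; // only accepted findings back a pattern
    const key = `${f.issue_kind}|${f.root_cause_kind}`; // the two clustering dimensions
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(f);
  }

  const patterns = [];
  for (const [key, members] of clusters) {
    // Assumed guard: findings that all cite the same source record describe
    // one incident reported twice, not a genuine recurrence.
    const distinctRecords = new Set(members.flatMap((f) => f.source_record_ids));
    if (members.length >= 2 && distinctRecords.size >= 2) {
      const [issue_kind, root_cause_kind] = key.split("|");
      patterns.push({
        issue_kind,
        root_cause_kind,
        finding_ids: members.map((f) => f.id).sort(),
        status: "candidate", // derived artifacts start as candidates
      });
    }
  }
  return patterns;
}
```

Sorting `finding_ids` keeps the output stable across runs, in keeping with the deterministic, no-LLM stance described earlier.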

Recommendation: actionable guidance derived from accepted patterns.

  • Kinds: starter_check, starter_scenario, policy_seed, evidence_expectation, verification_rule, review_prompt
  • Each recommendation includes a specific action with type, target, and details
  • Confidence tracks pattern strength

Doctrine: hardened portfolio rules earned from repeated patterns.

  • Only promoted from strong or portfolio_stable patterns
  • org_wide scope requires 2+ supporting patterns
  • Statement reads as a directive, not a suggestion
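
The promotion gate above might be expressed as a small predicate. This is a hedged sketch: `canPromoteToDoctrine` and the pattern shape are hypothetical, and requiring every supporting pattern to be strong enough is one conservative reading of "only promoted from strong or portfolio_stable patterns".

```javascript
// Sketch only: function name and pattern shape are hypothetical.
const DOCTRINE_STRENGTHS = new Set(["strong", "portfolio_stable"]);

function canPromoteToDoctrine(scope, supportingPatterns) {
  if (supportingPatterns.length === 0) return false;
  // Assumed reading: all supporting patterns must be strong or portfolio_stable.
  const strongEnough = supportingPatterns.every((p) => DOCTRINE_STRENGTHS.has(p.strength));
  if (!strongEnough) return false;
  // org_wide doctrine needs at least two supporting patterns.
  if (scope === "org_wide" && supportingPatterns.length < 2) return false;
  return true;
}
```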

Eight deterministic rules extract candidate findings from verified dogfood records:

| Rule | Fires on | Issue |
| --- | --- | --- |
| Surface misclassification | Invalid product_surface enum | surface_misclassification |
| Evidence policy mismatch | Evidence requirement rejection | evidence_overconstraint |
| Verdict downgrade | Proposed pass downgraded | schema_mismatch / policy_mismatch |
| Scenario step failure | Step-level failures | build_output_mismatch / entrypoint_truth |
| Blocked scenario | Blocked verdict with reason | verification_gap |
| Execution mode gap | Mixed/human missing attestation | execution_mode_mismatch |
| Schema rejection | Non-surface schema failures | schema_mismatch |
| Policy rejection | Non-evidence policy failures | policy_mismatch |

Every emitted finding includes the rule ID, rationale, and exact evidence references.
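
To make the shape of a derivation rule concrete, here is a rough sketch of the "Blocked scenario" rule. The record layout (`scenarios`, `verdict`, `blocked_reason`), the rule ID string, and the evidence reference format are all invented for illustration; only the issue kind and the requirement to carry rule ID, rationale, and evidence references come from this page.

```javascript
// Sketch only: record fields, rule ID, and evidence ref format are hypothetical.
function blockedScenarioRule(record) {
  const findings = [];
  for (const scenario of record.scenarios ?? []) {
    // Fires on a blocked verdict that carries a reason.
    if (scenario.verdict === "blocked" && scenario.blocked_reason) {
      findings.push({
        status: "candidate", // rules only ever emit candidates
        issue_kind: "verification_gap",
        rule_id: "blocked-scenario",
        rationale: `Scenario ${scenario.id} was blocked: ${scenario.blocked_reason}`,
        source_record_ids: [record.id],
        evidence_refs: [`${record.id}#scenarios/${scenario.id}`],
      });
    }
  }
  return findings;
}
```

The function is pure: the same record always yields the same findings, which is what makes the output auditable.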

Findings move through a lawful state machine:

  • candidate: machine- or human-created, not yet reviewed
  • reviewed: looked at by a human, may need refinement
  • accepted: approved as trustworthy reusable learning
  • rejected: declined with explicit reason
  • invalidated: previously accepted, withdrawn because source truth changed

Available actions: accept, reject, review, edit, merge, reopen, invalidate

All actions are logged in an append-only event log with actor, timestamp, from/to status, field diffs, and reasons.
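
A table-driven version of this state machine with its event log could look like the sketch below. The allowed `from` statuses for each action are assumptions made for the example (the authoritative law lives in review/transitions.js), and `edit` and `merge` are omitted because they do not map to a single status change. The event field names are likewise illustrative.

```javascript
// Sketch only: the `from` sets below are assumed, not copied from testing-os.
const TRANSITIONS = {
  review: { from: ["candidate"], to: "reviewed" },
  accept: { from: ["candidate", "reviewed"], to: "accepted" },
  reject: { from: ["candidate", "reviewed"], to: "rejected" },
  reopen: { from: ["accepted", "rejected"], to: "reviewed" },
  invalidate: { from: ["accepted"], to: "invalidated" },
};

function applyAction(finding, action, { actor, reason }, log, now = new Date()) {
  const rule = TRANSITIONS[action];
  if (!rule) throw new Error(`Unknown action: ${action}`);
  if (!rule.from.includes(finding.status)) {
    throw new Error(`Cannot ${action} a finding in status ${finding.status}`);
  }
  // Append-only: events are frozen and only ever pushed, never edited.
  log.push(Object.freeze({
    finding_id: finding.id,
    action,
    actor,
    reason: reason ?? null,
    from: finding.status,
    to: rule.to,
    timestamp: now.toISOString(),
  }));
  return { ...finding, status: rule.to }; // the input finding is left untouched
}
```

Illegal transitions throw before anything is logged, so the log records only lawful changes.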

Two or more findings describing the same lesson can be merged into one canonical finding. The merge preserves all evidence, source record IDs, and lineage. Source findings are marked superseded.
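
The merge behaviour described above could be sketched like this. The field names (`evidence_refs`, `source_record_ids`, `merged_from`, `superseded_by`) are assumptions for the example, as is the choice to represent "superseded" with a back-pointer field rather than a status value.

```javascript
// Sketch only: field names and the superseded marker are hypothetical.
function mergeFindings(canonical, sources) {
  const all = [canonical, ...sources];
  // Union with de-duplication so no evidence or source record is lost.
  const union = (key) => [...new Set(all.flatMap((f) => f[key] ?? []))].sort();
  const merged = {
    ...canonical,
    evidence_refs: union("evidence_refs"),
    source_record_ids: union("source_record_ids"),
    merged_from: sources.map((f) => f.id).sort(), // lineage
  };
  const superseded = sources.map((f) => ({ ...f, superseded_by: canonical.id }));
  return { merged, superseded };
}
```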

Accepted findings can be invalidated when source truth changes. Invalidated findings are excluded from advice queries but retained for historical reference.

Synthesis writes patterns, recommendations, and doctrine with status: candidate. The advice surface (queryPatterns / queryRecommendations / queryDoctrine) returns only accepted artifacts — so until a candidate is reviewed and accepted, nothing the intelligence layer derives reaches a future project. The artifact review verbs are what close that loop.

# Promote a derived pattern into the advise surface — accept = the loop closes
node packages/findings/cli.js patterns accept <pattern_id> --actor <name> --reason "..."
# Same for recommendations and doctrine
node packages/findings/cli.js recommendations accept <recommendation_id> --actor <name>
node packages/findings/cli.js doctrine accept <doctrine_id> --actor <name>
# Reject (reason required) or, for patterns, invalidate an accepted one
node packages/findings/cli.js patterns reject <pattern_id> --actor <name> --reason "not a real recurrence"
node packages/findings/cli.js patterns invalidate <pattern_id> --actor <name> --reason "source changed"
# What is awaiting review?
node packages/findings/cli.js patterns queue

The artifact review law reuses the finding status law (review/transitions.js). It is intentionally narrower than the finding lifecycle because the artifact schemas are narrower:

  • pattern status ∈ candidate / accepted / rejected / invalidated
  • recommendation and doctrine status ∈ candidate / accepted / rejected

Because no artifact schema permits the intermediate reviewed status, the review and reopen verbs (which target reviewed) are refused for artifacts rather than writing a schema-invalid file. invalidate is supported for patterns only — recommendations and doctrine have no invalidated status, so use reject to retire them. Every accepted/rejected/invalidated decision is written through the synthesis writers, which re-validate the artifact against its JSON Schema, and is logged in the same append-only event log (carrying artifact_id + artifact_kind).
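
The refusal logic described above follows from comparing each verb's target status with the statuses the artifact schema permits. A minimal sketch, with hypothetical error codes and function name:

```javascript
// Statuses per artifact kind, as listed on this page.
const ARTIFACT_STATUSES = {
  pattern: ["candidate", "accepted", "rejected", "invalidated"],
  recommendation: ["candidate", "accepted", "rejected"],
  doctrine: ["candidate", "accepted", "rejected"],
};

// Status each review verb tries to write.
const VERB_TARGET = {
  accept: "accepted",
  reject: "rejected",
  invalidate: "invalidated",
  review: "reviewed",
  reopen: "reviewed",
};

// Sketch only: the function name and error codes are hypothetical.
function checkArtifactVerb(kind, verb) {
  const allowed = ARTIFACT_STATUSES[kind];
  if (!allowed) return { ok: false, code: "unknown_kind", message: `Unknown artifact kind: ${kind}` };
  const target = VERB_TARGET[verb];
  if (!target) return { ok: false, code: "unknown_verb", message: `Unknown verb: ${verb}` };
  if (!allowed.includes(target)) {
    // Refuse rather than write a schema-invalid file.
    return { ok: false, code: "verb_not_supported", message: `${kind} has no ${target} status` };
  }
  return { ok: true, target };
}
```

Under this sketch, `checkArtifactVerb("doctrine", "invalidate")` is refused while `checkArtifactVerb("pattern", "invalidate")` is allowed, matching the patterns-only rule.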

Re-running <type> derive --write produces the same deterministic ids. A freshly-derived candidate that collides with an artifact you already promoted (accepted / rejected / invalidated) is preserved, not overwritten — the operator’s decision is load-bearing. Collisions are reported as Preserved (operator-promoted, not overwritten). This mirrors the findings dedupe (derive/dedupe.js) and lives in synthesis/dedupe-artifacts.js.
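
The preservation rule can be illustrated with a small merge over artifacts keyed by id. This is a sketch under assumed shapes (`id`, `status`), not the contents of synthesis/dedupe-artifacts.js; `mergeDerived` is a hypothetical name.

```javascript
// Statuses that represent an operator decision and must survive a re-derive.
const PROMOTED = new Set(["accepted", "rejected", "invalidated"]);

// Sketch only: function name and artifact shape are assumptions.
function mergeDerived(existing, derived) {
  const byId = new Map(existing.map((a) => [a.id, a]));
  const preserved = [];
  for (const candidate of derived) {
    const current = byId.get(candidate.id);
    if (current && PROMOTED.has(current.status)) {
      preserved.push(candidate.id); // report the collision, keep the operator's version
      continue;
    }
    byId.set(candidate.id, candidate); // new id, or an unreviewed candidate being refreshed
  }
  return { artifacts: [...byId.values()], preserved };
}
```

Because derived ids are deterministic, a collision on id is enough to recognise "the same artifact" without comparing content.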

An accepted recommendation whose action is a structured add_scenario / add_check can be applied directly into a named repo policy:

# Preview (default) — renders the change, writes nothing
node packages/findings/cli.js recommendations apply <recommendation_id> --policy <org/repo>
# Apply the structured intent + record provenance
node packages/findings/cli.js recommendations apply <recommendation_id> --write --policy <org/repo> --actor <name>

This is honest partial automation:

  • Only an accepted recommendation is applicable; others refuse with a structured { code, message, hint } error.
  • --write applies only the structured target id — adding it to the policy’s surfaces.<surface>.required_scenarios — and records recommendation_id + details as provenance.
  • The free-text action.details is never injected as policy logic.
  • Free-text-only action types (set_policy, set_evidence, set_verification, add_review_step) and ambiguous targets (no named policy, or a recommendation spanning multiple surfaces) refuse --write with a hint to apply manually. Never a fake auto-apply.
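
The guard rails above can be sketched as a single function that previews by default and writes only on request. Everything about the object shapes here is assumed (`rec.action`, `rec.surfaces`, `policy.provenance`), the error codes are hypothetical, and only `add_scenario` is handled because this page does not say which policy field an `add_check` target lands in.

```javascript
// Sketch only: shapes and error codes are assumptions; add_check is not covered.
function applyRecommendation(rec, policy, { write = false } = {}) {
  const refuse = (code, message, hint) => ({ ok: false, error: { code, message, hint } });

  if (rec.status !== "accepted") {
    return refuse("not_accepted", `Recommendation ${rec.id} is ${rec.status}`, "Accept it first.");
  }
  if (rec.action.type !== "add_scenario") {
    return refuse("manual_only", `Action ${rec.action.type} is not auto-applied here`, "Apply manually.");
  }
  if (!policy || rec.surfaces.length !== 1) {
    return refuse("ambiguous_target", "No single policy surface to write to", "Apply manually.");
  }

  const surface = rec.surfaces[0];
  const current = policy.surfaces?.[surface]?.required_scenarios ?? [];
  const target = rec.action.target; // only the structured id is ever written
  const next = current.includes(target) ? current : [...current, target];
  if (!write) return { ok: true, written: false, preview: { surface, required_scenarios: next } };

  return {
    ok: true,
    written: true,
    policy: {
      ...policy,
      surfaces: {
        ...policy.surfaces,
        [surface]: { ...policy.surfaces?.[surface], required_scenarios: next },
      },
      // Free-text details are recorded as provenance, never as policy logic.
      provenance: [...(policy.provenance ?? []), { recommendation_id: rec.id, details: rec.action.details }],
    },
  };
}
```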

The adoption layer answers future-project questions directly:

# What should a new MCP server repo inherit?
node packages/findings/cli.js advise --surface mcp-server
# What about a desktop app with mixed-mode dogfood?
node packages/findings/cli.js advise --surface desktop --execution-mode mixed
# Export all accepted learning for repo-knowledge
node packages/findings/cli.js sync-export --json

Advice bundles include:

  • Starter checks and scenarios
  • Evidence expectations
  • Likely failure classes (top 3)
  • Relevant doctrine
  • Supporting pattern and finding IDs

Results are ranked (stronger and more specific first) and capped (max 5 recommendations, 5 doctrine, 3 failure classes).
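
Ranking and capping could be sketched as a stable sort followed by a slice. The caps come from this page; the numeric `specificity` field, the strength-to-rank mapping, and the id tie-break are assumptions added to make the example deterministic.

```javascript
// Caps as stated on this page.
const CAPS = { recommendations: 5, doctrine: 5, failure_classes: 3 };

// Assumed ordering of strength levels, strongest first.
const STRENGTH_RANK = { portfolio_stable: 3, strong: 2, emerging: 1 };

// Sketch only: `specificity` is a hypothetical numeric field.
function rankAndCap(items, cap) {
  const byId = (a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
  return [...items]
    .sort((a, b) =>
      (STRENGTH_RANK[b.strength] ?? 0) - (STRENGTH_RANK[a.strength] ?? 0) || // stronger first
      (b.specificity ?? 0) - (a.specificity ?? 0) ||                         // then more specific
      byId(a, b))                                                            // then stable by id
    .slice(0, cap);
}
```

A call such as `rankAndCap(recommendations, CAPS.recommendations)` would then return at most five items without mutating the input array.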

  • list — list all findings with filters
  • show <id> — show finding detail
  • validate — validate all findings against schema
  • derive --all --dry-run — derive candidates from records
  • explain <id> — show derivation provenance
  • accept <id> --actor <name> --reason "..." — promote to accepted
  • reject <id> --actor <name> --reason "..." — reject with reason
  • edit <id> --actor <name> --set field=value — edit fields
  • merge <id1> <id2> --into <canonical> --actor <name> --reason "..." — merge findings
  • invalidate <id> --actor <name> --reason "..." — invalidate accepted finding
  • reopen <id> --actor <name> — reopen rejected/accepted finding
  • history <id> — show review audit trail
  • queue — show pending review work
  • patterns derive [--write] — derive patterns from accepted findings
  • recommendations derive [--write] — derive from accepted patterns
  • doctrine derive [--write] — derive from strong patterns
  • patterns accept <id> --actor <name> [--reason "..."] — promote a candidate pattern into the advise surface
  • patterns reject <id> --actor <name> --reason "..." — reject a pattern (reason required)
  • patterns invalidate <id> --actor <name> --reason "..." — invalidate an accepted pattern (patterns only)
  • patterns queue — patterns awaiting review
  • recommendations accept|reject <id> --actor <name> — review a recommendation
  • recommendations apply <id> [--dry-run | --write] [--policy <org/repo>] [--actor <name>] — apply an accepted recommendation into a policy
  • recommendations queue
  • doctrine accept|reject <id> --actor <name> — review a doctrine
  • doctrine queue
  • advise --surface <surface> [--execution-mode <mode>] — get advice bundle
  • sync-export [--json] — export for repo-knowledge

| System | Role |
| --- | --- |
| testing-os | Source of truth: owns all learning artifacts |
| repo-knowledge | Consumer: syncs accepted artifacts via sync-export |
| role-os | Consumer: pulls advice into bootstrap/review contexts |
| shipcheck | Enforcement: uses dogfood status, not intelligence layer directly |

246 finding tests at v1.1.7 (covering contract spine, derivation, review, synthesis, and adoption modules). The precise per-module breakdown drifts with each release and is intentionally not kept in lockstep with this slow-moving handbook page. See docs/m5-validation-2026-04-29.md for the authoritative current matrix.