Architecture

This page covers how audits are structured, how they advance through their lifecycle, and how the monorepo holds multiple audits without collapsing them into one mega-rubric.

Each audit ships four things:

  1. Rubric: audits/<name>/RUBRIC.md. Markdown. Sections, audit questions, severity rules, the audit’s law. Canonical.
  2. Skill: audits/<name>/skill/SKILL.md. Markdown with YAML frontmatter. Invocation contract, inputs, outputs, procedure. The runner walks the rubric.
  3. Schema: audits/<name>/schemas/finding.extensions.json. JSON Schema. Audit-specific extension fields on top of the base finding contract. Audits add fields; they do not redefine or remove base fields.
  4. Evidence: audits/<name>/evidence/<run-id>/. At least one completed pressure test or dogfood run, in three files: findings, scorecard, remediation list.

No evidence, no official audit. A Draft audit may sit in the repo but it’s not listed in the top-level README’s audit table until it has at least one evidence run.
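
The four-artifact rule can be checked mechanically. The following is a minimal sketch, not the repo's actual tooling: the function name `missing_artifacts` is hypothetical, and because the names of the three evidence files are not given here, it only checks that at least one run directory exists under evidence/.

```python
from pathlib import Path

# Relative paths taken from the list of four artifacts above.
REQUIRED_FILES = (
    "RUBRIC.md",
    "skill/SKILL.md",
    "schemas/finding.extensions.json",
)


def missing_artifacts(audit_dir: Path) -> list[str]:
    """Return the required artifacts that are absent from one audit directory.

    Hypothetical helper: an empty list means the audit ships all four things.
    """
    missing = [rel for rel in REQUIRED_FILES if not (audit_dir / rel).is_file()]

    # Evidence counts only if at least one run directory exists.
    evidence = audit_dir / "evidence"
    has_run = evidence.is_dir() and any(p.is_dir() for p in evidence.iterdir())
    if not has_run:
        missing.append("evidence/<run-id>/")
    return missing
```

A Draft audit would typically report only the evidence entry as missing, which matches the rule that it stays out of the top-level audit table until a run exists.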

| State | Meaning | Required to enter |
| --- | --- | --- |
| Draft | Rubric exists; not yet pressure-tested against a real target | RUBRIC.md authored |
| Pressure-tested | At least one pressure test completed; findings produced; rubric may have been revised | One PT in evidence/<pt-id>/ |
| Frozen | Rubric version cut; future revisions only via new calibrating evidence | A PT where the rubric did NOT change; version frozen in CHANGELOG.md |
| Dogfooded | Audit applied to at least one surface the maintainer owns, without rubric churn | One dogfood run |
| Revised | A PT or dogfood produced calibration evidence; rubric advanced to next version | New CHANGELOG entry citing the evidence |

States combine: an audit can be “Frozen v0.2 + Dogfooded once.”

State is declared in each audit’s README.md as a header line:

state: Frozen v0.2 + Dogfooded once
audit_prefix: CL
catches: load displacement (memory, search, trust, ...)

Mechanically inspectable via grep "^state: " audits/*/README.md.
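
Beyond grep, the header lines are simple enough to parse directly. This is a sketch under the assumption that each header is a single `key: value` line exactly as shown above; `parse_header` and `split_states` are hypothetical names, not part of the repo.

```python
import re

# Matches the three header keys shown in the README example above.
HEADER_LINE = re.compile(r"^(state|audit_prefix|catches): (.*)$")


def parse_header(readme_text: str) -> dict[str, str]:
    """Collect the first occurrence of each header line from README text."""
    header: dict[str, str] = {}
    for line in readme_text.splitlines():
        match = HEADER_LINE.match(line)
        if match and match.group(1) not in header:
            header[match.group(1)] = match.group(2).strip()
    return header


def split_states(state_value: str) -> list[str]:
    """Split a combined state such as 'Frozen v0.2 + Dogfooded once'."""
    return [part.strip() for part in state_value.split("+")]
```

Splitting on `+` reflects the rule that states combine; it assumes no individual state name contains a plus sign.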

A pressure test (PT) answers: does the rubric survive contact with a real target, and does it produce honest findings without inventing drama? Pressure tests can revise the rubric.

A dogfood run answers: does the audit produce actionable findings on a surface the maintainer owns, without requiring rubric churn? Dogfood runs should NOT change the rubric. If one does, treat that as a signal to elevate the run to a real pressure test.

Both produce the same three output files. They differ in intent and in what they’re allowed to change.

A pressure test designed to find a specific failure shape (e.g. “find a Removed-power case”) is at structural risk of over-fitting. The discipline:

  1. Start each finding with the most conservative evidence state (Inferred or Open question).
  2. Re-check findings against the rubric’s actual criteria after the run completes.
  3. If a finding gets downgraded on second pass, that’s the audit working — record the reclassification in the auditor notes.

Findings that survive second-pass reclassification are the real output. The reclassification trail itself is calibration evidence.
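
The second-pass discipline can be pictured as a small record that keeps its own reclassification trail. The sketch below is illustrative only: the class name, field names, and the `CL-001` identifier format are assumptions, not the finding contract defined in shared/finding-format.md.

```python
from dataclasses import dataclass, field


@dataclass
class DraftFinding:
    """Hypothetical working record for one finding during a pressure test."""

    finding_id: str
    classification: str
    # Step 1: start from a conservative evidence state.
    evidence_state: str = "Open question"
    auditor_notes: list[str] = field(default_factory=list)

    def reclassify(self, new_classification: str, reason: str) -> None:
        """Step 3: change the classification and record why in the notes."""
        if new_classification == self.classification:
            return  # Finding survived the second pass unchanged.
        self.auditor_notes.append(
            f"reclassified {self.classification} -> {new_classification}: {reason}"
        )
        self.classification = new_classification


# Usage: a first-draft "removed" claim is downgraded on second pass.
finding = DraftFinding(finding_id="CL-001", classification="removed")
finding.reclassify("hidden", "alternate access path exists")
```

Keeping the note alongside the finding means the downgrade is preserved as calibration evidence instead of being silently overwritten.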

This rule was earned by PT2 (Outlook). The first draft overclaimed two findings as “Removed power”; honest reclassification moved both to “Hidden” because alternate access paths existed. The reclassified file is preserved at evidence/pt2-outlook-doc-fallback/ as a record.

Cross-audit norms live in shared/. Each file is normative — audits may extend, not override.

| File | What |
| --- | --- |
| shared/audit-lifecycle.md | The five-state machine and transitions |
| shared/evidence-states.md | Observed / Inferred / Open question |
| shared/severity-model.md | Critical / High / Medium / Low + section-Fail threshold |
| shared/finding-format.md | The finding contract + load-displaced-to enum |
| shared/pressure-test-protocol.md | PT vs dogfood, setup, procedure, exit criteria, discipline rule |
| shared/schemas/finding.base.schema.json | JSON Schema for findings |
| shared/schemas/scorecard.base.schema.json | JSON Schema for scorecards |

Audit-specific severity preconditions live in the audit’s RUBRIC.md, not in shared/.

Audits add fields via audits/<name>/schemas/finding.extensions.json. The cognitive-load audit’s extension:

{
  "properties": {
    "section_5_taxonomy": {
      "type": "string",
      "enum": ["compressed", "delayed", "hidden", "removed", "n/a"]
    }
  }
}

Extensions add fields. They cannot redefine severity, section, load_displaced_to, or any other base field. The base schema sets additionalProperties: true so extensions don’t break validation.

A new audit that needs a custom field declares it in schemas/finding.extensions.json. CI validates findings against both base + extensions.
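
The "add, never redefine" rule reduces to a check that the extension's property names do not collide with the base schema's. The following is a minimal sketch of that check, not the CI job itself: `redefined_base_fields` is a hypothetical helper, and `BASE_STAND_IN` is a cut-down stand-in listing only the base fields named above, since the full contents of shared/schemas/finding.base.schema.json are not reproduced on this page.

```python
def redefined_base_fields(base_schema: dict, extension_schema: dict) -> set[str]:
    """Return property names that an extension illegally shares with the base."""
    base_props = set(base_schema.get("properties", {}))
    extension_props = set(extension_schema.get("properties", {}))
    return base_props & extension_props


# Stand-in for the base schema: only the field names mentioned in this page.
BASE_STAND_IN = {
    "additionalProperties": True,
    "properties": {
        "severity": {},
        "section": {},
        "load_displaced_to": {},
    },
}

# The cognitive-load extension shown above.
COGNITIVE_LOAD_EXTENSION = {
    "properties": {
        "section_5_taxonomy": {
            "type": "string",
            "enum": ["compressed", "delayed", "hidden", "removed", "n/a"],
        }
    }
}

collisions = redefined_base_fields(BASE_STAND_IN, COGNITIVE_LOAD_EXTENSION)
```

An empty result means the extension only adds fields. Validating actual findings against both schemas would still need a JSON Schema validator, which this sketch leaves out.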

Why “interface audits” instead of one mega-audit

Each audit catches a different shape of failure with a different evidence model:

  • Cognitive Load (Frozen v0.2 + Dogfooded once): load displacement under bandwidth constraint
  • Low-Vision (Pressure-tested v0.1.0; PT0 on MDN ARIA docs, 2026-06-02): visual access under real density, not just contrast
  • Screen Reader Task (Pressure-tested v0.1.0; PT0 on react.dev/learn, 2026-06-02): task continuity and completion, not just ARIA validity
  • Color Dependence (Pressure-tested v0.1.0; PT0 on microsoft/vscode GitHub Actions, 2026-06-02): meaning conveyed only by color, including the contrast-pass / hue-fail boundary
  • Motor Access (Pressure-tested v0.1.0; PT0 on GOV.UK Design System, 2026-06-02): interaction cost and error recovery (the exclusion vs cost boundary)
  • Motion Sensitivity (future): animation, parallax, vestibular triggers
  • AI Trust Surface (future): source traceability and uncertainty display

Folding them all into one rubric would collapse distinctions that matter. Each audit answers “what burden does this catch that generic scanners miss?” with a specific answer. They share the severity model, finding format, lifecycle, and schema base — but each owns its own sections.

Audits are added one at a time, with grounding, then pressure-tested when a real target justifies the work. Five audits now share the repo. The four new audits, authored and pressure-tested on 2026-06-02, advance to Frozen once a second PT runs without rubric change (per shared/audit-lifecycle.md). PT0 evidence for each lives at audits/<slug>/evidence/pt0-<target>/.