Skip to content

Usage

This page covers the day-to-day workflow: how to invoke an audit, how to read the outputs, and how to act on the remediation list.

Through Claude (or a compatible runner with browser MCP tools):

Run cognitive-load audit on <target>

The target can be:

  • A live URL (https://app.example.com/dashboard)
  • A screenshot (drag in)
  • A specific component (paste JSX/Vue/HTML)
  • A described app flow (prose)
  • Your own surface notes from observation

At minimum one input is required. Multiple inputs of different types may be combined for richer coverage. Path 1 (live navigation) is preferred for interaction-dependent sections like State Shift and Configuration Cost; Path 2 (screenshots only) is acceptable but weaker. Documentation fallback is acceptable when access is blocked, and findings default to Inferred — see Reference / Evidence states.

You can pass extra context to sharpen the audit:

FieldWhat it does
user_typeprofessional, casual, first-time, returning — informs which workflows to weight
taskwhat the user is trying to do on this surface
dense_statedescription of the worst-case data state tested (full inbox, dense dashboard, etc.)
low_bandwidth_statewas the audit run while the auditor was constrained? how?
ai_features_presentlist of AI-mediated features in scope

If dense_state is empty, the skill warns that Section 7 (Evidence) cannot pass — every audit needs to test against a realistic dense state, not the marketing screenshot.

Every finding has the same shape:

## Finding CL-NN — short title
Severity: Critical | High | Medium | Low
Section: <section_name>
Surface: <where in the product>
Load displaced to: <enum value(s)>
Evidence state: Observed | Inferred | Open question
Section 5 taxonomy: compressed | delayed | hidden | removed (Section 5 findings only)
Issue: <what the interface does>
Why it matters: <the cognitive cost>
Evidence: <state, surface, dataset, screenshot ref>
Fix: <preserves power, control, source access, discoverability>

The Load displaced to: field is the heart of the audit. Findings with vague displacement targets are weaker findings. See Reference for the full enum.

The scorecard is JSON, conforming to shared/schemas/scorecard.base.schema.json. It contains:

  • summary.overall_statusaudited_clean / warn / fail
  • summary.{critical,high,medium,low}_count — finding distribution
  • sections.<name>.status — per-section pass / warn / fail
  • findings[] — full finding objects
  • open_questions[] — findings without sufficient evidence, with a resolution path
  • positive_observations[] — what the target does well (don’t regress these)
  • hard_failure_patterns_validated — which of the four audit-spanning failure shapes the run exercised

A section fails if it has one Critical OR three Highs. Any section fail produces an overall fail.

Findings ordered by severity × leverage. P1 is “fix this first.” Each entry names:

  • The finding it fixes (by ID)
  • The severity and Section 5 taxonomy (if applicable)
  • The shape of the fix — concrete, not a research project
  • Why this priority

For dogfood runs (where you own the target), the remediation list is a punch list. For pressure tests (where you don’t), it’s a model of what the fix would look like.

After substantial UI changes, after shipping a remediation, after the rubric advances to a new version. Each run goes into its own evidence/<run-id>/ directory; past evidence is frozen — it documents what the rubric caught at that point in time. Don’t rewrite history when the rubric advances.

Healthy products run clean. Dogfood Run 1 (research-os handbook) produced 1 High + 4 Medium + 3 Low — a Warn overall, no section Fail. That’s the audit working: it found real things to fix without inventing drama. If an audit produces zero findings on a real product, double-check that Section 7 (Evidence) was actually exercised against a dense state and not just the marketing screenshot.