Usage
This page covers the day-to-day workflow: how to invoke an audit, how to read the outputs, and how to act on the remediation list.
Invoking an audit
Through Claude (or a compatible runner with browser MCP tools):
Run cognitive-load audit on <target>
The target can be:
- A live URL (https://app.example.com/dashboard)
- A screenshot (drag in)
- A specific component (paste JSX/Vue/HTML)
- A described app flow (prose)
- Your own surface notes from observation
At least one input is required. Multiple inputs of different types may be combined for richer coverage. Path 1 (live navigation) is preferred for interaction-dependent sections like State Shift and Configuration Cost; Path 2 (screenshots only) is acceptable but weaker. When access is blocked, falling back to documentation is acceptable, but findings then default to Inferred (see Reference / Evidence states).
Optional context
You can pass extra context to sharpen the audit:
| Field | What it does |
|---|---|
| user_type | professional, casual, first-time, returning; informs which workflows to weight |
| task | what the user is trying to do on this surface |
| dense_state | description of the worst-case data state tested (full inbox, dense dashboard, etc.) |
| low_bandwidth_state | was the audit run while the auditor was constrained? how? |
| ai_features_present | list of AI-mediated features in scope |
If dense_state is empty, the skill warns that Section 7 (Evidence) cannot pass — every audit needs to test against a realistic dense state, not the marketing screenshot.
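The dense_state rule above can be sketched as a small pre-flight check. This is a minimal illustration, not the skill's actual implementation: the field names come from the table, while `check_context`, the warning strings, and the assumption that values are plain strings are hypothetical.

```python
# Field names are taken from the table above; everything else here is an
# illustrative assumption, not the skill's real code.
OPTIONAL_FIELDS = (
    "user_type",
    "task",
    "dense_state",
    "low_bandwidth_state",
    "ai_features_present",
)


def check_context(context: dict) -> list[str]:
    """Return human-readable warnings for an optional-context mapping."""
    problems = []

    # Flag keys that are not among the documented optional fields.
    unknown = sorted(set(context) - set(OPTIONAL_FIELDS))
    if unknown:
        problems.append(f"unknown context fields: {', '.join(unknown)}")

    # A missing or blank dense_state means Section 7 (Evidence) cannot pass.
    dense_state = context.get("dense_state")
    if not (dense_state or "").strip():
        problems.append("dense_state is empty: Section 7 (Evidence) cannot pass")

    return problems
```

Calling `check_context({"task": "triage inbox"})` returns a single warning about the empty dense_state, which mirrors the behaviour described above.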
Reading the findings file
Every finding has the same shape:
## Finding CL-NN — short title
Severity: Critical | High | Medium | Low
Section: <section_name>
Surface: <where in the product>
Load displaced to: <enum value(s)>
Evidence state: Observed | Inferred | Open question
Section 5 taxonomy: compressed | delayed | hidden | removed (Section 5 findings only)

Issue: <what the interface does>
Why it matters: <the cognitive cost>
Evidence: <state, surface, dataset, screenshot ref>
Fix: <preserves power, control, source access, discoverability>

The "Load displaced to" field is the heart of the audit. Findings with vague displacement targets are weaker findings. See Reference for the full enum.
Reading the scorecard
The scorecard is JSON, conforming to shared/schemas/scorecard.base.schema.json. It contains:
- summary.overall_status: audited_clean / warn / fail
- summary.{critical,high,medium,low}_count: finding distribution
- sections.<name>.status: per-section pass / warn / fail
- findings[]: full finding objects
- open_questions[]: findings without sufficient evidence, with a resolution path
- positive_observations[]: what the target does well (don’t regress these)
- hard_failure_patterns_validated: which of the four audit-spanning failure shapes the run exercised
A section fails if it has at least one Critical finding or at least three High findings. Any section fail produces an overall fail.
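The fail rule can be sketched in a few lines. This is a hedged illustration only: it assumes each object in findings[] carries `severity` and `section` keys (the schema file is not reproduced on this page), and it reads the thresholds as "at least one" and "at least three". The warn threshold is not stated here, so the sketch covers the fail rule only.

```python
def section_fails(findings: list[dict]) -> bool:
    """Apply the fail rule to one section's findings.

    Assumes each finding dict has a "severity" key; that key name is an
    assumption, not confirmed by the schema.
    """
    criticals = sum(1 for f in findings if f["severity"] == "Critical")
    highs = sum(1 for f in findings if f["severity"] == "High")
    return criticals >= 1 or highs >= 3


def failed_sections(scorecard: dict) -> list[str]:
    """Group findings[] by their (assumed) "section" key and list failing sections."""
    by_section: dict[str, list[dict]] = {}
    for finding in scorecard["findings"]:
        by_section.setdefault(finding["section"], []).append(finding)
    return sorted(name for name, items in by_section.items() if section_fails(items))


def overall_fails(scorecard: dict) -> bool:
    """Any failing section produces an overall fail."""
    return bool(failed_sections(scorecard))
```

A scorecard loaded with `json.load` can be passed straight to `failed_sections` to see which sections drove an overall fail.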
Reading the remediation list
Findings ordered by severity × leverage. P1 is “fix this first.” Each entry names:
- The finding it fixes (by ID)
- The severity and Section 5 taxonomy (if applicable)
- The shape of the fix — concrete, not a research project
- Why this priority
For dogfood runs (where you own the target), the remediation list is a punch list. For pressure tests (where you don’t), it’s a model of what the fix would look like.
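The severity × leverage ordering can be illustrated with a short sketch. The numeric severity weights, the `leverage` field, and the `prioritize` helper are all assumptions made for the example; this page names the severities but does not define a scale for either factor.

```python
# Assumed weights: the severities are from this page, the numbers are not.
SEVERITY_WEIGHT = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}


def prioritize(entries: list[dict]) -> list[dict]:
    """Sort entries by severity weight times leverage, highest first.

    Each entry is assumed to carry "severity" and a numeric "leverage";
    the returned copies gain a "priority" label (P1, P2, ...).
    """
    ranked = sorted(
        entries,
        key=lambda e: SEVERITY_WEIGHT[e["severity"]] * e["leverage"],
        reverse=True,
    )
    return [{**e, "priority": f"P{i}"} for i, e in enumerate(ranked, start=1)]
```

Under these assumed weights, a Low finding with high leverage can outrank a Medium finding with little leverage, which is the point of multiplying the two factors rather than sorting by severity alone.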
When to re-run an audit
Re-run after substantial UI changes, after shipping a remediation, or after the rubric advances to a new version. Each run goes into its own evidence/<run-id>/ directory. Past evidence is frozen: it documents what the rubric caught at that point in time. Don’t rewrite history when the rubric advances.
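One way to keep past evidence frozen is to refuse to reuse a run directory. The sketch below assumes the evidence/<run-id>/ layout described above; `new_run_dir` and its `root` parameter are hypothetical names, not part of the skill.

```python
from pathlib import Path


def new_run_dir(root: Path, run_id: str) -> Path:
    """Create evidence/<run-id>/ under root, refusing to touch an existing run."""
    run_dir = root / "evidence" / run_id
    # exist_ok=False raises FileExistsError if this run id was already used,
    # so earlier evidence is never silently overwritten.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

A re-run after a rubric update would then pick a fresh run id rather than writing into an older directory.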
When NOT to expect findings
Healthy products run clean. Dogfood Run 1 (research-os handbook) produced 1 High + 4 Medium + 3 Low: a Warn overall, with no section Fail. That’s the audit working: it found real things to fix without inventing drama. If an audit produces zero findings on a real product, double-check that Section 7 (Evidence) was actually exercised against a dense state and not just the marketing screenshot.