study-swarm
Cited research, verified

study-swarm: no model grades its own homework.

A protocol for grounding substantial design decisions in cited research — then verifying every citation with a different model family, reasoning-stripped, before any of it informs the design.

Dispatch

one research agent per question — cited findings only

Verify

roleos verify-citations <dispatch>

Halt

fabricated → drop · verifier down → escalate

Why it works

Documented failure modes, each closed by evidence — not intuition.

Family-different verification

A different model family checks every citation, reasoning-stripped. Same-family judges favor their own output (Panickssery 2024); the gains from verification come from the external verifier (Huang 2023, Kambhampati 2024).
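
A minimal sketch of what the two constraints above could look like in code. It assumes "reasoning-stripped" means the verifier receives only the claim and its citation, not the research agent's rationale. The `Finding` type, its field names, and both helper functions are hypothetical illustrations, not part of the roleos or prism tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str      # the statement the citation is meant to support
    citation: str   # an identifier such as an arXiv ID or DOI
    reasoning: str  # the research agent's own rationale (hypothetical field)


def build_verifier_input(finding: Finding) -> dict:
    """Drop the author's reasoning so the verifier sees claim and source only."""
    return {"claim": finding.claim, "citation": finding.citation}


def assert_family_different(author_family: str, verifier_family: str) -> None:
    """Refuse to verify when author and verifier share a model family."""
    if author_family.strip().lower() == verifier_family.strip().lower():
        raise ValueError("verifier must come from a different model family")
```

Raising instead of returning a boolean makes a same-family pairing fail loudly, so it cannot silently fall back to self-grading.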

Retrieval-oracle existence floor

Existence is confirmed by resolving the arXiv ID or DOI, never from model memory. 18–55% of LLM citations are fabricated (Walters & Wilder 2023), and even when a link resolves, the content often does not support the claim (Onweller 2026).
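
The existence floor can be sketched as a pure lookup with no model in the loop. This is an illustration under stated assumptions, not the tool's implementation: `resolver_url` and `exists` are hypothetical names, the patterns cover only new-style arXiv IDs and standard `10.` DOIs, and `fetch_status` is an injected callable assumed to follow redirects and return the final HTTP status code.

```python
import re
from typing import Callable, Optional

# New-style arXiv IDs (YYMM.NNNNN, optional version suffix) and DOIs.
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)
DOI_RE = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def resolver_url(identifier: str) -> Optional[str]:
    """Map a citation identifier to its public resolver URL, or None."""
    ident = identifier.strip()
    match = ARXIV_RE.match(ident)
    if match:
        return f"https://arxiv.org/abs/{match.group(1)}"
    match = DOI_RE.match(ident)
    if match:
        return f"https://doi.org/{match.group(1)}"
    return None


def exists(identifier: str, fetch_status: Callable[[str], int]) -> bool:
    """True only if the identifier resolves; model memory is never consulted.

    Exceptions from fetch_status (oracle unreachable) propagate on purpose,
    so an outage is not mistaken for a fabricated citation.
    """
    url = resolver_url(identifier)
    if url is None:
        return False
    return fetch_status(url) == 200
```

A resolving link only establishes that the work exists. Whether its content supports the claim is a separate check, left to the family-different verifier.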

Halt, don’t hope

Fabricated → dropped. Misattributed → corrected once. Verifier or oracle unavailable → halt and escalate. An unverified citation never reaches the design.
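
The halt policy above reads naturally as a small triage loop. The sketch below is one possible reading: `Verdict`, `VerifierUnavailable`, `triage`, and the `verify` and `correct` callables are all hypothetical. The source does not say what happens when the single correction fails, so dropping the citation in that case is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    FABRICATED = "fabricated"
    MISATTRIBUTED = "misattributed"


class VerifierUnavailable(RuntimeError):
    """Raised when the verifier or the retrieval oracle cannot be reached."""


def triage(citations, verify, correct):
    """Return only citations that verified, applying the halt policy.

    verify(citation) returns a Verdict or raises VerifierUnavailable.
    correct(citation) returns a corrected citation.
    """
    kept = []
    for citation in citations:
        # VerifierUnavailable is deliberately not caught: the run halts
        # and the caller escalates instead of passing unverified work on.
        verdict = verify(citation)
        if verdict is Verdict.FABRICATED:
            continue  # fabricated: drop
        if verdict is Verdict.MISATTRIBUTED:
            citation = correct(citation)  # exactly one correction attempt
            if verify(citation) is not Verdict.SUPPORTED:
                continue  # assumption: a failed correction is dropped
        kept.append(citation)
    return kept
```

Because unavailability surfaces as an exception rather than a verdict, there is no code path on which an unverified citation lands in `kept`.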

Diversity beats count

≥3 decorrelated lenses — a retrieval oracle plus ≥2 different families. LLM errors correlate, so lens diversity is the load-bearing variable (Rajan 2025, Kim 2025).
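
As an illustration, the panel rule above can be expressed as a check on composition rather than on size. `Lens` and `panel_is_diverse` are hypothetical names, and the `kind` and `family` labels are assumed to be supplied by whoever configures the panel.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Lens:
    name: str
    kind: str         # "oracle" for retrieval, "model" for an LLM verifier
    family: str = ""  # model family label; left empty for an oracle


def panel_is_diverse(lenses: Iterable[Lens]) -> bool:
    """True if the panel has a retrieval oracle plus at least 2 model families."""
    lenses = list(lenses)
    has_oracle = any(lens.kind == "oracle" for lens in lenses)
    families = {
        lens.family.strip().lower()
        for lens in lenses
        if lens.kind == "model" and lens.family.strip()
    }
    # One oracle plus two distinct families implies at least three lenses.
    return has_oracle and len(families) >= 2
```

Counting distinct families rather than lenses means a second or third model from the same family adds nothing to the check, which mirrors the point that errors within a family correlate.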

The protocol

Five steps

1. Identify 3–5 load-bearing questions
2. Dispatch one research agent per question
3. Synthesize into a "Research grounding" section
4. Verify externally (different family, reasoning-stripped)
5. Connect each choice back to a finding
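
Step 5 can be checked mechanically once step 4 has produced a set of verified findings. A minimal sketch, assuming design choices are recorded as a mapping from choice name to the finding IDs it cites. The function name and data shapes are hypothetical.

```python
from typing import Dict, List, Set


def ungrounded_choices(choices: Dict[str, List[str]], verified: Set[str]) -> List[str]:
    """Return the design choices that cite no verified finding."""
    return [
        name
        for name, cited in choices.items()
        if not any(finding_id in verified for finding_id in cited)
    ]
```

An empty result means every choice traces back to at least one finding that survived verification. Anything returned is a choice still resting on intuition or on a dropped citation.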

Verify the citations

# different family, reasoning-stripped,
# retrieval-oracle existence floor
roleos verify-citations 
#  → prism verify --type citations