Three-Gate Rubric

Scoring Philosophy

The rubric does not score truth. It scores the public evidence of seriousness, novelty signal, and possible impact.

Each of the 17 criteria is rated on a 0–4 scale:

Score	Meaning
0	Absent — no discernible evidence
1	Weak — minimal or ambiguous evidence
2	Partial — some evidence, but gaps remain
3	Strong — clear, verifiable evidence
4	Unusually strong — evidence exceeds what is typical for the genre

The correct interpretive unit is the profile, not the sum. A work that scores 4 on novelty but 1 on reproducibility scaffolding tells a different story from one that scores 2 across the board, even if the arithmetic totals coincide. Do not average the scores into a single composite number, and do not treat the total as a truth score.
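The point about profiles versus sums can be illustrated with a minimal Python sketch. The criterion IDs are the ones used in the gate tables of this rubric. The names CRITERIA, validate, and profile, as well as the two example score sets, are hypothetical and exist only for illustration. The rubric itself does not prescribe any implementation.

```python
# Criterion IDs as listed in the three gate tables of this rubric.
CRITERIA = {
    "Gate 1": [f"G1-{i}" for i in range(1, 8)],    # G1-1 .. G1-7
    "Gate 2": [f"G2-{i}" for i in range(8, 13)],   # G2-8 .. G2-12
    "Gate 3": [f"G3-{i}" for i in range(13, 18)],  # G3-13 .. G3-17
}
ALL_IDS = [cid for ids in CRITERIA.values() for cid in ids]


def validate(scores):
    """Raise ValueError unless all 17 criteria carry an integer score from 0 to 4."""
    if set(scores) != set(ALL_IDS):
        raise ValueError("scores must cover exactly the 17 criterion IDs")
    for cid, value in scores.items():
        if not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{cid}: score must be an integer from 0 to 4")


def profile(scores):
    """Return the per-gate score lists; deliberately no total or average."""
    validate(scores)
    return {gate: [scores[cid] for cid in ids] for gate, ids in CRITERIA.items()}


# Hypothetical example data: two works with the same arithmetic total.
uniform = {cid: 2 for cid in ALL_IDS}
# High novelty signal (G2-8), weak reproducibility scaffolding (G1-4);
# G1-5 is also lowered so that the totals coincide.
spiky = {**uniform, "G2-8": 4, "G1-4": 1, "G1-5": 1}

print(sum(uniform.values()) == sum(spiky.values()))  # compares the totals
print(profile(uniform) == profile(spiky))            # compares the profiles
```

The profile function returns per-gate lists and offers no total on purpose, so downstream code has to reason about the shape of the scores and cannot reduce them to one number by accident.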

Gate 1 — Research-Form Legitimacy

Gate 1 asks whether the work looks like a serious research artifact that has earned external scrutiny. It evaluates the structural and methodological surface of the project, not the correctness of its claims.

ID	Criterion	What It Evaluates
G1-1	Claim typing	Whether internal claims, bridge claims, empirical mappings, and worldview/commitment claims are clearly distinguished
G1-2	Method visibility	Whether the methods are stated openly enough for an outsider to inspect them
G1-3	Hinge visibility	Whether the load-bearing hinges or decisive claims are identifiable
G1-4	Reproducibility scaffolding	Whether meaningful public routes into verification exist (formalisation, claim maps, tours, repository)
G1-5	Falsification readiness	Whether the work names fair failure modes or break points
G1-6	Scope discipline	Whether the work avoids silently inflating from an internal result to a stronger public claim
G1-7	Review-worthiness	Whether the public evidence suggests the work has earned serious human scrutiny

Gate 2 — Novelty and Relevance

Gate 2 asks whether the claims, if they remained supported after scrutiny, would constitute a genuine contribution. It evaluates novelty signals without adjudicating priority or correctness.

ID	Criterion	What It Evaluates
G2-8	Novelty signal	Whether the claims appear potentially non-trivial and not obviously standard
G2-9	Domain relevance	Whether, if true, the claims would matter to active research domains
G2-10	Prior-art awareness	Whether the project appears aware of the need to distinguish novelty from rediscovery
G2-11	Cross-domain relevance	Whether the work connects domains in a way that would matter if valid
G2-12	Specificity of contribution	Whether the contributions are articulated specifically enough to judge novelty

Gate 3 — Impact and Salvage Value

Gate 3 asks what remains significant under three scenarios: if the core claims substantially hold, if major bridges weaken, and if the core spine fails entirely. It evaluates resilience and reusability, not probability.

ID	Criterion	What It Evaluates
G3-13	Upside magnitude	How large the contribution could be if the core claims held
G3-14	Partial-hold value	Whether significant value would remain if major bridges weakened
G3-15	Salvage value	Whether methods, tools, formalisations, taxonomies, or conceptual structures would remain useful if the core spine failed
G3-16	Reusability of artifacts	Whether reusable public artifacts exist independent of ultimate truth (repository, tours, formal maps, structured dossiers)
G3-17	Strategic importance of inspection	Whether, even under uncertainty, the work is important enough to merit serious review because the upside is large

Interpretation Profiles

The 17 scores form a profile. The following archetypes help calibrate interpretation.

Strong review-ready

Gate 1 scores mostly 3–4
Gate 2 includes at least some 3s
Gate 3 includes at least one 4
Low inflation and strong scope discipline throughout

This profile indicates a work that appears methodologically serious, potentially novel, and consequential enough to warrant structured expert engagement.

High-risk / high-upside

Gate 1 strong
Gate 2 mixed
Gate 3 very high
Clear need for expert checking of the strongest claims

This profile indicates a work whose potential significance is large but whose novelty claims require careful independent validation. The review priority is high precisely because the stakes are high.

Interesting but premature

Some novelty signal in Gate 2
Weak reproducibility scaffolding or weak claim typing in Gate 1

This profile indicates a work that may contain genuine ideas but has not yet built the structural apparatus needed for disciplined external scrutiny. The recommended action is to improve the inspectability surface before seeking review.

Weak

Weak hinge visibility
Weak method visibility
Little basis for specialist attention

This profile indicates that the public materials do not currently provide enough evidence of seriousness to justify the investment of expert time.
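The four archetypes above can be approximated as explicit checks, which may help when comparing many dossiers. The following Python sketch is one possible reading, not part of the rubric. Every numeric cutoff is an assumption: "strong" is read as a score of 3 or more, "weak" as 1 or less, "mostly" as more than half of a gate's criteria, and "some novelty signal" as G2-8 scoring at least 2. The function name matching_archetypes and all constants are hypothetical. Because the archetypes are calibration aids and not a partition, the function returns every archetype that matches, which may be none.

```python
STRONG = 3  # assumed cutoff for "strong" (the rubric's label for a score of 3)
WEAK = 1    # assumed cutoff for "weak" (the rubric's label for a score of 1)

GATE1 = [f"G1-{i}" for i in range(1, 8)]
GATE2 = [f"G2-{i}" for i in range(8, 13)]
GATE3 = [f"G3-{i}" for i in range(13, 18)]


def mostly_strong(scores, ids):
    """Assumed reading of 'mostly 3-4': more than half the criteria score >= 3."""
    return sum(scores[c] >= STRONG for c in ids) > len(ids) / 2


def matching_archetypes(scores):
    """Return the names of all archetypes whose (assumed) conditions hold."""
    g1_strong = mostly_strong(scores, GATE1)
    g2_has_strong = any(scores[c] >= STRONG for c in GATE2)
    g2_mixed = g2_has_strong and any(scores[c] < STRONG for c in GATE2)
    g3_has_top = any(scores[c] == 4 for c in GATE3)

    matches = []
    # Strong review-ready: G1-6 (scope discipline) stands in for "low inflation".
    if g1_strong and g2_has_strong and g3_has_top and scores["G1-6"] >= STRONG:
        matches.append("Strong review-ready")
    # High-risk / high-upside: strong Gate 1, mixed Gate 2, very high Gate 3.
    if g1_strong and g2_mixed and g3_has_top and mostly_strong(scores, GATE3):
        matches.append("High-risk / high-upside")
    # Interesting but premature: novelty signal, but weak G1-4 or G1-1.
    if scores["G2-8"] >= 2 and (scores["G1-4"] <= WEAK or scores["G1-1"] <= WEAK):
        matches.append("Interesting but premature")
    # Weak: weak hinge (G1-3) and method (G1-2) visibility, low review-worthiness (G1-7).
    if all(scores[c] <= WEAK for c in ("G1-3", "G1-2", "G1-7")):
        matches.append("Weak")
    return matches


# Hypothetical example: middling scores, a novelty signal, weak scaffolding.
example = {cid: 2 for cid in GATE1 + GATE2 + GATE3}
example.update({"G2-8": 3, "G1-4": 1})
print(matching_archetypes(example))
```

A result from such a function is best treated as a prompt for reading the full profile. An empty list simply means that none of the assumed conditions matched, not that the work lacks merit.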

Confidence Labels

For each gate, the dossier should also assign a confidence label to the assessment itself — not to the claims under review. The three levels are:

High — The public materials provide enough evidence to score this gate with reasonable assurance. Example: high confidence in Gate 1 because the repository, tours, and scope labels are publicly visible and inspectable.
Medium — The assessment is plausible but would benefit from domain-specific verification. Example: medium confidence in Gate 2 because prior-art comparisons require specialist knowledge the model may lack.
Low — The assessment is provisional and should not be relied upon without human expert evaluation. Example: low confidence in Gate 3 specifics because upside magnitude depends on judgments that exceed AI competence.

Confidence labels are a self-audit mechanism. They tell the reader where the dossier is most and least trustworthy, and where human expertise is most urgently needed.
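One way to keep the confidence label attached to the assessment, and separate from the scores, is to store both in a single per-gate record. The Python sketch below assumes a hypothetical record type GateAssessment and a hypothetical helper needs_expert_attention. Neither is defined by the rubric, and the example rationales paraphrase the examples given above.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    """Confidence in the assessment itself, not in the claims under review."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class GateAssessment:
    gate: str               # e.g. "Gate 1"
    scores: dict            # criterion ID -> score from 0 to 4
    confidence: Confidence  # label for this gate's assessment
    rationale: str          # why this confidence level was chosen


def needs_expert_attention(assessments):
    """Return gates below High confidence, least trustworthy first."""
    urgency = {Confidence.LOW: 0, Confidence.MEDIUM: 1}
    flagged = [a for a in assessments if a.confidence in urgency]
    return sorted(flagged, key=lambda a: urgency[a.confidence])


# Hypothetical dossier entries; the scores are placeholders.
dossier = [
    GateAssessment("Gate 1", {"G1-4": 3}, Confidence.HIGH,
                   "Repository, tours, and scope labels are publicly inspectable."),
    GateAssessment("Gate 2", {"G2-10": 2}, Confidence.MEDIUM,
                   "Prior-art comparison needs specialist knowledge."),
    GateAssessment("Gate 3", {"G3-13": 3}, Confidence.LOW,
                   "Upside magnitude depends on expert judgment."),
]

for item in needs_expert_attention(dossier):
    print(item.gate, item.confidence.value, item.rationale)
```

Sorting by confidence level puts the gates where human expertise is most urgently needed at the top of the list, which mirrors the self-audit purpose of the labels.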