Methodology

The prediction problem in law is an architecture problem.

Outcome prediction in litigation, insurance, and regulatory proceedings fails not because the data is insufficient. It fails because the model design is wrong. Criterica's architecture addresses this at the structural level.

ARCHITECTURE

Jurisdiction-specific models, not generalist classifiers.

General-purpose legal AI trains on pooled case data across hundreds of jurisdictions and expects a single model to generalize across venue, bench composition, and procedural history. The prediction error from that pooling is not recoverable, because jurisdictions differ structurally, not just in volume. A model trained on California commercial litigation will systematically mispredict outcomes in the Fifth Circuit or Northern Ireland. Criterica trains one model per jurisdiction per case type, narrow and deep, tuned on the actual caselaw of that venue. The fleet comprises 26,563 production models covering federal circuits, state appellate courts, and specialist tribunals. No model is asked to score a matter outside the jurisdiction and case type it was trained on, and that constraint is architecturally enforced, not advisory.
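One way to picture an enforced scope constraint is a registry keyed by jurisdiction and case type that refuses any request it has no model for, rather than falling back to a pooled or neighbouring model. The sketch below is illustrative only: `ModelFleet`, `Matter`, `OutOfScopeError`, and the placeholder scorer are hypothetical names, not Criterica's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical names throughout; this is an illustration, not production code.
ModelKey = Tuple[str, str]  # (jurisdiction, case_type)
Scorer = Callable[[Dict[str, float]], float]


class OutOfScopeError(LookupError):
    """Raised when no model was trained for the requested venue and case type."""


@dataclass(frozen=True)
class Matter:
    jurisdiction: str
    case_type: str
    features: Dict[str, float]


class ModelFleet:
    def __init__(self) -> None:
        self._models: Dict[ModelKey, Scorer] = {}

    def register(self, jurisdiction: str, case_type: str, model: Scorer) -> None:
        self._models[(jurisdiction, case_type)] = model

    def predict(self, matter: Matter) -> float:
        key = (matter.jurisdiction, matter.case_type)
        if key not in self._models:
            # No fallback to a pooled or neighbouring model: refuse instead.
            raise OutOfScopeError(f"no model registered for {key}")
        return self._models[key](matter.features)


fleet = ModelFleet()
fleet.register("CA", "commercial", lambda features: 0.6)  # placeholder scorer
in_scope = fleet.predict(Matter("CA", "commercial", {}))
```

In this sketch, a request for a venue with no registered model raises `OutOfScopeError` instead of returning a number, which is what makes the constraint enforced rather than advisory.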

TRAINING DATA

Real court records. No synthetic augmentation.

Synthetic training data in legal AI compounds error: a model trained on AI-generated case summaries learns the statistical patterns of another model, not the patterns of actual judicial behavior. Every Criterica model trains exclusively on real filed records from federal and state dockets, international tribunals, and regulatory enforcement actions, drawing on 3.52B+ records across 89 jurisdictions. The training corpus is deduplicated, outcome-labeled, and partitioned by filing date before any model sees it, so a model is never trained on cases it should not yet be able to see. Our proprietary outcomes corpus provides the primary US backbone, supplemented by state court filings, licensed industry datasets, and international tribunal data from England, Wales, Australia, and Canada. Zero synthetic rows appear in the training data of any production model.
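The deduplicate-then-partition step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Record` fields, the `docket_id` deduplication key, the binary outcome labeling, and the single cutoff date are all hypothetical and are not taken from the Criterica pipeline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    docket_id: str  # hypothetical unique key for a filed record
    filed: date
    outcome: int    # assumed labeling: 1 = recovery, 0 = no recovery


def temporal_split(
    records: Iterable[Record], cutoff: date
) -> Tuple[List[Record], List[Record]]:
    """Deduplicate by docket_id, then split strictly by filing date."""
    seen = set()
    train: List[Record] = []
    holdout: List[Record] = []
    # sorted() is stable, so the earliest-filed copy of a duplicate is kept.
    for rec in sorted(records, key=lambda r: r.filed):
        if rec.docket_id in seen:
            continue
        seen.add(rec.docket_id)
        # Anything filed on or after the cutoff is invisible to training.
        (train if rec.filed < cutoff else holdout).append(rec)
    return train, holdout


records = [
    Record("A-1", date(2019, 3, 1), 1),
    Record("A-1", date(2019, 3, 1), 1),  # verbatim duplicate, dropped
    Record("B-7", date(2021, 6, 15), 0),
    Record("C-2", date(2020, 1, 10), 1),
]
train, holdout = temporal_split(records, cutoff=date(2021, 1, 1))
```

Here `train` holds the 2019 and 2020 filings and `holdout` holds the 2021 filing, so evaluation only ever uses cases filed after everything the model was trained on.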

OUTPUT FORMAT

A reliable probability, not a sentiment label.

Most legal AI returns text: a summary, a risk flag, a qualitative category. Criterica outputs a structured score set for each matter: a risk score (0–100), risk band (Low / Moderate / Elevated / High / Critical), recovery probability (0.0–1.0), damages range (low / expected / high), a recommendation (FUND / CONDITIONAL / PASS), model confidence, and the list of models applied. Reliability is measured on cases the model never saw, partitioned at the circuit level. A recovery probability of 0.73 means that, among held-out matters scored near 0.73, about 73 percent actually ended in recovery; it does not mean the outcome "feels likely." Outputs are audience-configurable: Capital Provider, Law Firm, and Insurer views are derived from the same model layer and structured per decision context. The output feeds a spreadsheet, a risk model, or a funding committee. It is not a narrative, and that distinction is intentional.
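The "73 percent of the time" property is what a reliability table checks: group held-out predictions into bins and compare the average predicted probability in each bin with the observed outcome rate. The sketch below uses a common equal-width binning and expected calibration error; the function names, bin count, and toy data are assumptions for illustration, not Criterica's reported metric.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, float, int]  # (mean predicted, observed rate, count)


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Row]:
    """Return one row per non-empty equal-width probability bin."""
    bins: Dict[int, List[Tuple[float, int]]] = {}
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(idx, []).append((p, y))
    table: List[Row] = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_p = sum(p for p, _ in rows) / len(rows)
        rate = sum(y for _, y in rows) / len(rows)
        table.append((mean_p, rate, len(rows)))
    return table


def expected_calibration_error(table: List[Row]) -> float:
    """Count-weighted mean gap between predicted and observed rates."""
    total = sum(n for _, _, n in table)
    return sum(n * abs(mean_p - rate) for mean_p, rate, n in table) / total


# Toy held-out set: four matters scored 0.75 (three recovered),
# four matters scored 0.25 (one recovered).
probs = [0.75] * 4 + [0.25] * 4
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
table = reliability_table(probs, outcomes)
ece = expected_calibration_error(table)
```

In this toy set the predicted and observed rates match in both bins, so the calibration error is zero. A model whose 0.75 scores recovered only half the time would show a gap of 0.25 in that bin.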

VALIDATION

Performance thresholds enforced at promotion. No exceptions.

Models are tested against real outcomes for how reliably they predict, and any model that does not clear the promotion threshold remains in stub status regardless of sample size or elapsed time. Circuit-level models currently in production are tested on cases they never saw and held to a consistent reliability bar across case type and jurisdiction. Models that score perfectly are flagged as suspect and held for review before promotion, because a result that looks too good almost always means the model had effectively seen the answer in advance, not that it found a real signal. Known problem patterns, including formula-based statutory outcomes and circular features, are identified during training review and either excluded or permanently flagged as non-promotable. Six models remain in experimental status pending additional data, and two remain suspended after review. The registry records every model's training date, training row count, feature vector, performance, and promotion decision, and that record is permanent.
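The promotion rules above can be read as a small decision function. The sketch below is an assumption-laden illustration: the `Status` names, the `Evaluation` fields, the order of checks, and the 0.80 threshold are hypothetical, since the actual bar and registry schema are not published here.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    STUB = "stub"
    HELD_FOR_REVIEW = "held_for_review"
    NON_PROMOTABLE = "non_promotable"
    PRODUCTION = "production"


@dataclass(frozen=True)
class Evaluation:
    model_id: str
    holdout_score: float     # reliability on unseen cases, in [0, 1], higher is better
    has_known_problem: bool  # formula-based outcome or circular feature found in review


PROMOTION_THRESHOLD = 0.80  # assumed value for illustration only


def promotion_decision(
    ev: Evaluation, threshold: float = PROMOTION_THRESHOLD
) -> Status:
    if ev.has_known_problem:
        return Status.NON_PROMOTABLE
    if ev.holdout_score >= 1.0:
        # A perfect score more likely signals leakage than real signal.
        return Status.HELD_FOR_REVIEW
    if ev.holdout_score < threshold:
        return Status.STUB
    return Status.PRODUCTION


decisions = {
    ev.model_id: promotion_decision(ev)
    for ev in [
        Evaluation("m-good", 0.86, False),
        Evaluation("m-weak", 0.61, False),
        Evaluation("m-perfect", 1.0, False),
        Evaluation("m-circular", 0.93, True),
    ]
}
```

Note that in this sketch a higher score never rescues a model with a known problem pattern, and a perfect score is routed to review rather than to production.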

Statistics shown reflect historical or illustrative model outputs derived from real case data. They are not predictions or guarantees of any individual outcome. Litigation results depend on facts, jurisdiction, judge, and counsel, and vary case by case. Model accuracy is subject to selection effects and changing legal dynamics.

Model counts, the 3.52B+ record corpus, and the promotion gate are documented in the methodology note.