The prediction problem in law is an architecture problem.
Outcome prediction in litigation, insurance, and regulatory proceedings fails not because the data is insufficient. It fails because the model design is wrong. Criterica's architecture addresses this at the structural level.
Jurisdiction-specific models, not generalist classifiers.
General-purpose legal AI trains on pooled case data across hundreds of jurisdictions and expects a single model to generalize across venue, bench composition, and procedural history. The prediction error from that pooling is not recoverable. Jurisdictions differ structurally, not just in volume — a model trained on California commercial litigation will systematically mispredict outcomes in the Fifth Circuit or Northern Ireland. Criterica trains one model per jurisdiction per case type, narrow and deep, calibrated on the actual caselaw of that venue. The fleet covers 23,508 production models across federal circuits, state appellate courts, and specialist tribunals. No model is asked to generalize beyond the distribution it was trained on — that constraint is architecturally enforced, not advisory.
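The "one model per jurisdiction per case type" rule can be pictured as a fleet keyed on the pair (jurisdiction, case type), where a request for a venue with no trained model is refused instead of routed to the nearest available model. The sketch below is a minimal illustration under that assumption. `ModelFleet`, `NoModelForVenue`, and the key strings such as `"CA5"` are hypothetical names, and each model is reduced to a plain callable that returns a probability.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key: one model per (jurisdiction, case type) pair.
ModelKey = Tuple[str, str]


class NoModelForVenue(LookupError):
    """Raised when no model was trained on the requested (jurisdiction, case type)."""


@dataclass
class ModelFleet:
    # Maps (jurisdiction, case_type) -> scoring function returning a probability.
    models: Dict[ModelKey, Callable[[dict], float]] = field(default_factory=dict)

    def register(self, jurisdiction: str, case_type: str,
                 model: Callable[[dict], float]) -> None:
        key = (jurisdiction, case_type)
        if key in self.models:
            raise ValueError(f"model already registered for {key}")
        self.models[key] = model

    def score(self, jurisdiction: str, case_type: str, features: dict) -> float:
        key = (jurisdiction, case_type)
        model = self.models.get(key)
        if model is None:
            # Refuse outright: never fall back to a model trained on another venue.
            raise NoModelForVenue(f"no model trained for {key}")
        return model(features)
```

Making the refusal an exception, rather than a default or a pooled fallback, is one way to make the constraint structural rather than advisory: a caller cannot silently receive a score from a model trained outside the matter's venue.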
Real court records. No synthetic augmentation.
Synthetic training data in legal AI is a compounding error: a model trained on AI-generated case summaries learns the statistical patterns of another model, not the patterns of actual judicial behavior. Every Criterica model trains exclusively on real filed records from federal and state dockets, international tribunals, and regulatory enforcement actions: 106M+ records across 89 jurisdictions. Before any model sees it, the training corpus is deduplicated, outcome-labeled, and temporally partitioned by filing date to prevent look-ahead leakage of reported outcomes. The Federal Judicial Center civil terminations dataset provides the primary US backbone, supplemented by state court filings, EDGAR financial records, and international tribunal data from England, Wales, Australia, and Canada. Zero synthetic rows appear anywhere in the production fleet.
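Deduplication followed by a filing-date partition can be sketched as below, assuming each record carries a unique docket identifier and a filing date. `CaseRecord`, `docket_id`, `deduplicate`, `temporal_split`, and the single cutoff date are all hypothetical; the point is only that every training matter is filed strictly before every test matter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CaseRecord:
    docket_id: str      # hypothetical unique identifier used for deduplication
    filing_date: date
    outcome: int        # hypothetical binary outcome label (1 = outcome occurred)


def deduplicate(records: Iterable[CaseRecord]) -> List[CaseRecord]:
    """Keep the first record seen for each docket_id, preserving input order."""
    seen = set()
    unique = []
    for r in records:
        if r.docket_id not in seen:
            seen.add(r.docket_id)
            unique.append(r)
    return unique


def temporal_split(records: Iterable[CaseRecord],
                   cutoff: date) -> Tuple[List[CaseRecord], List[CaseRecord]]:
    """Train on matters filed before the cutoff, test on matters filed on or after it."""
    train: List[CaseRecord] = []
    test: List[CaseRecord] = []
    for r in deduplicate(records):
        (train if r.filing_date < cutoff else test).append(r)
    return train, test
```

A random split, by contrast, would let a model learn from matters filed after the ones it is tested on, which is the look-ahead leakage the partition exists to prevent.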
A calibrated probability, not a sentiment label.
Most legal AI returns text: a summary, a risk flag, a qualitative category. Criterica outputs a structured score set for each matter: a risk score (0–100), a risk band (Low / Moderate / Elevated / High / Critical), a recovery probability (0.0–1.0), a damages range (low / expected / high), a recommendation (FUND / CONDITIONAL / PASS), model confidence, and the list of models applied. Calibration is measured against held-out test sets partitioned at the circuit level. A 0.73 output means that, on historical held-out data, matters the model scored at about 0.73 had that outcome roughly 73% of the time; it does not mean the outcome merely "feels likely." Outputs are audience-configurable: Capital Provider, Law Firm, and Insurer views are derived from the same model layer and structured for each decision context. The output feeds a spreadsheet, a risk model, or a funding committee. It is not a narrative. That distinction is intentional.
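A standard way to check whether "0.73 means 73%" holds on a held-out set is a reliability table: bucket predictions by score, then compare each bucket's mean predicted probability with its observed outcome rate. The sketch below does this with equal-width bins and summarizes the gaps as expected calibration error (ECE). The function names and the choice of ten bins are illustrative assumptions, not a description of Criterica's actual calibration procedure.

```python
from typing import List, Sequence, Tuple


def reliability_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed outcome rate, count) per non-empty bin."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # p == 1.0 goes into the last bin instead of an out-of-range index.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            table.append((mean_p, rate, len(b)))
    return table


def expected_calibration_error(probs: Sequence[float], outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """Count-weighted average gap between predicted probability and observed rate."""
    n = len(probs)
    return sum(count / n * abs(mean_p - rate)
               for mean_p, rate, count in reliability_table(probs, outcomes, n_bins))
```

For example, 100 held-out matters each scored 0.73, of which 73 had the outcome, form one bin whose predicted and observed rates match, giving an ECE of about zero. If instead 10 matters scored 0.9 but only 5 had the outcome, the ECE would be about 0.4, signalling overconfidence.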
AUC thresholds enforced at promotion. No exceptions.
Models that do not clear the promotion threshold (AUC ≥ 0.60 for binary classification, an RMSE target for regression) remain in stub status regardless of sample size or elapsed time. Circuit-level models currently in production average AUC 0.61–0.71, depending on case type and jurisdiction. Models showing AUC = 1.0 are flagged as suspect and held for leakage review before promotion: a score that looks too good is almost always a data problem, not a real signal. Known leakage patterns, including formula-based statutory outcomes and circular features, are identified during training review and either excluded or permanently flagged as non-promotable. Six models remain in experimental status pending additional data, and two remain suspended due to identified leakage. The registry permanently records every model's training date, training row count, feature vector, AUC, and promotion decision.
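The binary-classification gate described above can be sketched as three pieces: an AUC computation, a decision rule (below 0.60 stays a stub, 1.0 is held for leakage review), and an append-only registry record. Only the 0.60 threshold, the AUC = 1.0 rule, and the recorded fields come from the text; the status strings, `RegistryEntry`, and `ModelRegistry` are hypothetical. The regression path is omitted because the RMSE target is not specified.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

# Promotion threshold for binary classifiers, as stated in the text.
AUC_PROMOTION_THRESHOLD = 0.60


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg); a
    rank-based computation would be preferable on large test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def promotion_decision(auc: float) -> str:
    """Map a held-out AUC to a hypothetical registry status."""
    if auc >= 1.0:
        # A perfect score is treated as a probable leakage symptom, not a success.
        return "HELD_FOR_LEAKAGE_REVIEW"
    if auc >= AUC_PROMOTION_THRESHOLD:
        return "PROMOTED"
    # Below threshold: stays a stub regardless of sample size or elapsed time.
    return "STUB"


@dataclass(frozen=True)
class RegistryEntry:
    # Frozen: fields cannot be reassigned once the entry is created.
    model_id: str
    training_date: date
    training_rows: int
    feature_vector: Tuple[str, ...]
    auc: float
    decision: str


class ModelRegistry:
    """Append-only log: exposes no method to edit or remove entries."""

    def __init__(self) -> None:
        self._entries: List[RegistryEntry] = []

    def record(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def entries(self) -> Tuple[RegistryEntry, ...]:
        return tuple(self._entries)
```

Checking the perfect-score case before the threshold matters: otherwise an AUC of 1.0 would pass the ≥ 0.60 test and be promoted without review.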
Criterica Intelligence Model Architecture
Executive summary of the Criterica Intelligence data architecture, model training methodology, and output validation framework.