Measure recursive capability
Evaluate whether a model can inspect, repair, and improve a starting AI system under fixed rules.
For AI labs
ARI Bench works with frontier AI labs to evaluate upcoming models under a fixed, private protocol before public release.
Test upcoming models against ARI's recursive-improvement protocol without exposing the model, endpoint, or result until your team chooses a disclosure path.
Understand how an upcoming model performs against the same benchmark family used for public frontier comparisons.
Keep pre-release results private, or coordinate a verified public listing when your team is ready.
ARI confirms the model, access method, target timeline, and confidentiality needs with the lab. An NDA or access agreement can be put in place before any run.
The lab provides API access, endpoint access, a managed environment, or another mutually agreed path. ARI does not require the model to be publicly released.
ARI runs the evaluation under the fixed protocol and private scoring surface, so the benchmark stays undisclosed and the result remains comparable across models.
The lab receives the score, context, run notes, and interpretation: what improved, what failed, and how the model's improvement loop compares with public frontier results.
If the lab wants a public leaderboard entry, ARI can coordinate verification, naming, timing, and disclosure rules.
When requesting a confidential evaluation, include your lab name, model access constraints, desired timeline, and whether you want private-only results or a verified public listing.