For AI labs

Run a Confidential ARI Evaluation

ARI Bench works with frontier AI labs to evaluate upcoming models under a fixed, private protocol before public release.

Confidential Signal Before Public Release

Test upcoming models against ARI's recursive-improvement protocol without exposing the model, endpoint, or result until your team chooses a disclosure path.

01

Measure recursive capability

Evaluate whether a model can inspect, repair, and improve a starting AI system under fixed rules.

02

Compare before launch

Understand how an upcoming model performs on the same benchmark family used for public frontier comparisons.

03

Control disclosure

Keep pre-release results private, or coordinate a verified public listing when your team is ready.

How the process works

STEP 01

Intake and confidentiality

We confirm the model, access method, target timeline, and confidentiality needs. NDAs or access agreements can be put in place before any run.

STEP 02

Managed model access

The lab provides API access, endpoint access, a managed environment, or another mutually agreed path. ARI does not require public release.

STEP 03

Controlled benchmark run

ARI runs the evaluation under the fixed protocol and private scoring surface, preserving the benchmark's integrity while producing a comparable signal.

STEP 04

Confidential report

The lab receives the score, context, run notes, and interpretation: what improved, what failed, and how the model's improvement loop compares with the current public frontier cohort.

STEP 05

Optional public validation

If the lab wants a public leaderboard entry, ARI can coordinate verification, naming, timing, and disclosure rules.

What a lab receives

Available evaluation outputs

  • Confidential score and benchmark report
  • Comparison against the current public frontier cohort
  • Diagnostic notes on model-design judgment and failure repair
  • Optional private briefing for researchers or leadership
  • Optional verified public leaderboard listing

What ARI does not sell

  • No access to hidden evaluation data
  • No training access to the private scoring surface
  • No paid score improvement or pay-for-rank placement
  • No public disclosure of confidential runs without agreed rules

Know before the market does whether your next model accelerates your lab.

For confidential evaluations, include your lab name, model access constraints, desired timeline, and whether you are exploring private-only results or a verified public listing.

Private runs · Pre-release models · Optional public validation