v0.2 2026-04-17

Public launch

Benchmark evaluations, ground truth labeling, and detailed metrics — Arbitr goes live.

Highlights

Public release of Arbitr, available to early access users.

Benchmark Evaluations

  • Multi-document benchmark runs across multiple models
  • Per-field accuracy scoring, aggregated per document and per model
  • Token estimation and cost tracking per run
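Per-field accuracy aggregated per document and per model can be pictured with a minimal sketch. The field names, exact-match metric, and data shapes below are illustrative assumptions, not Arbitr's actual schema or scoring method:

```python
# Illustrative sketch: score each extracted field against ground truth,
# then aggregate mean accuracy per model across documents.
# (Hypothetical data shapes; not Arbitr's real API.)
from collections import defaultdict


def per_field_accuracy(predictions, ground_truth):
    """Score one document: 1.0 per field on an exact match, else 0.0."""
    return {
        field: 1.0 if predictions.get(field) == expected else 0.0
        for field, expected in ground_truth.items()
    }


def aggregate(runs):
    """Aggregate field scores per model across documents.

    `runs` maps (model, doc_id) -> per-field score dict, as returned by
    per_field_accuracy(). Returns {model: {field: mean accuracy}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for (model, _doc_id), scores in runs.items():
        for field, score in scores.items():
            sums[model][field] += score
            counts[model][field] += 1
    return {
        model: {f: sums[model][f] / counts[model][f] for f in sums[model]}
        for model in sums
    }
```

In this shape, a leaderboard entry for a model is just its aggregated field means; per-document scores stay available for drill-down.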

Ground Truth Labeling

  • Visual labeling tool with bounding boxes
  • AI-assisted auto-labeling

OCR Mini-Bench

  • Added Claude Opus 4.7, GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano to the benchmark and public leaderboard