Public launch
Benchmark evaluations, ground truth labeling, and detailed metrics — Arbitr goes live.
Highlights
Public release of Arbitr, now available to early-access users.
Benchmark Evaluations
- Multi-document benchmark runs across multiple models
- Per-field accuracy scoring, aggregated per document and per model
- Token estimation and cost tracking per run
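The per-field scoring above can be sketched as a simple grouped aggregation. This is a minimal illustration, not Arbitr's actual API: the result tuples, field names, and the exact-match criterion are all assumptions.

```python
from collections import defaultdict

# Hypothetical extraction results: (document, model, field, predicted, ground_truth).
results = [
    ("invoice_01", "model-a", "total", "42.00", "42.00"),
    ("invoice_01", "model-a", "date", "2024-01-05", "2024-01-06"),
    ("invoice_01", "model-b", "total", "42.00", "42.00"),
    ("invoice_01", "model-b", "date", "2024-01-06", "2024-01-06"),
]

def accuracy_by(rows, key_index):
    """Fraction of exact predicted/ground-truth matches, grouped by one key column."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[key_index]
        total[key] += 1
        correct[key] += row[3] == row[4]
    return {k: correct[k] / total[k] for k in total}

per_model = accuracy_by(results, 1)     # accuracy aggregated per model
per_document = accuracy_by(results, 0)  # accuracy aggregated per document
```

In practice a production scorer would use field-aware comparison (normalized dates, numeric tolerance) rather than string equality, but the grouping shape is the same.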
Ground Truth Labeling
- Visual labeling tool with bounding boxes
- AI-assisted auto-labeling
OCR Mini-Bench
- Added Claude Opus 4.7, GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano to the benchmark and public leaderboard