Public launch
Benchmark evaluations, ground truth labeling, and detailed metrics — Arbitr goes live.
Highlights
Public release of Arbitr, now available to early-access users.
Benchmark Evaluations
- Multi-document benchmark runs across multiple models
- Per-field accuracy scoring, aggregated per document and per model
- Token estimation and cost tracking per run
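The per-field scoring above can be sketched as a simple grouped aggregation. This is a minimal illustration, not Arbitr's actual API: the result tuples, field names, and the exact-match criterion are all assumptions.

```python
from collections import defaultdict

# Hypothetical extraction results: (document, model, field, predicted, ground_truth).
results = [
    ("invoice_01", "model-a", "total", "42.00", "42.00"),
    ("invoice_01", "model-a", "date", "2024-01-05", "2024-01-06"),
    ("invoice_01", "model-b", "total", "42.00", "42.00"),
    ("invoice_01", "model-b", "date", "2024-01-06", "2024-01-06"),
]

def accuracy_by(rows, key_index):
    """Fraction of exact predicted/ground-truth matches, grouped by one key column."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[key_index]
        total[key] += 1
        correct[key] += row[3] == row[4]
    return {k: correct[k] / total[k] for k in total}

per_model = accuracy_by(results, 1)     # accuracy aggregated per model
per_document = accuracy_by(results, 0)  # accuracy aggregated per document
```

In practice a production scorer would use field-aware comparison (normalized dates, numeric tolerance) rather than string equality, but the grouping shape is the same.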
Ground Truth Labeling
- Visual labeling tool with bounding boxes
- AI-assisted auto-labeling
OCR Mini-Bench
- Added Claude Opus 4.7, GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano to the benchmark and public leaderboard