CompareBench is a benchmark for evaluating visual comparison reasoning in vision-language models (VLMs), covering four tasks: quantity, temporal, geometric, and spatial. It is derived from two auxiliary datasets:
- TallyBench (2,000 counting images with QA)
- OmniCaps (513 historical images + 100 celebrity images + 100 landmark images)
The benchmark data and resources are available in this repository:
- Benchmark datasets (links above).
- prompts.yaml: standardized instruction templates for all tasks (CompareTallyBench, CompareGeometryBench, CompareSpatialBench, CompareTemporalBench, and TallyBench); see the loading sketch below.
- Code (to be released).
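As a minimal sketch of how the instruction templates might be consumed, the snippet below loads prompts.yaml and retrieves the template for one task. The exact schema of the file is an assumption here (a flat mapping from task name to template string); the released version may differ.

```python
import yaml  # pip install pyyaml

# Minimal sketch: load the standardized instruction templates.
# ASSUMPTION: prompts.yaml is a flat mapping from task name to
# template string; the actual released schema may differ.
with open("prompts.yaml", "r", encoding="utf-8") as f:
    prompts = yaml.safe_load(f)

# Retrieve the instruction template for one benchmark task.
template = prompts["CompareTallyBench"]
print(template)
```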
📌 Paper: CompareBench: A Benchmark for Visual Comparison Reasoning in Vision–Language Models (WACV 2026 submission)
📂 Code, data, and prompts will be released in this repository.