Added DotsOCR results of olmOCR-bench #343

aman-17 · 2025-09-26T19:46:34Z

Added DotsOCR results of olmOCR-bench.

Final Summary with 95% Confidence Intervals:
dotsocr              : Average Score: 69.3% ± 1.1% (average of per-JSONL scores)
    absent  : 79.6% average pass rate over 823 tests
    baseline: 97.7% average pass rate over 1403 tests
    math    : 65.8% average pass rate over 3385 tests
    order   : 65.6% average pass rate over 1061 tests
    present : 41.3% average pass rate over 721 tests
    table   : 84.8% average pass rate over 1020 tests
    Results by JSONL file:
        arxiv_math.jsonl              : 65.2% (1909/2927 tests)
        baseline                      : 97.8% (1363/1394 tests)
        headers_footers.jsonl         : 79.5% (604/760 tests)
        long_tiny_text.jsonl          : 46.2% (204/442 tests)
        multi_column.jsonl            : 72.9% (644/884 tests)
        old_scans.jsonl               : 38.6% (203/526 tests)
        old_scans_math.jsonl          : 69.7% (319/458 tests)
        table_tests.jsonl             : 84.8% (867/1022 tests)
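The headline figure is described as an "average of per-JSONL scores", so it can be cross-checked against the per-file counts listed above. This is a minimal sketch, not the bench suite's own code: the helper names `macro_average` and `micro_average` are made up for illustration, and the ± 1.1% interval is not recomputed here.

```python
# Pass/total counts copied from the "Results by JSONL file" listing above.
results = {
    "arxiv_math.jsonl": (1909, 2927),
    "baseline": (1363, 1394),
    "headers_footers.jsonl": (604, 760),
    "long_tiny_text.jsonl": (204, 442),
    "multi_column.jsonl": (644, 884),
    "old_scans.jsonl": (203, 526),
    "old_scans_math.jsonl": (319, 458),
    "table_tests.jsonl": (867, 1022),
}


def macro_average(counts):
    """Unweighted mean of per-file pass rates (each file counts equally)."""
    rates = [passed / total for passed, total in counts.values()]
    return sum(rates) / len(rates)


def micro_average(counts):
    """Pooled pass rate over all tests (files with more tests weigh more)."""
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return passed / total


macro = macro_average(results)
micro = micro_average(results)
print(f"macro average: {macro:.1%}")
print(f"micro average: {micro:.1%}")
```

The macro average reproduces the reported 69.3%. The pooled (micro) rate comes out higher because the large arxiv_math and baseline files carry more weight there.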

montvid · 2025-10-16T08:56:58Z

Their own test results are better than olmocr - published here https://github.com/rednote-hilab/dots.ocr?tab=readme-ov-file#3-olmocr-bench

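Model     ArXiv  Old Scans Math  Tables  Old Scans  Headers and Footers  Multi column  Long tiny text  Base  Overall
dots.ocr  82.1   64.2            88.3    40.9       94.1                 82.4          81.2            99.5  79.1 ± 1.0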
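To put the two headline numbers side by side, here is a rough sketch that checks whether the reported intervals overlap. It assumes the last figure in the row quoted above (79.1 ± 1.0) is the overall score and that both ± values are half-widths of 95% intervals; the helpers `interval` and `overlaps` are hypothetical.

```python
def interval(mean, half_width):
    """Return (low, high) for a score reported as mean ± half_width."""
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


this_pr = interval(69.3, 1.1)    # overall score reported in this PR
published = interval(79.1, 1.0)  # overall score quoted from the dots.ocr README

gap = 79.1 - 69.3
print(f"gap: {gap:.1f} points, intervals overlap: {overlaps(this_pr, published)}")
```

The intervals do not overlap, so the gap of roughly 9.8 points is unlikely to be sampling noise alone, which fits the suspicion that the setups differ.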

jakep-allenai · 2025-10-16T15:48:44Z

Yeah, that's why I don't exactly want to merge this. I am not sure what else is different between this code and theirs which is causing lower scores.

jakep-allenai · 2025-10-16T15:49:17Z

Open to contributions btw :D

montvid · 2025-10-20T14:06:44Z

I am trying to use your bench suite, following this manual: https://github.com/allenai/olmocr/tree/main/olmocr/bench

It seems it needs an update. Of course I don't have sglang, but I do have vllm. And how do I run the dots.ocr benchmark? I got as far as:

python -m olmocr.bench.convert olmocr_pipeline --dir ./olmOCR-bench/bench_data
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/kvb/olmocr/olmocr/bench/convert.py", line 246, in <module>
    module = importlib.import_module(module_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/kvb/olmocr/olmocr/bench/runners/run_olmocr_pipeline.py", line 7, in <module>
    from olmocr.pipeline import (
ImportError: cannot import name 'sglang_server_host' from 'olmocr.pipeline' (/home/kvb/olmocr/olmocr/pipeline.py)
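An ImportError of this shape means the module was found but no longer defines the requested name. One quick way to confirm that is to list which expected names a module lacks. This is a minimal sketch: `missing_names` is a hypothetical helper, and the demo runs against the standard-library `json` module so it works without olmocr installed. With olmocr installed, one could pass "olmocr.pipeline" and the names imported by run_olmocr_pipeline.py instead.

```python
import importlib


def missing_names(module_path, names):
    """Return the entries of `names` that are not attributes of the module.

    This is an attribute check only; it does not try to import submodules.
    """
    module = importlib.import_module(module_path)
    return [name for name in names if not hasattr(module, name)]


# Demo on a stdlib module: "loads" exists in json, "sglang_server_host" does not.
print(missing_names("json", ["loads", "sglang_server_host"]))
# prints ['sglang_server_host']
```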

jakep-allenai · 2025-10-23T18:51:06Z

Yeah, sadly some bitrot has occurred...

With the latest release (from yesterday), the command python -m olmocr.bench.convert olmocr_pipeline --dir ./olmOCR-bench/bench_data should run again for the olmOCR case.

aman-17 added 2 commits September 26, 2025 18:31

Added bench eval for dotsocr

7dc6a4b

fixed lint and style

e1bc7b8

aman-17 requested a review from jakep-allenai September 26, 2025 19:46

aman-17 self-assigned this Sep 26, 2025

montvid mentioned this pull request Oct 16, 2025

olmocr benchmark numbers rednote-hilab/dots.ocr#234

Open
