Support multi-GPU eval in tallyqa.py (#69)#334

Open
BuildWithAbid wants to merge 1 commit into m87-labs:main from BuildWithAbid:multi-gpu-eval-tallyqa
Conversation


@BuildWithAbid commented Apr 23, 2026

Closes #69 (reference pattern — I'll port the other moondream/eval/*.py scripts in a follow-up PR once the approach here is approved).

Approach

TallyQA eval is embarrassingly parallel per sample (no gradient sync, each row is independent), so multi-GPU is just dataset sharding plus a single all_reduce at the end:

  1. If launched under torchrun, init an NCCL process group and pin each process to its local-rank device.
  2. Shard the HuggingFace dataset contiguously via dataset.shard(world_size, rank).
  3. Each rank runs the existing eval loop on its shard.
  4. all_reduce the four counters (total, correct, total_simple, correct_simple); compute accuracies on rank 0.

No new dependencies — just torch.distributed, which ships with torch.
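The four steps above can be sketched as follows. This is a minimal, hypothetical outline rather than the actual tallyqa.py code: `eval_one` stands in for the real model call, and list slicing stands in for `dataset.shard()`.

```python
import os
import torch
import torch.distributed as dist

def run_eval(samples, eval_one):
    """Shard `samples` across ranks and sum the four counters with all_reduce.

    `eval_one(sample)` returns (is_correct, is_simple); both the name and
    the list-based sharding are illustrative stand-ins.
    """
    distributed = "LOCAL_RANK" in os.environ  # set by torchrun per worker
    if distributed:
        dist.init_process_group(backend="nccl")
        rank, world = dist.get_rank(), dist.get_world_size()
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
        samples = samples[rank::world]  # stand-in for dataset.shard(world, rank)
    # Counters: total, correct, total_simple, correct_simple
    counts = torch.zeros(4, dtype=torch.long)
    for s in samples:
        is_correct, is_simple = eval_one(s)
        counts[0] += 1
        counts[1] += int(is_correct)
        if is_simple:
            counts[2] += 1
            counts[3] += int(is_correct)
    if distributed:
        counts = counts.cuda()
        dist.all_reduce(counts, op=dist.ReduceOp.SUM)  # sum counters across ranks
        if rank != 0:
            return None  # only rank 0 computes and reports accuracies
    total, correct, total_simple, correct_simple = counts.tolist()
    return {"acc": correct / total, "acc_simple": correct_simple / total_simple}
```

When LOCAL_RANK is unset the function never touches torch.distributed, which is what keeps the single-GPU path unchanged.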

Usage

Single GPU / CPU / MPS (unchanged):

python -m moondream.eval.tallyqa --model <path>

Multi-GPU:

torchrun --nproc_per_node=<N> -m moondream.eval.tallyqa --model <path>

The distributed path only activates when LOCAL_RANK is set in the environment (i.e. the process was launched by torchrun), so existing single-GPU invocations behave exactly as before.
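The activation check can be as small as the following sketch (a hypothetical helper; `maybe_init_distributed` is not a name from the actual patch):

```python
import os
import torch
import torch.distributed as dist

def maybe_init_distributed():
    """Return the local rank if launched by torchrun, else None.

    torchrun exports LOCAL_RANK for every worker it spawns, so its absence
    means a plain `python -m ...` launch and we leave everything untouched.
    """
    if "LOCAL_RANK" not in os.environ:
        return None  # single-GPU / CPU / MPS path, unchanged
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # pin this process to its GPU
    dist.init_process_group(backend="nccl")
    return local_rank
```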

Notes

  • Progress bar and final print only run on rank 0 to keep stdout clean.
  • dist.destroy_process_group() is called on shutdown to avoid NCCL exit warnings.
  • Drive-by: switched the three args.debug references inside eval_tallyqa to the debug parameter it already accepts — this means eval_all.py can call it without leaking module globals. Happy to split this into a separate commit/PR if preferred.
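The rank-0 gating in the first note can be sketched like this (assuming torchrun's RANK convention; `print_rank0` is an illustrative helper, not necessarily the patch's actual shape):

```python
import os

rank = int(os.environ.get("RANK", "0"))  # torchrun sets RANK; defaults to 0

def print_rank0(*args, **kwargs):
    """Print only on rank 0 so N processes don't interleave their stdout."""
    if rank == 0:
        print(*args, **kwargs)

# The progress bar gets the same treatment via tqdm's `disable` flag:
#   for sample in tqdm(dataset, disable=rank != 0): ...
```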

Test plan

  • Syntax-checks and passes black --check.
  • Single-GPU path: no changes to control flow when LOCAL_RANK is unset.
  • Multi-GPU smoke test on 2× T4 (Modal). To isolate the distributed plumbing from model correctness, I ran the eval with a stubbed model (counts its own calls, returns a fixed answer) on the first 40 dataset samples. Results: single-GPU processed 126 QAs; 2-GPU sharded into 79 + 47 = 126 (full coverage, no overlap); accuracies matched exactly between both runs (Simple 36.29%, Full 36.51%); the rank-0-gated final prints appeared exactly once in the 2-GPU output. This directly exercises dist.init_process_group, dataset.shard(contiguous=True), dist.all_reduce, and the rank-0 print guards.

Once this pattern is accepted I'll port the other eval scripts (pope.py, chartqa.py, textvqa.py, docvqa.py, mmstar.py, coco_map.py, naturalbench.py, countbenchqa.py, realworldqa.py) and update eval_all.py in a follow-up.

Eval is embarrassingly parallel per sample, so each rank evaluates a
shard of the dataset and the counts are summed with all_reduce at the
end. Launch with:

  torchrun --nproc_per_node=<N> -m moondream.eval.tallyqa --model <path>

The single-GPU / CPU / MPS path is unchanged: when LOCAL_RANK is not in
the environment, the process group is never initialized.

Also replaces the few `args.debug` references inside eval_tallyqa with
the `debug` parameter it already accepts, so the function can be called
from eval_all.py without relying on module-level globals.

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5511be6b64


Development

Successfully merging this pull request may close these issues.

Support running eval scripts on multiple GPUs