fix(tests): loosen GPU floating-point tolerances in density reward tests by Abdelsalam-Abbas · Pull Request #173 · diff-use/sampleworks

Abdelsalam-Abbas · 2026-03-16T18:10:37Z

Summary

- test_loss_monotonic_with_perturbation and test_vmap_consistency were written with CPU-tight tolerances but now run on GPU, since #156 (feat(ci): add GPU test workflow with pytest gpu marker) added @pytest.mark.gpu.
- GPU parallel reductions do not guarantee a fixed floating-point summation order, so results vary slightly between runs and the tests fail by very small margins.
- Add a 1e-6 epsilon to the monotonicity check and widen the vmap vs. sequential tolerance to rtol=1e-2, atol=1e-3 (the follow-up commit 1cf981f settles on rtol=1e-1, atol=5e-4).
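
The effect described in the summary can be reproduced without a GPU: floating-point addition is not associative, so a reduction that adds the same values in a different order can land on a slightly different result. The sketch below is a minimal plain-Python illustration. The helper `is_non_decreasing` and the direction of its inequality are assumptions made for the example; only the 1e-6 slack comes from the PR description, and this is not the test's actual assertion.

```python
# Floating-point addition is not associative, so summation order matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
order_changes_result = left_to_right != right_to_left


def is_non_decreasing(losses, eps=1e-6):
    """Return True if each loss is at least the previous one, within eps.

    Hypothetical helper: the name and the inequality direction are
    assumptions; only the 1e-6 slack comes from the PR description.
    """
    return all(later >= earlier - eps for earlier, later in zip(losses, losses[1:]))


# A dip of 5e-7 is smaller than the 1e-6 slack, so it is treated as noise.
noisy_losses = [1.0, 1.0 - 5e-7, 2.0]
print(order_changes_result)
print(is_non_decreasing(noisy_losses))
print(is_non_decreasing(noisy_losses, eps=0.0))
```

With `eps=0.0` the same sequence is rejected, which is the kind of strict comparison that can fail when reduction order changes between runs.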

Test plan

- GPU test workflow passes on this PR
- Verify the two previously failing tests now pass

Summary by CodeRabbit

Tests
- Updated test assertions with numerical tolerances for floating-point comparisons to improve reliability across different environments.
- Broadened validation thresholds in consistency checks to account for floating-point precision variations.

test_loss_monotonic_with_perturbation and test_vmap_consistency were written with CPU-tight tolerances but now run on GPU where parallel reductions are non-deterministic. Allow small epsilon for monotonicity and widen vmap vs sequential tolerance to account for GPU arithmetic.

coderabbitai · 2026-03-16T18:10:56Z

No actionable comments were generated in the recent review. 🎉

Reviewed the changes between 159d32c and 1cf981f in tests/rewards/test_real_space_density_reward.py.

Walkthrough

This PR relaxes numerical tolerances in two test assertions within tests/rewards/test_real_space_density_reward.py: a monotonicity loss comparison now allows 1e-6 tolerance, and a vmap vs. sequential consistency check uses wider rtol/atol values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Tolerance Adjustments `tests/rewards/test_real_space_density_reward.py` | Relaxed monotonicity assertion to allow a 1e-6 numerical tolerance. Broadened vmap vs. sequential consistency tolerances to `rtol=1e-1`, `atol=5e-4` with comments about FP accumulation and CI differences. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

marcuscollins

Poem

🐰 I hopped through numbers, soft and light,
Eased a test that used to bite,
A tiny tolerance, calm and small,
Lets floating errors not appall,
Hooray — the CI sleeps tonight! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: loosening GPU floating-point tolerances in density reward tests to address test failures caused by non-deterministic floating-point ordering on GPU hardware. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

🧹 Nitpick comments (1)

tests/rewards/test_real_space_density_reward.py (1)
554-554: Consider whether these tolerances can be tightened.

The new tolerances (rtol=1e-2, atol=1e-3) are significantly looser than the float32 defaults of torch.testing.assert_close (rtol=1.3e-6, atol=1e-5). While GPU floating-point non-determinism from parallel reductions is real, a 1% relative tolerance may mask actual correctness issues in the vmap vs. sequential execution paths.

If empirical testing shows failures at tighter tolerances, consider documenting the observed variance magnitude in a comment to justify the chosen values. Alternatively, intermediate tolerances like rtol=1e-4, atol=1e-5 might provide sufficient headroom while catching larger discrepancies.
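
For context on what these numbers admit: `torch.testing.assert_close` accepts a pair of values when |actual - expected| <= atol + rtol * |expected|. The sketch below re-implements that scalar criterion in plain Python so it runs without PyTorch. The helper name `within_tolerance` and the example values (a 0.5% discrepancy on a value of 2.0) are assumptions chosen for illustration, not measurements from this PR.

```python
def within_tolerance(actual, expected, rtol, atol):
    """Scalar form of the check |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Assumed example values: a 0.5% discrepancy between the two execution paths.
expected = 2.0
actual = 2.01

passes_loose = within_tolerance(actual, expected, rtol=1e-2, atol=1e-3)
passes_tight = within_tolerance(actual, expected, rtol=1e-4, atol=1e-5)
print(passes_loose)  # tolerances currently used in the PR
print(passes_tight)  # intermediate tolerances suggested above
```

A discrepancy of this size is accepted by the looser pair and rejected by the tighter one, which is the trade-off the comment raises.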
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/rewards/test_real_space_density_reward.py` at line 554, The assertion
comparing vmap vs sequential uses very loose tolerances
(torch.testing.assert_close(result_vmap, result_sequential, rtol=1e-2,
atol=1e-3)); tighten these to stricter values (e.g., rtol=1e-4, atol=1e-5) and
rerun tests, and if failures remain, add a short comment in
tests/rewards/test_real_space_density_reward.py next to the assert explaining
the empirically observed variance and why the chosen tolerances are necessary
(reference the variables result_vmap and result_sequential and the assert_close
call).

Reviewed the changes between e5db928 and 159d32c in tests/rewards/test_real_space_density_reward.py.

Abdelsalam-Abbas · 2026-03-16T18:33:21Z

@coderabbitai review

coderabbitai · 2026-03-16T18:35:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

k-chrispens

LGTM - though we may want to also run these on CPU. Adding an issue on this: #174
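
One way the CPU and GPU runs could eventually share a test is to select tolerances by device. The sketch below is only an illustration of that idea: the table `TOLERANCES` and the helper `tolerances_for` are hypothetical, the "cuda" entry reuses the values merged in this PR, and the "cpu" entry is an assumed placeholder (the float32 defaults of `torch.testing.assert_close`), not something decided in this thread.

```python
# Hypothetical per-device tolerance table.
# "cuda": values merged in this PR. "cpu": assumed placeholder, not project policy.
TOLERANCES = {
    "cpu": {"rtol": 1.3e-6, "atol": 1e-5},
    "cuda": {"rtol": 1e-1, "atol": 5e-4},
}


def tolerances_for(device):
    """Return the tolerance dict for a device string such as "cpu" or "cuda:0"."""
    device_type = str(device).split(":")[0]  # drop an optional device index
    if device_type not in TOLERANCES:
        raise KeyError(f"no tolerances configured for device type {device_type!r}")
    return dict(TOLERANCES[device_type])  # copy so callers cannot mutate the table


print(tolerances_for("cpu"))
print(tolerances_for("cuda:0"))
```

In a test, the returned dict could be unpacked into the comparison call, for example `assert_close(result_vmap, result_sequential, **tolerances_for(device))`.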

Abdelsalam-Abbas had a problem deploying to gpu-testing March 16, 2026 18:10 — with GitHub Actions Error

Abdelsalam-Abbas requested review from k-chrispens and marcuscollins and removed request for marcuscollins March 16, 2026 18:10

coderabbitai Bot reviewed Mar 16, 2026

fix(tests): tighten vmap tolerance with comment on observed GPU variance

1cf981f

rtol=1e-1, atol=5e-4 based on empirically observed abs diff ~1.3e-4 and rel diff ~6.7e-2 from CI A100 runs.
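
As a rough sanity check of these figures, assume the reported absolute and relative differences describe the same element (the commit message does not say so). The implied magnitude of that value is then about 1.3e-4 / 6.7e-2 ≈ 1.9e-3. The plain-Python sketch below applies the closeness criterion documented for `torch.testing.assert_close`, |actual - expected| <= atol + rtol * |expected|, to see which tolerance pairs discussed in this PR would admit such a difference. `allowed_deviation` is a hypothetical helper.

```python
OBSERVED_ABS_DIFF = 1.3e-4  # from the commit message
OBSERVED_REL_DIFF = 6.7e-2  # from the commit message

# Assumption: both figures describe the same element, so |expected| = abs / rel.
implied_magnitude = OBSERVED_ABS_DIFF / OBSERVED_REL_DIFF


def allowed_deviation(expected_magnitude, rtol, atol):
    """Largest |actual - expected| accepted for a value of the given magnitude."""
    return atol + rtol * expected_magnitude


candidates = {
    "merged (rtol=1e-1, atol=5e-4)": (1e-1, 5e-4),
    "original (rtol=1e-2, atol=1e-3)": (1e-2, 1e-3),
    "reviewer suggestion (rtol=1e-4, atol=1e-5)": (1e-4, 1e-5),
}
results = {
    name: OBSERVED_ABS_DIFF <= allowed_deviation(implied_magnitude, rtol, atol)
    for name, (rtol, atol) in candidates.items()
}
for name, passes in results.items():
    print(f"{name}: {'passes' if passes else 'fails'}")
```

Under this assumption the absolute term dominates for values this small, so the suggested atol=1e-5 would reject the observed difference while both pairs used in the PR accept it.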

Abdelsalam-Abbas temporarily deployed to gpu-testing March 16, 2026 18:21 — with GitHub Actions Inactive

Abdelsalam-Abbas requested a review from marcuscollins March 16, 2026 18:23

k-chrispens approved these changes Mar 16, 2026

Abdelsalam-Abbas mentioned this pull request Mar 16, 2026

Configure tests on RewardFunction monotonicity, vmap consistency to test performance both on CPU and GPU (where we have some non-determinism) #174

Open

k-chrispens merged commit 0969184 into main Mar 16, 2026
4 checks passed

k-chrispens deleted the fix/gpu-test-tolerances branch March 16, 2026 19:47

k-chrispens merged 2 commits into main from fix/gpu-test-tolerances