fix(tests): loosen GPU floating-point tolerances in density reward tests #173
k-chrispens merged 2 commits into main
Conversation
test_loss_monotonic_with_perturbation and test_vmap_consistency were written with CPU-tight tolerances but now run on GPU where parallel reductions are non-deterministic. Allow small epsilon for monotonicity and widen vmap vs sequential tolerance to account for GPU arithmetic.
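The non-determinism mentioned here comes from floating-point addition not being associative: a parallel tree reduction on the GPU can accumulate the same terms in a different order than a sequential CPU loop, landing on a slightly different result. A plain-Python illustration (not the project's test code) of why a tolerance is needed when comparing the two orders:

```python
import math
import random

def tree_sum(xs):
    """Pairwise reduction, similar in spirit to a GPU's parallel tree reduction."""
    xs = list(xs)
    while len(xs) > 1:
        paired = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:          # carry the odd element forward
            paired.append(xs[-1])
        xs = paired
    return xs[0]

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

sequential = sum(values)         # left-to-right accumulation order
parallel = tree_sum(values)      # tree accumulation order
exact = math.fsum(values)        # correctly rounded reference

# The two orders can disagree in the last bits, but both stay within a
# modest tolerance of each other and of the exact sum, which is what
# assert_close-style checks with rtol/atol are designed to absorb.
assert math.isclose(sequential, parallel, rel_tol=1e-9, abs_tol=1e-6)
assert math.isclose(parallel, exact, rel_tol=1e-9, abs_tol=1e-6)
```

Exact equality between the two orders is not guaranteed, which is why the tests compare with tolerances rather than `==`.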
📝 Walkthrough: This PR relaxes numerical tolerances in two test assertions within `tests/rewards/test_real_space_density_reward.py`.
🧹 Nitpick comments (1)
tests/rewards/test_real_space_density_reward.py (1)
Line 554: Consider whether these tolerances can be tightened. The new tolerances (`rtol=1e-2, atol=1e-3`) are significantly looser than the float32 defaults (`rtol=1e-5, atol=1e-8`). While GPU floating-point non-determinism from parallel reductions is real, a 1% relative tolerance may mask actual correctness issues in the vmap vs sequential execution paths. If empirical testing shows failures at tighter tolerances, consider documenting the observed variance magnitude in a comment to justify the chosen values. Alternatively, intermediate tolerances like `rtol=1e-4, atol=1e-5` might provide sufficient headroom while catching larger discrepancies.
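For reference, the elementwise rule `torch.testing.assert_close` applies — a pair is considered close when `|actual - expected| <= atol + rtol * |expected|` — can be sketched in plain Python to see what each tolerance pair in this discussion admits:

```python
def within_tolerance(actual: float, expected: float,
                     rtol: float, atol: float) -> bool:
    # Elementwise closeness rule used by torch.testing.assert_close:
    # |actual - expected| <= atol + rtol * |expected|
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A 1% relative discrepancy passes under the loosened tolerances...
assert within_tolerance(1.010, 1.000, rtol=1e-2, atol=1e-3)
# ...but fails under the float32 defaults (rtol=1e-5, atol=1e-8)...
assert not within_tolerance(1.010, 1.000, rtol=1e-5, atol=1e-8)
# ...and the suggested intermediate values also reject it.
assert not within_tolerance(1.010, 1.000, rtol=1e-4, atol=1e-5)
```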
`rtol=1e-1, atol=5e-4` based on empirically observed abs diff ~1.3e-4 and rel diff ~6.7e-2 from CI A100 runs.
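A quick back-of-the-envelope check, using only the figures quoted above (the implied magnitude of the compared values is derived from them, not measured), that the adopted tolerances cover the observed CI discrepancy:

```python
# Figures reported in this thread from CI A100 runs:
abs_diff = 1.3e-4
rel_diff = 6.7e-2

# Implied magnitude of the expected value (derived, not measured directly):
expected_magnitude = abs_diff / rel_diff  # roughly 1.9e-3

def passes(rtol: float, atol: float) -> bool:
    # assert_close rule: |diff| <= atol + rtol * |expected|
    return abs_diff <= atol + rtol * expected_magnitude

assert passes(rtol=1e-1, atol=5e-4)      # adopted tolerances absorb the diff
assert not passes(rtol=1e-5, atol=1e-8)  # float32 defaults would not
```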
@coderabbitai review

✅ Actions performed: review triggered.
LGTM - though we may want to also run these on CPU. Adding an issue on this: #174
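On the reviewer's point about also running these on CPU: with a marker-based setup like the one this PR references, the same tests can be selected or deselected per device. A hypothetical sketch of the marker registration — the project's actual CI configuration is not shown here and may live in `pyproject.toml` instead:

```ini
# pytest.ini — hypothetical registration sketch
[pytest]
markers =
    gpu: tests that require a CUDA-capable device
```

With that in place, `pytest -m gpu` runs only GPU-marked tests, and `pytest -m "not gpu"` deselects them for a CPU-only run.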
Summary

- `test_loss_monotonic_with_perturbation` and `test_vmap_consistency` were written with CPU-tight tolerances but now run on GPU (since #156, "feat(ci): add GPU test workflow with pytest gpu marker", added `@pytest.mark.gpu`)
- Loosened to `rtol=1e-2, atol=1e-3`

Test plan