fix OOB read in psf_precision_value_from causing NaN sparse-CPU log_evidence #296
Merged
…vidence

The kernel walk in psf_precision_value_from (used by the sparse-operator CPU inversion path) reads value_native[ip0_y + k0_y + kernel_shift_y, ...] without bounds checking. For mask pixels within `kernel_shift` of the noise-map array boundary, that index lands off the array; @numba.jit() does not bounds-check, so the read returns uninitialized memory.

For the HST 28x28 RectangularAdaptDensity pixelization profiled in autolens_workspace_developer/jax_profiling/imaging/pixelization_sparse_cpu.py, this produced 1064 inf entries in the curvature matrix, a NaN log_det, and ultimately NaN log_evidence. log_likelihood was -191493 (vs the correct +28664) because the inf-poisoned F+H made the NNLS solver return all-zeros for s, giving a wildly wrong chi-squared.

The fix adds an explicit bounds check around the kernel read. Off-array positions are skipped, which matches the function's existing semantics for masked-but-zeroed interior pixels (`if value > 0.0: ...` already filters those).

After the fix, sparse and non-sparse paths agree on the HST workspace_developer model to rtol=1.18e-09.

Existing tests are unaffected: the closest util test (test__psf_precision_operator_sparse_from) uses interior pixels in a 4x4 noise map, so the bounds check is a no-op there. The integration tests in test_factory.py use a 3x3 no-blur PSF (only center=1), so off-diagonal kernel reads contribute zero regardless.

The new test__psf_precision_operator_sparse_from__edge_pixels exercises the fixed path with corner pixels and a non-trivial 3x3 PSF, validating against a pure-numpy reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
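The bounds check described in the commit message can be illustrated with a small, dependency-free sketch. This is not the real `psf_precision_value_from`: the function name `kernel_walk_sum`, its signature, and the use of nested lists in place of NumPy arrays are assumptions made purely for illustration. It only shows the shape of the guard (skip off-array positions, then apply the existing `value > 0.0` filter).

```python
def kernel_walk_sum(value_native, kernel, y, x):
    """Sum kernel-weighted values around (y, x), skipping off-array reads.

    Hypothetical, simplified stand-in for a guarded kernel walk; it is NOT
    the library's psf_precision_value_from.
    """
    n_rows, n_cols = len(value_native), len(value_native[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Shift so the kernel is centred on (y, x).
    shift_y, shift_x = -(k_rows // 2), -(k_cols // 2)

    total = 0.0
    for ky in range(k_rows):
        for kx in range(k_cols):
            iy = y + ky + shift_y
            ix = x + kx + shift_x
            # The guard: positions outside the array are skipped, never read.
            if iy < 0 or iy >= n_rows or ix < 0 or ix >= n_cols:
                continue
            value = value_native[iy][ix]
            # Non-positive (masked, zeroed) values contribute nothing.
            if value > 0.0:
                total += kernel[ky][kx] * value
    return total


noise = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0] * 3 for _ in range(3)]

corner = kernel_walk_sum(noise, kernel, 0, 0)  # only the 2x2 in-bounds block contributes
centre = kernel_walk_sum(noise, kernel, 1, 1)  # the full 3x3 window is in bounds
print(corner, centre)  # → 12.0 45.0
```

Without the guard, the corner call would index row and column `-1`; in plain Python that silently wraps to the opposite edge instead of failing, which is one reason an edge-pixel test with a non-trivial kernel is needed to expose this class of bug.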
Summary
`psf_precision_value_from` (used by the sparse-operator CPU inversion path) walks the PSF kernel over `value_native` (the noise map) without bounds-checking the read. `@numba.jit()` does not bounds-check, so for mask pixels within `kernel_shift` of the noise-map array boundary, the read returns uninitialized memory: astronomical or non-finite contributions that poison the entire `psf_precision_operator` and the downstream curvature matrix.

Symptom
On the HST 28×28 `RectangularAdaptDensity` model profiled in `autolens_workspace_developer/jax_profiling/imaging/pixelization_sparse_cpu.py`:
- 1064 `inf` entries in `inversion.curvature_matrix`.
- `log_det_curvature_reg_matrix_term = NaN` → `log_evidence = NaN`.
- The NNLS solver returns all-zeros for `s` → `chi_squared = 462622` (vs correct `22306`) → `log_likelihood = -191493` (vs correct `+28664`).

Why existing tests didn't catch it
- `test__psf_precision_operator_sparse_from` (`test_inversion_imaging_util.py:87`) uses interior-only pixels `[[1,1],[1,2],[2,1],[2,2]]` in a 4×4 noise map, so the bounds check is a no-op there.
- The integration tests in `test_factory.py` use a 3×3 no-blur PSF whose only non-zero entry is the center, so off-diagonal kernel reads contribute zero regardless of OOB.
- The bug needs mask pixels within `kernel_shift` of the noise-map array boundary. This is the standard HST configuration but not the test fixtures.

Fix
Add an explicit bounds check around the `value_native[…]` read in `psf_precision_value_from`. Off-array positions are skipped, matching the existing `if value > 0.0` filter for masked-but-zeroed interior pixels.

Verification
- The tests in `test_autoarray/inversion/inversion/` pass.
- `test__psf_precision_operator_sparse_from__edge_pixels` (corner pixels + non-trivial 3×3 PSF; reference checked against a pure-numpy bounds-checked re-implementation) passes after the fix; it would have failed on `main`.
- `test__psf_precision_operator_sparse_from` continues to pass with byte-identical numbers (the bounds check is a no-op for interior pixels).
- `autolens_workspace_developer/jax_profiling/imaging/pixelization_sparse_cpu.py`: sparse and non-sparse `log_evidence` now agree to `rtol=1.18e-09` (`26232.0685428` vs `26232.0685738`); JAX expected: `26232.0685738`.

Files Changed
- `autoarray/inversion/inversion/imaging_numba/inversion_imaging_numba_util.py`: bounds-check the kernel read in `psf_precision_value_from`.
- `test_autoarray/inversion/inversion/imaging/test_inversion_imaging_util.py`: add `test__psf_precision_operator_sparse_from__edge_pixels`.

Note for future maintainers
A quick read of the rest of `inversion_imaging_numba_util.py` shows no other unguarded kernel walks on `value_native`; every other function iterates on precomputed sparse-pair indices. So this single-function bounds check is sufficient.

Test Plan
- `pytest test_autoarray/inversion/inversion/`: 44 passed
- `test__psf_precision_operator_sparse_from__edge_pixels`: passed
- `pixelization_sparse_cpu.py`: sparse path now matches non-sparse to `rtol=1.18e-09`

🤖 Generated with Claude Code
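The quoted `rtol=1.18e-09` can be reproduced from the two `log_evidence` values reported in this PR (`26232.0685428` for the sparse path, `26232.0685738` for the non-sparse path). The sketch below assumes the relative difference is taken with respect to the non-sparse value; the PR does not state which denominator was used, and the variable names are illustrative only.

```python
import math

sparse_log_evidence = 26232.0685428     # sparse CPU path after the fix
reference_log_evidence = 26232.0685738  # non-sparse path (also the JAX expected value)

# Relative difference, assuming the non-sparse value is the reference.
rtol = abs(sparse_log_evidence - reference_log_evidence) / abs(reference_log_evidence)
print(f"{rtol:.2e}")  # → 1.18e-09

# The same agreement expressed as a tolerance check.
agrees = math.isclose(sparse_log_evidence, reference_log_evidence, rel_tol=1.2e-9)
print(agrees)  # → True
```

At this magnitude the residual is about 3.1e-5 in absolute terms, which is far smaller than the difference between the broken and correct `log_likelihood` values listed above.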