Wrap use of CUDA mutex in where reductions #1217
Merged
This PR implements the use of CUDA mutexes in a `where` reduction by wrapping the use of `where` and its `selector` in the same mutex lock region. Previously the two were separate, which could lead to some hard-to-track-down variations in results, as multiple CUDA threads could potentially try to alter the same data at the same time. The wrapping occurs in the `compiler.py` `make_append` function, which is where the existing special handling of `where` reductions takes place.
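As a rough sketch of the pattern (the kernel and variable names below are invented for illustration and are not datashader's actual code), the selector comparison and the dependent `where` write sit inside a single per-pixel lock region, so no other thread can interleave between them:

```python
# Minimal illustration of the "single lock region" idea, assuming a Numba CUDA
# setup; all names here are hypothetical, not datashader's API.
import numpy as np
from numba import cuda


@cuda.jit
def append_where_max(mutex, selector_agg, where_agg, values, payload):
    tid = cuda.grid(1)
    if tid >= values.shape[0]:
        return
    i = 0  # every thread targets one output pixel in this toy example
    done = False
    while not done:
        # Take the per-pixel lock (0 = free, 1 = held). Keeping the critical
        # section inside the successful branch avoids the classic intra-warp
        # spin-lock deadlock on pre-Volta GPUs.
        if cuda.atomic.cas(mutex, i, 0, 1) == 0:
            cuda.threadfence()
            # Selector (max) and its dependent `where` payload are updated
            # together, so no thread can observe a half-updated pair.
            if values[tid] > selector_agg[i]:
                selector_agg[i] = values[tid]
                where_agg[i] = payload[tid]
            cuda.threadfence()
            cuda.atomic.exch(mutex, i, 0)  # release the lock
            done = True


if __name__ == "__main__":
    n = 1024
    values = cuda.to_device(np.random.rand(n))
    payload = cuda.to_device(np.arange(n, dtype=np.int64))
    selector_agg = cuda.to_device(np.full(1, -np.inf))
    where_agg = cuda.to_device(np.full(1, -1, dtype=np.int64))
    mutex = cuda.to_device(np.zeros(1, dtype=np.int32))
    append_where_max[(n + 127) // 128, 128](
        mutex, selector_agg, where_agg, values, payload
    )
    print(selector_agg.copy_to_host(), where_agg.copy_to_host())
```

The point is just the shape of the generated code: previously the selector update and the `where` write could each take the mutex separately, leaving a window between them.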
Whilst doing this I have chosen to move all use of a CUDA mutex from `reductions.py` to `compiler.py` so that it is all in one place. This benefits us by reducing the future proliferation of `reduction._append_cuda` functions, which might otherwise have needed versions both with and without mutex locking.

I have also rewritten the handling of return values from `reduction._append_cuda` functions to explicitly check whether the function call has changed the data stored in the `agg`, as this was not always correct before when handling `NaN`s.
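To illustrate the return-value contract (again with invented names rather than the real `_append_cuda` signatures), the selector append can report whether it actually wrote to the agg, with an explicit NaN check so that an empty slot is not mistaken for "no change":

```python
# Hypothetical sketch of an append that reports whether it modified the
# aggregate; not datashader's actual _append_cuda implementation.
import math

from numba import cuda


@cuda.jit(device=True)
def append_max(x, y, agg, value):
    # `value > NaN` is always False, so without the explicit isnan check the
    # first value written to a pixel would be rejected and the caller would
    # wrongly conclude that nothing changed.
    old = agg[y, x]
    if not math.isnan(value) and (math.isnan(old) or value > old):
        agg[y, x] = value
        return True
    return False


@cuda.jit(device=True)
def append_where(x, y, selector_agg, where_agg, value, payload):
    # Called with the mutex already held (see the sketch above): the `where`
    # payload is stored only when the selector reports a genuine change.
    if append_max(x, y, selector_agg, value):
        where_agg[y, x] = payload
```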
This work is a necessary precursor to completing the CUDA support for `where(max_n)` and similar reductions (issues #1182 and #1207), which will follow shortly.