Describe the bug
Noticed whilst reviewing #167: I enabled NRT with the environment variable NUMBA_CUDA_ENABLE_NRT=1 and I got fed up of waiting for the test suite after 10 minutes (it usually completes in under 2 minutes on my machine).
Steps/Code to reproduce bug
Run the test suite with NUMBA_CUDA_ENABLE_NRT=1 set, and be prepared to wait a long time.
Expected behavior
For the test suite to complete in approximately the same amount of time as with NRT disabled.
Environment details (please complete the following information):
Numba-cuda main.
Additional context
There are two problems:
1. I think we are linking in NRT more often than we need to. I haven't yet looked deeper, but I see it getting added for ufunc tests, which I don't think should require it.
2. The NRT is added as a path, so the driver has to read it from disk every time it is linked. We should switch to reading it on first use into a CUSource object and linking that, so we only read it once and cut disk I/O out of the compilation path.
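The "read once, link many times" idea from the second problem can be sketched as follows. This is a minimal illustration, not the numba-cuda implementation: `InMemorySource` is a hypothetical stand-in for a CUSource-like object, and `load_source_once` is a made-up helper name.

```python
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path


@dataclass(frozen=True)
class InMemorySource:
    """Hypothetical stand-in for a CUSource-like linkable object."""
    data: bytes
    name: str


@lru_cache(maxsize=None)
def load_source_once(path: str) -> InMemorySource:
    """Read the file at `path` on first use; later calls hit the cache.

    Every link step after the first receives the same in-memory object,
    so disk I/O happens once per process rather than once per compilation.
    """
    p = Path(path)
    return InMemorySource(data=p.read_bytes(), name=p.name)
```

Each compilation would then pass the cached object to the linker instead of the file path.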
"I think we are linking in NRT more than we need to be"
With NRT enabled, we now incref and decref every array that gets passed in. We need to add the refcount pruning pass so that unnecessary refcount operations don't degrade the performance of every kernel that uses arrays.
Remove unnecessary NRT incref/decref in the given LLVM function. It uses highlevel type info to determine if the function does not need NRT. Such a function does not:
return array object(s);
take arguments that need refcounting except array;
call function(s) that return refcounted object.
In effect, the function will not capture or create references that extend
the lifetime of any refcounted objects beyond the lifetime of the function.
The rewrite is performed in place.
If rewrite has happened, this function returns True, otherwise, it returns False.
The problem is the condition "call function(s) that return [a] refcounted object": a lot of functions will do this if they return an array that was passed in to the function (which was always legal in the CUDA target, even without NRT).
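To make the concern concrete, here is a toy model of the three conditions listed in the docstring above. It is only an illustrative sketch: `FuncInfo` and `can_prune_refcounts` are hypothetical names, and the real pass works on LLVM functions and type information rather than on a record like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FuncInfo:
    """Hypothetical summary of the type-level facts the pass looks at."""
    returns_array: bool = False
    has_refcounted_non_array_args: bool = False
    # One entry per call site: does the callee return a refcounted object?
    callee_returns_refcounted: List[bool] = field(default_factory=list)


def can_prune_refcounts(info: FuncInfo) -> bool:
    """True only if none of the three disqualifying conditions holds."""
    if info.returns_array:
        return False
    if info.has_refcounted_non_array_args:
        return False
    if any(info.callee_returns_refcounted):
        return False
    return True


# A kernel that only indexes its array arguments qualifies for pruning.
simple_kernel = FuncInfo()

# A kernel calling a device function that returns the array it was given
# does not qualify, because an array is a refcounted object.
kernel_with_passthrough_call = FuncInfo(callee_returns_refcounted=[True])
```

Under this model, any kernel that calls a device function which hands back one of its array arguments fails the third check and keeps all of its incref/decref operations.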