* Fix typo: stream._handle -> stream.handle
The Stream class has no _handle data member.
* Move definition of LaunchConfig class to a separate file
This is necessary to avoid a circular dependency.
Cluster-related occupancy functions need LaunchConfig.
Occupancy functions are defined in _module.py, while _launcher.py,
which used to house the definition of LaunchConfig, imports Kernel
from _module.py.
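To illustrate why moving LaunchConfig into its own file breaks the cycle, the sketch below writes two throwaway packages to a temporary directory and imports their `_launcher` module. Only `_module`, `_launcher`, `LaunchConfig`, and `Kernel` come from the commit message; the package names and the `_launch_config` file name are hypothetical, and the class bodies are empty placeholders.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def make_package(root: Path, name: str, files: dict) -> None:
    """Write a throwaway package `name` under `root` with the given modules."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    for module, source in files.items():
        (pkg / f"{module}.py").write_text(textwrap.dedent(source))


# Old layout: LaunchConfig lives in _launcher, which imports Kernel from _module.
# Once _module also needs LaunchConfig, the two modules import each other.
OLD = {
    "_launcher": """
        from ._module import Kernel
        class LaunchConfig:
            pass
    """,
    "_module": """
        from ._launcher import LaunchConfig
        class Kernel:
            pass
    """,
}

# New layout: LaunchConfig sits in a leaf module (hypothetical name) that
# imports nothing from the package, so both other modules can depend on it.
NEW = {
    "_launch_config": """
        class LaunchConfig:
            pass
    """,
    "_launcher": """
        from ._launch_config import LaunchConfig
        from ._module import Kernel
    """,
    "_module": """
        from ._launch_config import LaunchConfig
        class Kernel:
            pass
    """,
}


def try_import(files: dict, name: str) -> str:
    """Build the package in a temp dir and report whether `_launcher` imports."""
    with tempfile.TemporaryDirectory() as tmp:
        make_package(Path(tmp), name, files)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module(f"{name}._launcher")
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            sys.path.remove(tmp)


print(try_import(OLD, "old_layout_pkg"))  # fails: partially initialized module
print(try_import(NEW, "new_layout_pkg"))  # ok
```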
* Introduce _module.KernelOccupancy class
This class defines the following kernel occupancy query methods:
- max_active_blocks_per_multiprocessor
- max_potential_block_size
- available_dynamic_shared_memory_per_block
- max_potential_cluster_size
- max_active_clusters
The implementation is based on the driver API. The following
occupancy-related driver functions are not used:
- `cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- `cuOccupancyMaxPotentialBlockSizeWithFlags`
In `cuOccupancyMaxPotentialBlockSize`, only a constant dynamic shared-memory size
is supported for now. Supporting a variable dynamic shared-memory size that depends
on the block size is deferred until the design is resolved.
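The queries listed above are answered by the CUDA driver. To make the quantities concrete, here is a deliberately simplified model of two of them with a constant dynamic shared-memory size. The per-SM limits are hypothetical numbers, and the model ignores register and shared-memory allocation granularity and other hardware details, so it is an illustration of the idea, not how the driver computes its results.

```python
from dataclasses import dataclass

WARP_SIZE = 32


@dataclass(frozen=True)
class SMLimits:
    """Illustrative per-SM resource limits (hypothetical values, not a real GPU)."""
    max_threads: int = 2048
    max_blocks: int = 32
    shared_mem_bytes: int = 100 * 1024
    registers: int = 64 * 1024


def estimate_active_blocks_per_sm(block_size: int, regs_per_thread: int,
                                  smem_per_block: int, sm: SMLimits) -> int:
    """Blocks that fit on one SM: the tightest of the per-SM resource limits."""
    threads = -(-block_size // WARP_SIZE) * WARP_SIZE  # round up to whole warps
    limits = [sm.max_blocks, sm.max_threads // threads]
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)
    if regs_per_thread > 0:
        limits.append(sm.registers // (regs_per_thread * threads))
    return min(limits)


def estimate_potential_block_size(regs_per_thread: int, dynamic_smem: int,
                                  num_sms: int, sm: SMLimits,
                                  block_size_limit: int = 1024) -> tuple:
    """Return (min_grid_size, block_size) that maximizes active threads per SM.

    dynamic_smem is a constant per-block byte count (the constant-size case).
    """
    best = (0, 0, 0)  # (active_threads, block_size, blocks_per_sm)
    # Walk from the largest warp multiple down so ties favor larger blocks.
    for block_size in range(block_size_limit, 0, -WARP_SIZE):
        blocks = estimate_active_blocks_per_sm(
            block_size, regs_per_thread, dynamic_smem, sm)
        if blocks * block_size > best[0]:
            best = (blocks * block_size, block_size, blocks)
    _, block_size, blocks = best
    # Smallest grid that keeps every SM fully occupied at this block size.
    return blocks * num_sms, block_size
```

With 32 registers per thread and no shared memory, a 1024-thread block fits twice per SM under these limits, so `estimate_potential_block_size(32, 0, 10, SMLimits())` picks a block size of 1024 and a minimum grid of 20 blocks.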
* Add occupancy tests, except for cluster-related queries
* Fix type in querying handle from Stream argument
* Add tests for cluster-related occupancy descriptors
* Introduce MaxPotentialBlockSizeOccupancyResult named tuple
Use it as the return type of KernelOccupancy.max_potential_block_size.
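A named tuple keeps the result usable as a plain `(int, int)` pair while adding named access. The sketch below shows that design choice; the field names `min_grid_size` and `max_block_size` are assumed for illustration (they mirror the `minGridSize` and `blockSize` outputs of `cuOccupancyMaxPotentialBlockSize`), and the values are arbitrary.

```python
from typing import NamedTuple


class MaxPotentialBlockSizeOccupancyResult(NamedTuple):
    # Assumed field names, mirroring the two driver outputs.
    min_grid_size: int
    max_block_size: int


result = MaxPotentialBlockSizeOccupancyResult(min_grid_size=216, max_block_size=768)
grid, block = result  # still unpacks like the bare tuple
print(result.max_block_size, grid, block)  # prints "768 216 768"
```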
* Add CUoccupancyB2DSize support to KernelOccupancy.max_potential_block_size
The cuda_utils.driver.CUoccupancyB2DSize type is now supported. The argument
giving the required size of the dynamic shared memory allocation is renamed to
dynamic_shared_memory_needed.
* Add test for B2DSize usage in max_potential_block_size
The test requires Numba. If Numba is absent, the test is skipped; otherwise
`numba.cfunc` is used to compile a Python function. The ctypes.CFuncPtr
object obtained from cfunc_res.ctypes is converted to
CUoccupancyB2DSize.
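The callback type involved is `CUoccupancyB2DSize`, declared in `cuda.h` as a C function pointer taking the block size (`int`) and returning the dynamic shared memory size in bytes (`size_t`). As a sketch under the assumption that only the standard library is available, the same pointer shape can be produced with `ctypes` instead of `numba.cfunc`; the 16-bytes-per-thread policy is hypothetical, and the final conversion to `CUoccupancyB2DSize` is not shown because it depends on the bindings.

```python
import ctypes

# C signature of CUoccupancyB2DSize: size_t (*)(int blockSize).
B2DSizeFn = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_int)


def smem_per_block(block_size: int) -> int:
    # Hypothetical policy: 16 bytes of dynamic shared memory per thread.
    return 16 * block_size


# Keep `callback` alive for as long as the raw address may be called.
callback = B2DSizeFn(smem_per_block)
address = ctypes.cast(callback, ctypes.c_void_p).value

# Re-wrap the raw address and call through it, as C code would.
fn = B2DSizeFn(address)
print(fn(256))  # prints 4096
```

Unlike a Numba `cfunc` compiled in nopython mode, this ctypes callback runs Python code and must acquire the GIL on every call, which is the situation the deadlock advisory in the max_potential_block_size docstring is about.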
* Improved max_potential_block_size.__doc__
Expanded the docstring and added an advisory about possible deadlocks
should the function encoded by CUoccupancyB2DSize require the GIL.
Added type validation for the dynamic_shared_memory_needed
argument.
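The type validation described above can be sketched as follows. The function name is hypothetical, and `FakeB2DSize` is a hypothetical stand-in for `driver.CUoccupancyB2DSize` so the sketch runs without the CUDA bindings installed; rejecting `bool` is a design choice of this sketch, not something the commit states.

```python
from numbers import Integral


class FakeB2DSize:
    """Hypothetical stand-in for driver.CUoccupancyB2DSize."""

    def __init__(self, address: int) -> None:
        self.address = address


def check_dynamic_shared_memory_needed(value):
    """Return value if it is an integer byte count or a B2DSize callback object;
    raise TypeError otherwise."""
    if isinstance(value, FakeB2DSize):
        return value
    # bool is a subclass of int; this sketch rejects it explicitly.
    if isinstance(value, Integral) and not isinstance(value, bool):
        return int(value)
    raise TypeError(
        "dynamic_shared_memory_needed must be an int or CUoccupancyB2DSize, "
        f"got {type(value).__name__}"
    )
```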
* Add test for dynamic_shared_memory_needed arg of invalid type
* Mention feature/occupancy in 0.3.0 release notes
* Add symbols to api_private.rst
* Reduce test name verbosity
Occupancy test names need not contain saxpy, even though the tests
use the saxpy kernel.
* Add docstrings to KernelOccupancy methods
* Fix rendering