Added
- Added extended tests to rtest.py. These are extra tests that did not fit the criteria of smoke and regression tests, and they take much longer to run than smoke and regression tests.
  - Use python rtest.py [--emulation|-e|--test|-t]=extended to run these tests.
- Added regression tests to rtest.py. Regression tests are a subset of tests that caused hardware problems for past emulation environments.
  - Can be run with python rtest.py [--emulation|-e|--test|-t]=regression
- Added the parallel find_first_of device function with autotuned configurations. This function is similar to std::find_first_of: it searches for the first occurrence of any of the provided elements.
- Added the --emulation option to rtest.py.
  - Unit tests can be run with [--emulation|-e|--test|-t]=<test_name>
- Added tuned configurations for segmented radix sort for gfx942 to improve performance on this architecture.
- Added a parallel device-level function, rocprim::adjacent_find, similar to the C++ Standard Library std::adjacent_find algorithm.
- Added configuration autotuning to device adjacent find (rocprim::adjacent_find) for improved performance on selected architectures.
- Added rocprim::numeric_limits, an extension of std::numeric_limits that includes support for 128-bit integers.
- Added rocprim::int128_t and rocprim::uint128_t, which are the __int128_t and __uint128_t types.
- Added the parallel search and find_end device functions, similar to std::search and std::find_end. These functions search for the first and last occurrence of the sequence, respectively.
- Added a parallel device-level function, rocprim::search_n, similar to the C++ Standard Library std::search_n algorithm.
- Added new constructors and a base function, and added the constexpr specifier to all functions in rocprim::reverse_iterator to improve parity with the C++17 std::reverse_iterator.
- Added hipGraph support to device run-length-encode for non-trivial runs (rocprim::run_length_encode_non_trivial_runs).
- Added configuration autotuning to device run-length-encode for non-trivial runs (rocprim::run_length_encode_non_trivial_runs) for improved performance on selected architectures.
- Added configuration autotuning to device run-length-encode for trivial runs (rocprim::run_length_encode) for improved performance on selected architectures.
- Added a new type traits interface to enable users to provide additional type trait information to rocPRIM, facilitating better compatibility with custom types.
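Several of the new device functions are described above as similar to C++ Standard Library algorithms. As a host-side reference for what those algorithms return, the sketch below calls only the std:: versions; it does not call rocPRIM, and the helper names (first_of_index and so on) are hypothetical, introduced here for illustration. Each helper returns the index of the match, or the input size when nothing matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using vec = std::vector<int>;

// Converts an iterator into input to an index (input.size() means "not found").
static std::size_t to_index(const vec& input, vec::const_iterator it)
{
    return static_cast<std::size_t>(std::distance(input.begin(), it));
}

// std::find_first_of: first element of input equal to any element of keys.
std::size_t first_of_index(const vec& input, const vec& keys)
{
    return to_index(input, std::find_first_of(input.begin(), input.end(),
                                              keys.begin(), keys.end()));
}

// std::adjacent_find: first position where two consecutive elements are equal.
std::size_t adjacent_index(const vec& input)
{
    return to_index(input, std::adjacent_find(input.begin(), input.end()));
}

// std::search: first occurrence of the sequence pattern in input.
std::size_t search_index(const vec& input, const vec& pattern)
{
    return to_index(input, std::search(input.begin(), input.end(),
                                       pattern.begin(), pattern.end()));
}

// std::find_end: last occurrence of the sequence pattern in input.
std::size_t find_end_index(const vec& input, const vec& pattern)
{
    return to_index(input, std::find_end(input.begin(), input.end(),
                                         pattern.begin(), pattern.end()));
}

// std::search_n: first run of count consecutive elements equal to value.
std::size_t search_n_index(const vec& input, std::size_t count, int value)
{
    return to_index(input, std::search_n(input.begin(), input.end(), count, value));
}
```

For the input {4, 1, 2, 2, 5, 1, 2, 2, 2}, searching for the sequence {1, 2} gives index 1 with search_index and index 5 with find_end_index, which is the first-versus-last distinction the notes describe.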
Changed
- Changed the subset of tests that are run for smoke tests so that the smoke test completes faster and never exceeds 2 GB of VRAM usage. Use python rtest.py [--emulation|-e|--test|-t]=smoke to run these tests.
- The rtest.py options have changed. rtest.py is now run with either --test|-t or --emulation|-e, but not both options.
- Changed the internal algorithm of block radix sort to use rank match to improve performance of various radix sort related algorithms.
- Disabled padding in various cases where higher occupancy resulted in better performance despite more bank conflicts.
- Removed HIP-CPU support. HIP-CPU support was experimental and broken.
- Changed the C++ version from 14 to 17. C++14 will be deprecated in the next major release.
- You can use CMake HIP language support with CMake 3.18 and later. To use HIP language support, run cmake with -DUSE_HIPCXX=ON instead of setting the CXX variable to the path to a HIP-aware compiler.
Resolved issues
- Fixed an issue where rmake.py would generate incorrect CMake commands in a Linux environment.
- Fixed an issue where rocprim::partial_sort_copy would yield a compile error if the input iterator is const.
- Fixed incorrect 128-bit signed and unsigned integer type traits.
- Fixed a compilation issue when rocprim::radix_key_codec<...> is specialized with a 128-bit integer.
- Fixed the warp-level reduction rocprim::warp_reduce.reduce DPP implementation to avoid undefined intermediate values during the reduction.
- Fixed an issue that caused a segmentation fault when hipStreamLegacy was passed to some API functions.
Upcoming changes
- Using the initialisation constructor of rocprim::reverse_iterator will throw a deprecation warning. It will be marked as explicit in the next major release.
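To show what marking a constructor explicit changes for calling code, the sketch below uses std::reverse_iterator, whose single-argument constructor is already explicit in the C++ Standard Library. It does not use rocPRIM. That rocprim::reverse_iterator will behave the same way after the change is an assumption, and the helper last_element is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

using rev_it = std::reverse_iterator<const int*>;

// Direct initialisation from the underlying iterator is allowed...
static_assert(std::is_constructible_v<rev_it, const int*>,
              "direct initialisation is allowed");
// ...but implicit conversion is rejected because the constructor is explicit.
static_assert(!std::is_convertible_v<const int*, rev_it>,
              "implicit conversion is rejected");

// Returns the last element of a non-empty vector through an explicitly
// constructed reverse iterator.
int last_element(const std::vector<int>& values)
{
    const int* end = values.data() + values.size();
    rev_it it(end);      // direct initialisation: compiles with an explicit constructor
    // rev_it bad = end; // copy initialisation: would not compile
    return *it;          // dereferences the element just before end
}
```

Code that relies on copy initialisation or implicit conversion to a reverse iterator would need to switch to the direct form shown in last_element.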