Releases: IntelPython/dpctl
v0.21.0
This release features the addition of new function tensor.isin, indexing of tensor.usm_ndarray with numpy.ndarray, and support for building dpctl for specific CUDA architectures.
Improvements were also made to the build time and binary size of the project, and to the build driver script, making it more convenient when building for CUDA or AMD devices.
Added
- Added tensor.isin per future Python Array API specification version gh-2098
- numpy.ndarrays are now permitted when indexing on tensor.usm_ndarray gh-2128
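To illustrate what a membership test such as tensor.isin computes, here is a minimal sketch on plain Python lists, so it runs without dpctl or a SYCL device. The helper name isin_reference and its invert parameter are assumptions made for this illustration; this is not dpctl's implementation, which operates on tensor.usm_ndarray objects.

```python
def isin_reference(x, test_elements, invert=False):
    """Element-wise membership test over a flat list (illustrative only)."""
    lookup = set(test_elements)  # hashing gives O(1) average membership checks
    mask = [value in lookup for value in x]
    # invert=True flips the mask, i.e. "value is not in test_elements"
    return [not flag for flag in mask] if invert else mask


mask = isin_reference([1, 2, 3, 4], [2, 4, 6])
inverted = isin_reference([1, 2, 3, 4], [2, 4, 6], invert=True)
```

The result has the same length as the first argument, with one boolean per element.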
Changed
- Made a number of constexpr variables inline or static throughout the project, especially in headers, to reduce binary size and improve build time gh-2094, gh-2107
- DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP now permit specifying the CUDA or HIP architectures gh-2096, gh-2099
- Extended build_locally.py build driver script to permit --target-cuda and --target-hip options, which match the behavior of DPCTL_TARGET_CUDA and DPCTL_TARGET_HIP gh-2109
- Improved tensor.asnumpy and tensor.to_numpy for size-0 arrays gh-2120
- Permit type casting size-0 tensor.usm_ndarray to arbitrary dtype via tensor.usm_ndarray constructor's buffer keyword (i.e., using the original memory as the buffer for the new size-0 array's underlying memory) gh-2123
Fixed
- Fixed tensor.asarray failing when given device keyword with an input array of a dtype not supported by device gh-2097
- Fixes undefined behavior in radix sort algorithm and avoids call to sorting algorithms when calling tensor.sort and tensor.argsort on size-1 arrays, or along a size-1 axis gh-2106
- Fixed incorrect results when calling dpt.astype on tensor.usm_ndarray constructed from a boolean view into a numpy.ndarray gh-2122
- Fixed dpctl imported in virtual environment on Windows failing to see devices or find DLLs gh-2130
- Fixed Cythonization failure when testing the ability to create dpctl Cython API extensions with an editable install gh-2147
Maintenance
- Revert restricting Cython to below 3.1.0 when building dpctl for Python 3.13 gh-2118
- Add a link to tensor.DLDeviceType documentation from __dlpack_device__ docstring gh-2127
- Update pybind11 to 3.0.1 gh-2145
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070, gh-2088, gh-2104, gh-2151, gh-2154, gh-2155
v0.20.2
This release is identical to 0.20.1 in terms of features.
It adds metadata and conda recipe changes intended for releasing the dpctl package with Python 3.13.
v0.20.1
v0.20.0
This release achieves compliance of dpctl.tensor with the Python Array API 2024.12 standard.
The dpctl namespace has also received a number of new features, including new Python classes dpctl.LocalAccessor, dpctl.WorkGroupMemory, and dpctl.RawKernelArg to be used as kernel argument types, support for peer access between dpctl.SyclDevice instances, and support for composite Level Zero devices.
Added
- Added dpctl.WorkGroupMemory class representing sycl::ext::oneapi::experimental::work_group_memory, to be used as a kernel argument type gh-1984
- Added dpctl.LocalAccessor class representing sycl::local_accessor, to be used as a kernel argument type gh-1991
- Added dpctl.SyclPlatform.get_devices method for getting all dpctl.SyclDevices for the platform gh-1992
- Added support for the composite devices extension for Level Zero devices, usable with some devices when setting ZE_FLAT_DEVICE_HIERARCHY=COMBINED gh-1993
- Added out keyword to tensor.take gh-2010
- Added dpctl.RawKernelArg class representing sycl::ext::oneapi::experimental::raw_kernel_arg, to be used as a kernel argument type gh-2038
- Added dpctl.SyclDevice methods for querying, enabling, and disabling peer access between devices gh-2077, gh-2082
Changed
- Updated Level Zero loader detection to no longer rely on reading libur_adapter_level_zero.so for the loader filename gh-2025
- Updated integer array indexing to align with the 2024.12 array API specification gh-2032
- Support for Boolean data-type is added to dpctl.tensor.ceil, dpctl.tensor.floor, and dpctl.tensor.trunc gh-2033
- Changed implementation of DPCTLPlatform_GetDefaultContext from using deprecated ext_oneapi_get_default_context to khr_get_default_context gh-2042
- Updated supported array API specification version to 2024.12 gh-2047
- Implementation struct for tensor.imag now uses a static member value for the imaginary part of real-valued inputs gh-2063
- Updated repr to show the shape of the abbreviated arrays and show the shape and data type of zero-size arrays gh-2067
- Changed tensor.__array_namespace_info__().capabilities()["max dimensions"] to None gh-2071
Fixed
- Refactored code common to accumulation operations (dpt.cumulative_sum, dpt.cumulative_prod, dpt.cumulative_logsumexp) and removed unnecessary event initialization gh-2011
- Fixed incorrect results for dpt.cumulative_sum and dpt.cumulative_prod when dtype=dpt.bool gh-2018
- Fixed a typo in dpctl.SyclPlatform repr gh-2035
- Fixed a bug in tensor.asarray where order="K" could fail to produce an array sufficient for the internal copy operation for some edge cases, including a contiguous array with permuted dimensions gh-2058
- Fixed a typo in dpctl.memory.USMAllocationError gh-2072
Maintenance
- Document dpctl.device_type, dpctl.backend_type, dpctl.event_status_type, and dpctl.global_mem_cache_type enums gh-2019
- Updated SYCL_INCLUDE_DIR_HINT in Conda recipe gh-2039
- Updated expected dtypes in element-wise function docstrings gh-2041, gh-2048
- Set ARRAY_API_TESTS_VERSION=2024.12 when running array API conformity job in CI gh-2046
- Install hwloc when running CI job for nightly SYCL compiler gh-2050
- Added cython-lint to pre-commit to improve style and readability of Cython code gh-2056
- Skip upload jobs when GitHub CI is called from a forked repo gh-2059
- Disable nightly tests run from forked repos gh-2060
- Fixed a typo in beginner's guide example gh-2061
- Updated bandit version gh-2075
- Updated Conda installation instructions gh-2080, gh-2081
- Fixed an incorrect link to changelog in package metadata gh-2085
- Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts gh-2020, gh-2034, gh-2043, gh-2044, gh-2065, gh-2066, gh-2068, gh-2070
New Contributors
- @jharlow-intel made their first contribution in #2054
- @david-cortes-intel made their first contribution in #2080
v0.19.0
This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.
A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.
Added
- Support for compiling dpctl for specified AMD GPU architecture with use of CodePlay oneAPI plug-in gh-1731
- Added tensor.top_k per Python Array API specification gh-1921
- Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and sycl devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
- Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961
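For readers unfamiliar with top-k selection, the following sketch shows the idea behind a function like tensor.top_k on a flat Python list. The names top_k_reference and TopKResult, the largest flag, and the choice to return both values and indices are assumptions of this example; it does not reproduce dpctl's signature or implementation.

```python
from collections import namedtuple

# Hypothetical result container used only in this sketch.
TopKResult = namedtuple("TopKResult", ["values", "indices"])


def top_k_reference(x, k, largest=True):
    """Return the k largest (or smallest) values of a flat list and their positions."""
    if not 0 <= k <= len(x):
        raise ValueError("k must be between 0 and len(x)")
    # Sort positions by value; sorted() is stable, so ties keep their original order.
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=largest)[:k]
    return TopKResult(values=[x[i] for i in order], indices=order)


result = top_k_reference([5, 1, 9, 3, 7], k=2)
```

A full sort is used here for brevity; a production implementation would typically avoid sorting the whole input when k is small.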
Changed
- Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
- py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
- Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
- Improved performance of tensor.argsort function for all types gh-1859
- Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
- Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
- Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
- dpctl changed to see GPU devices out of the box in virtual environment on Windows gh-1922
- Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
- Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
- Updated Cython examples to use scikit-build gh-1935
- Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
- Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
- tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
- stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a tensor.SyclQueue gh-1969
- dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
- Reduced binary size of _tensor_elementwise_impl gh-1976
- Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985
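The radix sort mentioned in the list above is a non-comparison sort that processes keys one digit at a time. As a minimal sketch of the general technique (a sequential least-significant-digit variant for non-negative integers, not dpctl's parallel device implementation), with the function name and the bits_per_pass parameter being assumptions of this example:

```python
def radix_sort(values, bits_per_pass=8):
    """Sort non-negative integers with least-significant-digit radix sort."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    radix = 1 << bits_per_pass
    mask = radix - 1
    out = list(values)
    max_value = max(out, default=0)
    shift = 0
    # One pass per digit, from least to most significant.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for v in out:
            # Appending preserves the order from earlier passes (stability),
            # which is what makes digit-by-digit sorting correct.
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


data = [170, 45, 75, 90, 802, 24, 2, 66, 70000]
sorted_data = radix_sort(data)
```

The number of passes depends on the key width rather than on the number of elements, which is why radix sort can be attractive for fixed-width data types.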
Fixed
- Fixed a bug in tensor.roll for very large values of shift gh-1869
- Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
- Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
- Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
- Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
- Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
- Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
- tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
- Fixed docstring in helper class in DLPack tests gh-1920
- Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
- Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
- Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
- Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
- Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
- Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
- Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
- Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
- Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
- Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
- Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
- UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
- Fixed typos in tensor.from_numpy and tensor.astype gh-2006
Maintenance
- Revert pinning of cmake to 3.26 on Windows gh-1823
- Update black version used in Python code style workflow gh-1828
- Fixed CI/CD workflow for building conda packages on Windows gh-1831
- Revert work-around in test_sycl_kernel_submit.py for problem in MKL 2024.2.0 gh-1836
- Do not use Mambaforge variant of miniforge as deprecated gh-1844
- Use pybind11=2.13.6 gh-1845
- Remove unnecessary include in C++ header file gh-1846
- Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library gh-1847
- Add instructions for installing dpctl from Intel PyPi channel gh-1860
- Fix warnings when generating docs gh-1855, gh-1861
- Align conda recipe with conda-forge's {{ stdlib("c") }} migration gh-1868
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Add support of CV-qualifiers in is_complex<T> helper gh-1900
- Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
- Reduce binary ...
v0.18.3
v0.18.2
This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.
It backports fixes for
- tensor.result_type behavior for scalars (see gh-1874) and
- errors when using dpctl in virtual environment on Linux (gh-1892).
Changes from PR gh-1899 were also backported.
v0.18.1
This is an incremental release in which only the installation instructions in the README were updated, to reflect the change in location of the index with Python packages built by Intel(R) relative to the 0.18.0 release.
v0.18.0
This release reaches an important milestone of making offloading fully asynchronous.
Calls to dpctl.tensor submit tasks for execution to DPC++ runtime and return without waiting for execution of these tasks to finish.
The sequential semantics a user expects from the execution of a Python script are nevertheless preserved.
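As a loose analogy for this execution model (not dpctl code), the sketch below uses a single-worker thread pool from the Python standard library to stand in for an in-order queue: submitting work returns immediately, dependent tasks still run in submission order, and the caller only blocks when it reads the result. The function names fill and scale are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# One worker thread stands in for an in-order device queue (assumption of this sketch).
executor = ThreadPoolExecutor(max_workers=1)
log = []


def fill():
    log.append("fill")
    return [1.0] * 4


def scale(pending):
    log.append("scale")
    # The earlier task has already finished because the queue runs tasks in order.
    return [2 * v for v in pending.result()]


first = executor.submit(fill)           # returns a future without waiting
second = executor.submit(scale, first)  # depends on the first task's output
values = second.result()                # only reading the data back blocks the caller
executor.shutdown()
```

The script reads top to bottom like ordinary sequential code, even though each submission returns before its work has completed.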
The full list of changes that went into this release is:
Added
- Implement tensor.take_along_axis per Python Array API specification gh-1778
- Implement tensor.put_along_axis to complement tensor.take_along_axis gh-1798
- Support for 'device=tensor.kDLCPU' in tensor.from_dlpack function and tensor.usm_ndarray.__dlpack__ method gh-1781
- Support DLPack on Windows gh-1746
- Implement tensor.nextafter function per Python Array API specification gh-1730
- Implement tensor.count_nonzero and tensor.diff functions from Python array API specification gh-1732, gh-1780
- Add support for order="K" to *_like array creation functions, and change default order keyword value from 'C' to 'K' gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- dpctl::tensor::usm_memory class defined in dpctl4pybind11.hpp adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
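To illustrate the gather pattern behind a function like tensor.take_along_axis, here is a minimal sketch for a two-dimensional nested list, gathering along the last axis only. The helper name take_along_last_axis is an assumption of this example, and the code is a plain-Python stand-in rather than dpctl's implementation.

```python
def take_along_last_axis(x, indices):
    """For each row r, pick x[r][j] for every j in indices[r]."""
    if len(x) != len(indices):
        raise ValueError("x and indices must have the same number of rows")
    return [[row[j] for j in idx_row] for row, idx_row in zip(x, indices)]


x = [[10, 30, 20], [60, 40, 50]]
# Per-row indices that would sort each row, as an argsort would produce them.
order = [sorted(range(len(row)), key=row.__getitem__) for row in x]
sorted_rows = take_along_last_axis(x, order)
```

Pairing an argsort-style index array with a gather along the same axis, as done here, is a common use of this kind of function.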
Changed
- Change ownership of USM allocation by dpctl.memory objects, make executions of dpctl.tensor operations asynchronous gh-1705
- Add support for Python scalars by tensor.where function gh-1719
- Optimize division by Python scalar in statistical functions tensor.mean, tensor.std, tensor.var gh-1820
- Use transcendental functions from sycl namespace instead of std namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function tensor.zeros to use asynchronous memset operation gh-1806
- The setter of tensor.usm_ndarray.shape property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by usm_host_allocator used with std::vector gh-1791
- Use dpctl::tensor::offset_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with life-time management of temporary USM allocations gh-1797
- Add "same_kind"-style casting for in-place mathematical operators of tensor.usm_ndarray gh-1827, gh-1830
Fixed
- Fix setting of release variable Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries DPCTLDevice_GetMaxWorkGroupSize1d and DPCTLDevice_GetMaxWorkGroupSize2d gh-1690
- Add license header to conda script files gh-1695
- Fix tensor.round behavior on CUDA devices gh-1700
- Add missing #include <sstream> gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of tensor.extract function gh-1727
- Fix for tensor.unique_all and tensor.unique_inverse to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from __sycl_usm_array_interface__ in tensor.asarray function gh-1756
- tensor.clip to handle Python scalars which are out of bound for the data type of integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise tensor.divide and comparison operations allow greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable pytest --pyargs dpctl.tests gh-1833
Maintenance
- Improve performance of test_sort_complex_fp_nan gh-1704
- Improve exception wording raised by tensor.broadcast_arrays() gh-1720
- Remove template keyword in method call of sycl::kernel_bundle gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable SYCL_DEVICE_FILTER gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add #include <utility> for definition of std::move used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document tensor._flags.Flags class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of Strided1DIndexer class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add sycl::event associated with compute task to vector of events representing execution of host_task gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on libze1 package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in dpctl gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, [gh-1721](https...
v0.17.0
This release features an updated documentation web page (https://intelpython.github.io/dpctl/latest/index.html), adds cumulative reductions, and complies with revision 2023.12 of the Python Array API specification.
Added
- Added pybind11 caster for sycl::half to map to/from Python float to "dpctl4pybind11.hpp" header: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
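Of the cumulative reductions listed above, cumulative_logsumexp is the least self-explanatory: element i of the result is the logarithm of the sum of exponentials of the first i+1 inputs. A minimal pure-Python sketch of that definition follows, assuming a flat list of finite floats; it is illustrative and not dpctl's implementation.

```python
import math


def cumulative_logsumexp(x):
    """Running log(sum(exp(x[0..i]))), computed without overflowing exp()."""
    out = []
    running = -math.inf  # log(0): the empty sum
    for v in x:
        hi = max(running, v)
        # Factor out the larger term so both exponents are <= 0.
        running = hi + math.log(math.exp(running - hi) + math.exp(v - hi))
        out.append(running)
    return out


small = cumulative_logsumexp([0.0, 0.0, 0.0])
large = cumulative_logsumexp([1000.0, 1000.0])
```

Factoring out the running maximum is what keeps the second call finite, whereas evaluating exp(1000.0) directly would overflow.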
Changed
- Expanded documentation for dpctl: gh-1619
- Expanded utils.intel_device_info functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
- dpctl uses pybind11 2.12.0: gh-1640
- Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677
Fixed
- Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
- Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
- Fixed support for out keyword in tensor.matmul: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
- Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
New Contributors
- @bdmoore1 made their first contribution in #1659
- @ekomarova made their first contribution in #1666
Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md