Releases: ROCm/hipBLAS
hipBLAS 0.48.0 for ROCm 4.5.0
Added
- Added HIPBLAS_STATUS_UNKNOWN for unsupported backend status codes
- Added more support for hipblas-bench
Fixed
- Avoid large offset overflow for gemv and hemv in hipblas-test
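An offset overflow of this kind typically appears when an element offset is computed in 32-bit arithmetic. The following is a minimal sketch of the general remedy, which is to widen to 64 bits before multiplying. The helper name `element_offset` is hypothetical and is not taken from hipblas-test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of hipblas-test): offset of element
// (row, col) in a column-major matrix with leading dimension lda.
// The column index is widened to 64 bits before the multiply, so the
// product cannot wrap around a 32-bit int for large matrices.
inline std::int64_t element_offset(int row, int col, int lda)
{
    assert(row >= 0 && col >= 0 && lda > 0);
    return static_cast<std::int64_t>(col) * lda + row;
}
```

Writing `col * lda + row` with plain `int` operands would overflow as soon as the product exceeds the 32-bit range, which is easy to reach with large gemv or hemv test sizes.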
Changed
- Packaging is now split into a runtime package, hipblas, and a development package, hipblas-devel. The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.
hipBLAS 0.45.0 for ROCm 4.3.1
No changes made for ROCm 4.3.1.
hipBLAS 0.45.0 for ROCm 4.3.0
Added
- Added hipblasStatusToString
Fixed
- Added catch() blocks around API calls to prevent the leak of C++ exceptions
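The idea behind this fix can be sketched as follows: every C entry point runs its C++ body inside a try block and converts any exception into a status code. This is an illustration only. The names `demo_status`, `guard` and `demo_scale` are hypothetical and are not the hipBLAS enum or API.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical status codes for illustration; not the hipBLAS enum.
enum demo_status { DEMO_SUCCESS, DEMO_ALLOC_FAILED, DEMO_INTERNAL_ERROR };

// Run a callable and translate any C++ exception into a status code,
// so that nothing propagates across the C API boundary.
template <typename F>
demo_status guard(F&& body) noexcept
{
    try {
        return body();
    } catch (const std::bad_alloc&) {
        return DEMO_ALLOC_FAILED;
    } catch (...) {
        return DEMO_INTERNAL_ERROR;
    }
}

// Hypothetical C entry point wrapping C++ code that may throw.
extern "C" demo_status demo_scale(int n, float alpha, float* x)
{
    return guard([&]() -> demo_status {
        if (n < 0) throw std::invalid_argument("n must be non-negative");
        for (int i = 0; i < n; ++i) x[i] *= alpha;
        return DEMO_SUCCESS;
    });
}
```

A caller written in C then only ever sees a status value, even when the underlying C++ code throws.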
hipBLAS-0.44.0 for ROCm 4.2.0
Added
- Made necessary changes to work with rocBLAS' gemm_ex changes. When using the rocBLAS backend, hipBLAS will query the preferred layout of int8 data to be passed to gemm_ex, and will pass in the resulting flag. Users must be sure to use the preferred data format when calling gemm_ex with a rocBLAS backend.
- Added hipblas-bench with support for:
  - copy, swap, scal
hipBLAS-0.42.0 for ROCm 4.1.0
Added
- Added the following functions. All added functions include batched and strided-batched support with rocBLAS backend:
  - axpy_ex
  - dot_ex
  - nrm2_ex
  - rot_ex
  - scal_ex
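As a rough illustration of what an `_ex` variant computes, here is a CPU reference for axpy in which the storage types and the compute type are chosen separately. It assumes that the `_ex` suffix denotes separately specified data and execution types. The function name `axpy_ref` is hypothetical, and which type combinations the library accepts is defined by hipBLAS, not by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for y := alpha * x + y with separate storage and compute
// types. Tx and Ty are the storage types; Tex is the type in which the
// arithmetic is carried out. Hypothetical helper, not the hipBLAS API.
template <typename Tex, typename Ta, typename Tx, typename Ty>
void axpy_ref(int n, Ta alpha, const Tx* x, int incx, Ty* y, int incy)
{
    assert(incx > 0 && incy > 0); // negative strides are omitted for brevity
    for (int i = 0; i < n; ++i) {
        const std::ptrdiff_t ix = static_cast<std::ptrdiff_t>(i) * incx;
        const std::ptrdiff_t iy = static_cast<std::ptrdiff_t>(i) * incy;
        const Tex acc = static_cast<Tex>(alpha) * static_cast<Tex>(x[ix])
                        + static_cast<Tex>(y[iy]);
        y[iy] = static_cast<Ty>(acc);
    }
}
```

For example, `axpy_ref<double>(n, 2.0f, x, 1, y, 1)` with `float` input and `double` output accumulates in double precision.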
Fixed
- Fixed complex unit test bug caused by incorrect caxpy and zaxpy function signatures
Known Issues
- None
hipBLAS-0.38.0 for ROCm 4.0.0
New Features
- No new features
Known Issues
- None
hipBLAS-0.38.0 for ROCm 3.10.0
New Features
- Added hipblasSetAtomicsMode and hipblasGetAtomicsMode
- No longer looks for the CUDA backend unless the --cuda build flag is passed
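One common reason to control atomics is reproducibility: floating-point addition is not associative, so when partial sums are combined by atomic operations in an unspecified order, results can differ from run to run. The sketch below shows the effect on the CPU with two fixed summation orders. The helper names are hypothetical, and the sketch does not call hipBLAS.

```cpp
#include <cassert>

// Sum three floats in two different orders. Atomic accumulation on a GPU
// does not fix the order, so either result may be produced.
// volatile keeps the intermediate value in single precision.
inline float sum_left(float a, float b, float c)
{
    volatile float t = a + b; // (a + b) + c
    return t + c;
}

inline float sum_right(float a, float b, float c)
{
    volatile float t = b + c; // a + (b + c)
    return a + t;
}
```

With the inputs `1e20f, -1e20f, 1.0f`, the two orders give different answers because `1.0f` is lost when it is added to `-1e20f` first.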
Known Issues
- None
hipBLAS-0.36.0 for ROCm 3.9.0
New Features
Known Issues
- None
hipBLAS-0.34.0 for ROCm 3.8.0
New Features
- No new features
Known Issues
- None
hipBLAS-0.32.0 for ROCm 3.7.0
New Features
- Improvements to rocblas_Xgemm_batched performance for small m, n, k.
- Improvements to rocblas_Xgemv_batched and rocblas_Xgemv_strided_batched performance for small m (QMCPACK use).
- Improvements to rocblas_Xdot (batched and non-batched) performance when both incx and incy are 1
- FP32 ONNX BERT MI50 performance improved 28%
- FP32 BDAS MI50/MI60 Performance improved significantly
- Added substitution method for small trsm sizes with m <= 64 && n <= 64. Increases performance drastically for small batched trsm.
- Add Fortran interface for BLAS 1, BLAS 2, BLAS 3
- Add tbsv, tbsv_batched, and tbsv_strided_batched
- Add hemm, hemm_batched, and hemm_strided_batched
- Add symm, symm_batched, and symm_strided_batched
- Add complex versions of geam, along with geam_batched and geam_strided_batched
- Add gemm_batched_ex and gemm_strided_batched_ex
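The substitution method mentioned above for small trsm sizes can be illustrated on the CPU. This is a textbook forward substitution for a single right-hand side and a lower-triangular matrix, given as a sketch of the technique and not as the GPU kernel itself. The function name `forward_substitute` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Solve L * x = b in place for a lower-triangular, column-major matrix L
// (n x n, leading dimension lda, non-unit diagonal). On return, b holds x.
// Hypothetical helper for illustration only.
inline void forward_substitute(int n, const double* L, int lda, double* b)
{
    for (int j = 0; j < n; ++j) {
        // x[j] is known once all earlier columns have been eliminated.
        b[j] /= L[j + static_cast<long long>(j) * lda];
        // Remove the contribution of x[j] from the remaining rows.
        for (int i = j + 1; i < n; ++i)
            b[i] -= L[i + static_cast<long long>(j) * lda] * b[j];
    }
}
```

For small triangular systems such as m <= 64, this direct approach avoids the overhead of blocked algorithms, which is one plausible reason it pays off for small batched trsm.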
Known Issues
- None