Hi @AndreWeiner,
when performing an SVD on a large dataset, I get the following error message:
RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1537, please report a bug to PyTorch. linalg.svd: Argument 12 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
This seems to be a known issue and is related to:
- PyTorch: issue 93275, issue 102963, issue 68291, issue 51720
- SciPy: issue 5401, issue 21837
and some others.
In summary, the reason for this error is the following:
- PyTorch and SciPy both rely on LAPACK, which is by default compiled with 32-bit integer support. This can be verified by running e.g. scipy.__config__.show(), which in my case yields:
Build Dependencies:
blas:
detection method: pkgconfig
found: true
include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
name: scipy-openblas
openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
pc file directory: /project
version: 0.3.28
lapack:
detection method: pkgconfig
found: true
include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
name: scipy-openblas
openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
pc file directory: /project
version: 0.3.28
- the maximum number that can be represented by a 32-bit integer is torch.iinfo(torch.int32).max = 2147483647
- if the number of elements of the data_matrix exceeds this value, the SVD can't be computed, since the size of the data_matrix is represented as an int32 (see the sketch below for a possible check)
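As a minimal sketch of that check (the helper name exceeds_int32_limit is made up for illustration and not part of flowTorch):

```python
import torch

# Hypothetical helper (not part of flowTorch) illustrating the size check described above.
def exceeds_int32_limit(data_matrix: torch.Tensor) -> bool:
    """Return True if the matrix has more elements than an int32 can represent."""
    return data_matrix.numel() > torch.iinfo(torch.int32).max  # 2147483647
```

For example, a matrix of shape (50000, 50000) has 2.5e9 elements and would already exceed this limit.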
To verify the missing 64-bit integer support (here for SciPy), one can also execute scipy.linalg.get_lapack_funcs("gesdd", ilp64=True), which yields the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/blas.py", line 401, in getter
value = func(names, arrays, dtype, ilp64)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/lapack.py", line 992, in get_lapack_funcs
raise RuntimeError("LAPACK ILP64 routine requested, but Scipy "
RuntimeError: LAPACK ILP64 routine requested, but Scipy compiled only with 32-bit BLAS
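This probe can also be wrapped in a small runtime check; the sketch below simply catches the RuntimeError raised by the get_lapack_funcs call shown above:

```python
from scipy.linalg import get_lapack_funcs

# Sketch: probe whether SciPy was built against an ILP64 (64-bit integer) LAPACK.
# On default 32-bit wheels, get_lapack_funcs raises the RuntimeError shown above.
def has_ilp64_lapack() -> bool:
    try:
        get_lapack_funcs("gesdd", ilp64=True)
        return True
    except RuntimeError:
        return False

print(has_ilp64_lapack())  # False for a standard 32-bit SciPy build
```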
Potential Workaround
NumPy, on the other hand, is built with 64-bit integer support by default, which can be verified by executing numpy.__config__.show(), yielding:
Build Dependencies:
blas:
detection method: pkgconfig
found: true
include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
name: scipy-openblas
openblas configuration: OpenBLAS 0.3.27 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
pc file directory: /project/.openblas
version: 0.3.27
lapack:
detection method: pkgconfig
found: true
include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
name: scipy-openblas
openblas configuration: OpenBLAS 0.3.27 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
pc file directory: /project/.openblas
version: 0.3.27
(indicated by the USE64BITINT entry).
Maybe we could replace the current implementation of the SVD class in flowtorch.data.svd with numpy.linalg.svd() to avoid this issue, or add a check of the data_matrix size before computing the SVD, e.g. as sketched below.
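A rough sketch of how such a fallback could look (robust_svd is a hypothetical name, not the existing flowTorch API; the NumPy path assumes a CPU tensor, and the economy-SVD settings may differ from what flowTorch actually uses):

```python
import numpy as np
import torch

INT32_MAX = torch.iinfo(torch.int32).max

# Rough sketch of the proposed workaround, assuming a CPU tensor;
# robust_svd is a made-up name and not part of flowtorch.data.svd.
def robust_svd(data_matrix: torch.Tensor):
    if data_matrix.numel() <= INT32_MAX:
        # small enough for the 32-bit LAPACK backend linked by PyTorch
        return torch.linalg.svd(data_matrix, full_matrices=False)
    # fall back to NumPy, whose default wheels link a 64-bit-integer (ILP64) OpenBLAS
    U, s, Vh = np.linalg.svd(data_matrix.numpy(), full_matrices=False)
    return torch.from_numpy(U), torch.from_numpy(s), torch.from_numpy(Vh)
```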
Regards,
Janis