
PyTorch INTERNAL ASSERTION error when performing SVD on large datasets #55

@JanisGeise

Description

Hi @AndreWeiner,

when performing an SVD on a large dataset, I get the error message

RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1537, please report a bug to PyTorch. linalg.svd: Argument 12 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
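For reference, a call of the following shape should trigger the error; the exact matrix dimensions are only illustrative (and need roughly 8.8 GB of RAM), what matters is that the element count exceeds the 32-bit integer limit discussed below:

import torch

# illustrative shapes only: any float32 matrix with more than
# 2**31 - 1 (= 2147483647) elements should hit the same assertion
rows, cols = 2_200_000, 1_000          # rows * cols = 2.2e9 > 2147483647
data_matrix = torch.randn(rows, cols)  # ~8.8 GB of float32
U, S, Vh = torch.linalg.svd(data_matrix, full_matrices=False)  # raises the INTERNAL ASSERT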

This seems to be a known issue and is related to several existing upstream PyTorch/LAPACK issues, among others.

In summary, the reason for this error is the following:

  • PyTorch and SciPy both rely on LAPACK, which is by default compiled with 32-bit integer (LP64) support. For SciPy, this can be verified by running e.g. scipy.__config__.show(), which in my case yields:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  • the largest number that can be represented by a 32-bit signed integer is torch.iinfo(torch.int32).max: 2147483647
  • if the number of elements in the data matrix exceeds this value, the SVD can't be computed, since the size of the data_matrix is passed to LAPACK as an int32 (see the check sketched below)
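As a quick illustration, such a size check could look like the following (a minimal sketch; it assumes data_matrix is a CPU torch.Tensor, and the function name is hypothetical, not taken from flowtorch):

import torch

INT32_MAX = torch.iinfo(torch.int32).max  # 2147483647

def check_matrix_size(data_matrix: torch.Tensor) -> None:
    # guard before handing the matrix to torch.linalg.svd / LAPACK
    if data_matrix.numel() > INT32_MAX:
        raise ValueError(
            f"data matrix has {data_matrix.numel()} elements, which exceeds "
            f"the 32-bit LAPACK limit of {INT32_MAX}"
        )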

To verify the lack of 64-bit integer (ILP64) support (here for SciPy), one can also execute scipy.linalg.get_lapack_funcs("gesdd", ilp64=True), which yields the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/blas.py", line 401, in getter
    value = func(names, arrays, dtype, ilp64)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/lapack.py", line 992, in get_lapack_funcs
    raise RuntimeError("LAPACK ILP64 routine requested, but Scipy "
RuntimeError: LAPACK ILP64 routine requested, but Scipy compiled only with 32-bit BLAS
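The same test can also be wrapped in a small helper, e.g. to decide at runtime which backend to use (a sketch, not part of flowtorch):

from scipy.linalg import get_lapack_funcs

def scipy_has_ilp64() -> bool:
    # True if SciPy was built against an ILP64 (64-bit integer) LAPACK
    try:
        get_lapack_funcs("gesdd", ilp64=True)
        return True
    except RuntimeError:
        return False

print(scipy_has_ilp64())  # False for the 32-bit build shown above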

Potential Workaround

NumPy, on the other hand, is built with 64-bit integer (ILP64) support by default, which can be verified by executing numpy.__config__.show(), yielding:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27

(indicated by the USE64BITINT entry).

Maybe we could replace the current implementation of the SVD class in flowtorch.data.svd with numpy.linalg.svd() to avoid this issue, or add a check on the data_matrix before computing the SVD; a rough sketch of such a fallback follows below.
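To make the idea concrete, a check plus NumPy fallback could look roughly like this (names and structure are hypothetical and do not mirror the actual flowtorch.data.svd implementation; it also assumes a CPU tensor without gradients):

import numpy as np
import torch

def svd_with_fallback(data_matrix: torch.Tensor, full_matrices: bool = False):
    # use PyTorch as long as LAPACK's int32 size limit is not exceeded
    if data_matrix.numel() <= torch.iinfo(torch.int32).max:
        return torch.linalg.svd(data_matrix, full_matrices=full_matrices)
    # otherwise fall back to NumPy, whose wheels link against an ILP64 OpenBLAS
    U, s, Vh = np.linalg.svd(data_matrix.numpy(), full_matrices=full_matrices)
    return torch.from_numpy(U), torch.from_numpy(s), torch.from_numpy(Vh)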

Regards,
Janis
