
PyTorch INTERNAL ASSERTION error when performing SVD on large datasets #55

@JanisGeise

Description

Hi @AndreWeiner,

when performing an SVD on a large dataset, I get the error message

RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1537, please report a bug to PyTorch. linalg.svd: Argument 12 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
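For reference, a call of the following shape should trigger the error; the exact matrix dimensions are only illustrative (and need roughly 8.8 GB of RAM), what matters is that the element count exceeds the 32-bit integer limit discussed below:

import torch

# illustrative shapes only: any float32 matrix with more than
# 2**31 - 1 (= 2147483647) elements should hit the same assertion
rows, cols = 2_200_000, 1_000          # rows * cols = 2.2e9 > 2147483647
data_matrix = torch.randn(rows, cols)  # ~8.8 GB of float32
U, S, Vh = torch.linalg.svd(data_matrix, full_matrices=False)  # raises the INTERNAL ASSERT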

This seems to be a known issue and is related to several existing upstream PyTorch/LAPACK issues, among others.

In summary, the reason for this error is the following:

  • PyTorch and SciPy both rely on LAPACK, which is by default compiled with 32-bit integer (LP64) support. For SciPy, this can be verified by running e.g. scipy.__config__.show(), which in my case yields:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  • the largest number that can be represented by a 32-bit signed integer is torch.iinfo(torch.int32).max: 2147483647
  • if the number of elements in the data matrix exceeds this value, the SVD can't be computed, since the size of the data_matrix is passed to LAPACK as an int32 (see the check sketched below)
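As a quick illustration, such a size check could look like the following (a minimal sketch; it assumes data_matrix is a CPU torch.Tensor, and the function name is hypothetical, not taken from flowtorch):

import torch

INT32_MAX = torch.iinfo(torch.int32).max  # 2147483647

def check_matrix_size(data_matrix: torch.Tensor) -> None:
    # guard before handing the matrix to torch.linalg.svd / LAPACK
    if data_matrix.numel() > INT32_MAX:
        raise ValueError(
            f"data matrix has {data_matrix.numel()} elements, which exceeds "
            f"the 32-bit LAPACK limit of {INT32_MAX}"
        )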

To verify the lack of 64-bit integer (ILP64) support (here for SciPy), one can also execute scipy.linalg.get_lapack_funcs("gesdd", ilp64=True), which yields the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/blas.py", line 401, in getter
    value = func(names, arrays, dtype, ilp64)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/lapack.py", line 992, in get_lapack_funcs
    raise RuntimeError("LAPACK ILP64 routine requested, but Scipy "
RuntimeError: LAPACK ILP64 routine requested, but Scipy compiled only with 32-bit BLAS
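The same test can also be wrapped in a small helper, e.g. to decide at runtime which backend to use (a sketch, not part of flowtorch):

from scipy.linalg import get_lapack_funcs

def scipy_has_ilp64() -> bool:
    # True if SciPy was built against an ILP64 (64-bit integer) LAPACK
    try:
        get_lapack_funcs("gesdd", ilp64=True)
        return True
    except RuntimeError:
        return False

print(scipy_has_ilp64())  # False for the 32-bit build shown above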

Potential Workaround

NumPy, on the other hand, is built with 64-bit integer (ILP64) support by default, which can be verified by executing numpy.__config__.show(), yielding:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27

(indicated by the USE64BITINT entry).

Maybe we could replace the current implementation of the SVD class in flowtorch.data.svd with numpy.linalg.svd() to avoid this issue, or add a check on the data_matrix before computing the SVD; a rough sketch of such a fallback follows below.
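To make the idea concrete, a check plus NumPy fallback could look roughly like this (names and structure are hypothetical and do not mirror the actual flowtorch.data.svd implementation; it also assumes a CPU tensor without gradients):

import numpy as np
import torch

def svd_with_fallback(data_matrix: torch.Tensor, full_matrices: bool = False):
    # use PyTorch as long as LAPACK's int32 size limit is not exceeded
    if data_matrix.numel() <= torch.iinfo(torch.int32).max:
        return torch.linalg.svd(data_matrix, full_matrices=full_matrices)
    # otherwise fall back to NumPy, whose wheels link against an ILP64 OpenBLAS
    U, s, Vh = np.linalg.svd(data_matrix.numpy(), full_matrices=full_matrices)
    return torch.from_numpy(U), torch.from_numpy(s), torch.from_numpy(Vh)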

Regards,
Janis
