Torch Scatter gives an illegal memory access

Hi @rusty1s,

Thanks for the awesome work of putting together and maintaining pytorch_scatter.
I'm running into an illegal memory access in `scatter` when I run the following code:
```python
from torch_scatter import scatter
import torch
x_j = torch.randn((12143200, 192), dtype=torch.float32).to('cuda:0')
edge_index = torch.randint(low=0, high=73727, size=(12143200,)).to('cuda:0')
out = scatter(src=x_j.to(torch.float32), index=edge_index, dim=0, dim_size=73728, reduce='max') 
print(out)
```
I set `export CUDA_LAUNCH_BLOCKING=1` in the shell before running this code.
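
For anyone reproducing this without a shell export, the same setting can be applied from inside the script. This is a minimal sketch, assuming the variable has to be in place before CUDA is initialized, which is why it is set before `torch` would be imported:

```python
import os

# Set the variable before the first CUDA call; doing it before importing
# torch is the conservative choice, so kernel launches run synchronously
# and the reported Python line is the one that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```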

I'm running this on a single V100 GPU with 32 GB of memory. Here's my `nvidia-smi` output:
```bash
Sat Aug 10 13:21:38 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:06:00.0 Off |                    0 |
| N/A   33C    P0    42W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   34C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:0A:00.0 Off |                    0 |
| N/A   34C    P0    46W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  Off  | 00000000:85:00.0 Off |                    0 |
| N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   34C    P0    44W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Here's my conda environment:
```yml
name: MyEnv
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - brotli-python=1.0.9=py37hd23a5d3_7
  - bzip2=1.0.8=h7f98852_4
  - ca-certificates=2023.08.22=h06a4308_0
  - certifi=2022.12.7=py37h06a4308_0
  - charset-normalizer=3.3.0=pyhd8ed1ab_0
  - cudatoolkit=10.2.89=hfd86e86_1
  - ffmpeg=4.3.2=hca11adc_0
  - flit-core=3.6.0=pyhd3eb1b0_0
  - freetype=2.12.1=h4a9f257_0
  - giflib=5.2.1=h5eee18b_3
  - gmp=6.2.1=h58526e2_0
  - gnutls=3.6.13=h85f3911_1
  - idna=3.4=pyhd8ed1ab_0
  - intel-openmp=2023.1.0=hdb19cb5_46305
  - jpeg=9b=h024ee3a_2
  - lame=3.100=h7f98852_1001
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libpng=1.6.39=h5eee18b_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtiff=4.2.0=h85742a9_0
  - libuv=1.44.2=h5eee18b_0
  - libwebp=1.2.0=h89dd481_0
  - libwebp-base=1.2.0=h27cfd23_0
  - lz4-c=1.9.4=h6a678d5_0
  - mkl=2020.2=256
  - mkl-service=2.3.0=py37he8ac12f_0
  - mkl_fft=1.3.0=py37h54f3939_0
  - mkl_random=1.1.1=py37h0573a6f_0
  - ncurses=6.4=h6a678d5_0
  - nettle=3.6=he412f7d_0
  - ninja=1.10.2=h06a4308_5
  - ninja-base=1.10.2=hd09550d_5
  - openh264=2.1.1=h780b84a_0
  - openssl=1.1.1w=h7f8727e_0
  - pillow=9.3.0=py37hace64e9_1
  - pip=22.3.1=py37h06a4308_0
  - pysocks=1.7.1=py37h89c1867_5
  - python=3.7.16=h7a1cb2a_0
  - python_abi=3.7=2_cp37m
  - pytorch-mutex=1.0=cuda
  - pyyaml=6.0=py37h5eee18b_1
  - readline=8.2=h5eee18b_0
  - requests=2.31.0=pyhd8ed1ab_0
  - setuptools=65.6.3=py37h06a4308_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.41.2=h5eee18b_0
  - tbb=2021.8.0=hdb19cb5_0
  - timm=0.3.2=pyhd8ed1ab_0
  - tk=8.6.12=h1ccaba5_0
  - typing_extensions=4.4.0=py37h06a4308_0
  - urllib3=2.0.6=pyhd8ed1ab_0
  - wheel=0.38.4=py37h06a4308_0
  - x264=1!161.3030=h7f98852_1
  - xz=5.4.2=h5eee18b_0
  - yaml=0.2.5=h7b6447c_0
  - zlib=1.2.13=h5eee18b_0
  - zstd=1.4.9=haebb681_0
  - pip:
      - cffi==1.15.1
      - cryptography==42.0.5
      - cupy-cuda102==11.6.0
      - cycler==0.11.0
      - fastrlock==0.8.2
      - fonttools==4.38.0
      - jinja2==3.1.3
      - joblib==1.3.2
      - kiwisolver==1.4.5
      - markupsafe==2.1.5
      - matplotlib==3.5.3
      - numpy==1.21.6
      - nvidia-cublas-cu11==11.10.3.66
      - nvidia-cuda-nvrtc-cu11==11.7.99
      - nvidia-cuda-runtime-cu11==11.7.99
      - nvidia-cudnn-cu11==8.5.0.96
      - packaging==23.2
      - pandas==1.3.5
      - psutil==5.9.8
      - pycparser==2.21
      - pydeprecate==0.3.2
      - pyopenssl==24.1.0
      - pyparsing==3.1.1
      - python-dateutil==2.8.2
      - pytz==2023.3.post1
      - scikit-learn==1.0.2
      - scipy==1.7.3
      - threadpoolctl==3.1.0
      - torch==1.7.1+cu110
      - torch-geometric==2.3.1
      - torch-scatter==2.0.7
      - torchaudio==0.7.2
      - torcheval==0.0.7
      - torchmetrics==0.7.2
      - torchprofile==0.0.4
      - torchvision==0.8.2+cu110
      - tqdm==4.66.2
```

This is the error I face:
```bash
Traceback (most recent call last):
  File "playground.py", line 5, in <module>
    out = scatter(src=x_j.to(torch.float32), index=edge_index, dim=0, dim_size=73728, reduce='max') 
  File "/raid/ismail2/miniconda3/envs/MyEnv/lib/python3.7/site-packages/torch_scatter/scatter.py", line 161, in scatter
    return scatter_max(src, index, dim, out, dim_size)[0]
  File "/raid/ismail2/miniconda3/envs/MyEnv/lib/python3.7/site-packages/torch_scatter/scatter.py", line 73, in scatter_max
    return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
RuntimeError: CUDA error: an illegal memory access was encountered
```
I've been stuck here for a while and would really appreciate any help on this. Thanks.

PS: As far as I understand, an illegal memory access is a different error from running out of memory.
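
A quick size check with the shapes from the script supports the point that this is probably not an out-of-memory problem, and hints at one thing worth checking. The tensors fit comfortably in 32 GB, but `src` has more elements than a signed 32-bit integer can count. Whether the `scatter_max` CUDA kernel in torch-scatter 2.0.7 uses 32-bit element indexing is an assumption here, not something confirmed in this report. The helper names `max_cols_per_chunk` and `column_chunks` are hypothetical and only illustrate how the feature dimension could be split so that each slice stays under that limit:

```python
# Shapes taken from the reproduction script above.
NUM_ROWS, NUM_COLS, DIM_SIZE = 12_143_200, 192, 73_728
INT32_MAX = 2**31 - 1
GIB = 2**30

src_numel = NUM_ROWS * NUM_COLS
src_gib = src_numel * 4 / GIB            # float32 source tensor
idx_gib = NUM_ROWS * 8 / GIB             # int64 index tensor
out_gib = DIM_SIZE * NUM_COLS * 4 / GIB  # float32 output tensor
arg_gib = DIM_SIZE * NUM_COLS * 8 / GIB  # int64 argmax tensor from scatter_max


def max_cols_per_chunk(num_rows, limit=INT32_MAX):
    """Largest column count for which num_rows * cols does not exceed limit."""
    return limit // num_rows


def column_chunks(num_rows, num_cols, limit=INT32_MAX):
    """Split [0, num_cols) into (start, stop) ranges that each stay under limit."""
    step = max_cols_per_chunk(num_rows, limit)
    if step == 0:
        raise ValueError("even a single column exceeds the element limit")
    return [(s, min(s + step, num_cols)) for s in range(0, num_cols, step)]


print(f"src elements: {src_numel:,} (int32 max: {INT32_MAX:,})")
print(f"approx. memory in GiB: {src_gib + idx_gib + out_gib + arg_gib:.2f}")
print("column chunks:", column_chunks(NUM_ROWS, NUM_COLS))
```

If the 32-bit assumption holds, a possible (untested) workaround would be to call `scatter` once per range on `x_j[:, start:stop].contiguous()` and join the results with `torch.cat(..., dim=1)`. That is valid for a reduction over `dim=0` because each column is reduced independently.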


Torch Scatter gives an illegal memory access #456
