
python3 test_flash_mm.py got error #1

Closed · tiendung opened this issue Aug 1, 2023 · 5 comments

tiendung commented Aug 1, 2023

ERROR: CUDA RT call "cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, shared_memory_size )" in line 695 of file mm/csrc/flashmm/mm_block_fwd_cuda.cu failed with invalid device function (98).
max diff for mm block: tensor(2.0590e-05, device='cuda:0', grad_fn=<SelectBackward0>)
average diff for mm block: tensor(2.9658e-06, device='cuda:0', grad_fn=<MeanBackward0>)
max diff: tensor(0.0003, device='cuda:0')
avg diff: tensor(7.4159e-05, device='cuda:0')

I can still run the trainer, and the loss goes down.

DanFu09 (Collaborator) commented Aug 1, 2023

This is usually the result of a mismatch in CUDA versions: https://forums.developer.nvidia.com/t/cudalaunchkernel-returned-status-98-invalid-device-function/169958

Can you try it with the NVIDIA PyTorch docker container? https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
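As a rough first check (a minimal sketch, not the project's own tooling), you can compare the CUDA version PyTorch was built against, the GPU's compute capability, and the toolkit nvcc reports. Error 98 (invalid device function) usually means the extension binary was not compiled for the GPU/toolkit combination that is actually running:

```python
# Hedged sketch: quick checks for the kind of CUDA mismatch that produces
# "invalid device function (98)".
import subprocess

import torch

# CUDA version PyTorch was built against
print("torch.version.cuda:", torch.version.cuda)

# Compute capability of the visible GPU (the extension must be built for it)
print("GPU compute capability:", torch.cuda.get_device_capability(0))

# CUDA toolkit that nvcc (used to compile the flashmm extension) reports
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```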

tiendung (Author) commented Aug 3, 2023

Is it functioning correctly despite the CUDA mismatch? I'm running mm-bert and the loss is going down as usual.

DanFu09 (Collaborator) commented Aug 3, 2023

The training loop is falling back to regular PyTorch, so that’s why the loss is going down.
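For context, a guarded import like the minimal sketch below (all names are hypothetical, not the repo's actual API) is the usual mechanism behind this kind of silent fallback: if the compiled extension fails to load, the reference PyTorch path is used instead, so training still converges, just more slowly.

```python
# Hedged illustration of a CUDA-extension fallback; names are stand-ins.
import torch


def _reference_forward(x, w):
    # Plain PyTorch path: correct but slower than a fused kernel.
    return x @ w


try:
    import flashmm_cuda_ext  # hypothetical compiled extension name

    def mm_forward(x, w):
        return flashmm_cuda_ext.forward(x, w)  # fused CUDA kernel (hypothetical)
except ImportError:
    mm_forward = _reference_forward  # silent fallback: training still runs


y = mm_forward(torch.randn(4, 8), torch.randn(8, 8))
```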

tiendung (Author) commented Aug 3, 2023

I see. Thanks @DanFu09.

tiendung closed this as completed Aug 3, 2023

tiendung (Author) commented Aug 6, 2023

May I ask one more question, @DanFu09: how much faster is the flash_mm kernel compared to the PyTorch implementation?
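One way to measure this on your own GPU (a rough sketch using a stand-in op, not the actual flash_mm API) is torch.utils.benchmark, which handles CUDA synchronization so the wall-clock comparison is fair:

```python
# Hedged sketch: time a baseline PyTorch op; swap in the flash_mm call from
# test_flash_mm.py for the fused-kernel number on your hardware.
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(8, 4096, 768, device="cuda")
w = torch.randn(768, 768, device="cuda")

t_pytorch = benchmark.Timer(
    stmt="x @ w",          # stand-in for the reference PyTorch path
    globals={"x": x, "w": w},
).timeit(100)

print(t_pytorch)
```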
