RuntimeError: CUDA error: invalid device ordinal #3

Closed
@elter-tef

Description

When I load the model, I get this error:

Traceback (most recent call last):
  File "", line 1, in
  File "test/env/lib/python3.9/site-packages/galai/__init__.py", line 39, in load_model
    model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))
  File "test/env/lib/python3.9/site-packages/galai/model.py", line 63, in _load_checkpoint
    load_checkpoint_and_dispatch(
  File "test/env/lib/python3.9/site-packages/accelerate/big_modeling.py", line 366, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "test/env/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 701, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param)
  File "test/env/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 124, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
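
A note for anyone hitting the same traceback: "invalid device ordinal" generally means a tensor was sent to a CUDA device index that does not exist among the devices the process can see (for example, when more GPUs are requested than are installed, or when CUDA_VISIBLE_DEVICES hides some of them). The sketch below is not part of galai or accelerate. The helper names `visible_gpu_count`, `is_valid_ordinal`, and `invalid_ordinals` are hypothetical, and the request for 8 GPUs is only an illustration. It checks which ordinals would fail before a load is attempted, using only `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def visible_gpu_count():
    """Number of CUDA devices PyTorch can see; 0 if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def is_valid_ordinal(ordinal, device_count):
    """A device ordinal is valid only when 0 <= ordinal < device_count."""
    return 0 <= ordinal < device_count


def invalid_ordinals(requested_gpus, device_count):
    """Ordinals a run spanning `requested_gpus` devices would use but that do not exist."""
    return [i for i in range(requested_gpus) if not is_valid_ordinal(i, device_count)]


if __name__ == "__main__":
    count = visible_gpu_count()
    print(f"visible CUDA devices: {count}")
    # Hypothetical request for 8 GPUs; any ordinal listed here would trigger the error.
    print(f"missing ordinals for 8 GPUs: {invalid_ordinals(8, count)}")
```

If the second line prints a non-empty list, the requested GPU count is larger than what the process can see, and lowering it to the reported device count is the first thing to try.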
