
Question about training #8

@FFY0207

Epoch 1, Batch 3, Loss: 7.225614070892334
Train step: 2it [00:05, 2.95s/it]
Traceback (most recent call last):
  File "/mnt/e/code/silent_speech/transduction_model.py", line 365, in <module>
    main()
  File "/mnt/e/code/silent_speech/transduction_model.py", line 361, in main
    model = train_model(trainset, devset, device, save_sound_outputs=save_sound_outputs)
  File "/mnt/e/code/silent_speech/transduction_model.py", line 260, in train_model
    loss.backward()  # backpropagation
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: unknown error
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What is causing this error? Training fails during `loss.backward()` on the third batch of the first epoch. I reduced the batch size, but that did not help and the same error still occurs.
