
After the first Epoch ends, an error "torch.cuda.OutOfMemoryError: CUDA out of memory" is reported. #4

Closed
Kinsue opened this issue Mar 29, 2024 · 2 comments

Comments


Kinsue commented Mar 29, 2024

In my training configuration, training fits on a single 4090 after reducing the batch size. However, once the first epoch ends, a "torch.cuda.OutOfMemoryError: CUDA out of memory" error occurs.
May I ask what device your team trained on? Should I keep reducing the batch size?
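For context, one standard way to lower per-step GPU memory without shrinking the effective batch is gradient accumulation. The sketch below is illustrative only; the model, optimizer, and dummy batches are placeholders, not this repository's training code.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)          # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batches standing in for the real DataLoader.
loader = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]
accum_steps = 4  # effective batch size = 8 * 4 = 32, with the memory cost of 8

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    # Scale the loss so accumulated gradients average over the effective batch.
    loss = F.cross_entropy(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```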

ShengYun-Peng (Contributor) commented

Hi @Kinsue, perhaps reducing the batch size further would help mitigate the issue. For this paper, the model was trained on an A100 80GB. I recommend trying out our latest work, UniTable, at https://github.com/poloclub/unitable. We have provided a tiny portion (20 samples) of PubTabNet for some toy pretraining and finetuning. Meanwhile, you can also lower max_seq_len and img_size to reduce GPU memory usage.
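The knobs named above all bound activation memory: fewer samples per step, shorter target sequences, and smaller input images. The sketch below only illustrates how they shape the tensors fed to the model; it uses dummy data and is not the repository's actual config schema or data pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 8     # fewer samples per step -> roughly linear drop in activation memory
max_seq_len = 512  # shorter target sequences -> smaller decoder activations
img_size = 448     # smaller inputs -> smaller encoder feature maps

# Dummy (image, token-sequence) pairs standing in for PubTabNet samples.
images = torch.randn(16, 3, img_size, img_size)
tokens = torch.randint(0, 100, (16, max_seq_len))
loader = DataLoader(TensorDataset(images, tokens), batch_size=batch_size, shuffle=True)

for imgs, seqs in loader:
    # imgs: (batch_size, 3, img_size, img_size), seqs: (batch_size, max_seq_len)
    pass
```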


Kinsue commented Apr 7, 2024

Thank you very much for your reply. I am happy to try out your latest work and appreciate your contributions.

Kinsue closed this as completed Apr 7, 2024