
After the first Epoch ends, an error "torch.cuda.OutOfMemoryError: CUDA out of memory" is reported. #4

Closed
Kinsue opened this issue Mar 29, 2024 · 2 comments

Comments


Kinsue commented Mar 29, 2024

In my training configuration, training fits on a single 4090 after reducing the batch size. However, once the first epoch ends, a "torch.cuda.OutOfMemoryError: CUDA out of memory" error occurs.
May I ask what device your team trained on? Should I keep reducing the batch size?
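For context, one standard way to lower per-step GPU memory without shrinking the effective batch is gradient accumulation. The sketch below is illustrative only; the model, optimizer, and dummy batches are placeholders, not this repository's training code.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)          # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batches standing in for the real DataLoader.
loader = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]
accum_steps = 4  # effective batch size = 8 * 4 = 32, with the memory cost of 8

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    # Scale the loss so accumulated gradients average over the effective batch.
    loss = F.cross_entropy(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```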

ShengYun-Peng (Contributor) commented

Hi @Kinsue, perhaps reducing the batch size further would help mitigate the issue. For this paper, the model was trained on an A100 80GB. I recommend trying out our latest work, UniTable, at https://github.com/poloclub/unitable. We have provided a tiny portion (20 samples) of PubTabNet for some toy pretraining and finetuning. Meanwhile, you can also lower max_seq_len and img_size to reduce GPU memory usage.
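The knobs named above all bound activation memory: fewer samples per step, shorter target sequences, and smaller input images. The sketch below only illustrates how they shape the tensors fed to the model; it uses dummy data and is not the repository's actual config schema or data pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 8     # fewer samples per step -> roughly linear drop in activation memory
max_seq_len = 512  # shorter target sequences -> smaller decoder activations
img_size = 448     # smaller inputs -> smaller encoder feature maps

# Dummy (image, token-sequence) pairs standing in for PubTabNet samples.
images = torch.randn(16, 3, img_size, img_size)
tokens = torch.randint(0, 100, (16, max_seq_len))
loader = DataLoader(TensorDataset(images, tokens), batch_size=batch_size, shuffle=True)

for imgs, seqs in loader:
    # imgs: (batch_size, 3, img_size, img_size), seqs: (batch_size, max_seq_len)
    pass
```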


Kinsue commented Apr 7, 2024

Thank you very much for your reply. I am happy to try out your latest work and appreciate your contributions.

Kinsue closed this as completed Apr 7, 2024