Loss Calculation Issue with Incomplete Batches in Autoencoder Training #3
Comments
Hello @Malengu,

I am trying to point out the following. Our desired batch size (total batch size) is actually:

total_batch_size = autoencoder_batch_size * autoencoder_acc_steps

Next, we move to the loss calculations, where they should match the case where we could afford the total batch size. Suppose our dataset contains these images, and for simplicity let each image have a fixed per-image loss.

Case 1 [desired batch size with drop_last=False]: for one epoch, each recorded loss is the mean over a full batch of total_batch_size images (the last batch may be smaller).

Case 2 [our case (gradient accumulation)]: for one epoch, the recorded losses are the per-micro-batch means. Note that I am not scaling by (1/autoencoder_acc_steps) here; I am showing the losses you are appending to the losses lists, i.e. each batch loss without autoencoder_acc_steps scaling.

Observations: the two sets of recorded losses only agree when total_batch_size divides the dataset evenly; when the last batch is incomplete, the appended losses (and the effective scaling of the last update) no longer match Case 1.
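A minimal numeric sketch of the mismatch (the per-image losses, batch size of 4, acc_steps of 2 and dataset of 10 images below are all made up for illustration):

```python
# Hypothetical numbers: 10 images, autoencoder_batch_size = 4,
# autoencoder_acc_steps = 2, so the desired total batch size is 8
# and the last group only contains 2 images.
per_image_losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

def mean(xs):
    return sum(xs) / len(xs)

# Case 1: we could afford the full total batch size (drop_last=False).
# Each recorded loss is the mean over a batch of 8 images (last batch has 2).
case1 = [mean(per_image_losses[0:8]), mean(per_image_losses[8:10])]
print("Case 1 recorded losses:", case1)          # [4.5, 9.5]

# Case 2: gradient accumulation with micro-batches of 4.
# What gets appended is each micro-batch mean, without 1/acc_steps scaling.
case2 = [
    mean(per_image_losses[0:4]),    # 2.5
    mean(per_image_losses[4:8]),    # 6.5
    mean(per_image_losses[8:10]),   # 9.5  (incomplete micro-batch)
]
print("Case 2 recorded losses:", case2)

# The epoch averages of the recorded losses no longer match; with a dataset
# size divisible by the total batch size they would be identical.
print("Case 1 epoch mean:", mean(case1))   # 7.0
print("Case 2 epoch mean:", mean(case2))   # ~6.17
```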
Thank you @Malengu for the detailed explanation of the issue. Regarding the second aspect, that the scaling factor of the last batch would be smaller: yes, I think the repo (unintentionally) assumes the effective batch size fits perfectly. However, in case you want to fix it for your training, I think the steps mentioned below should make things better. The changes are for DiT training, but the same applies to VAE training as well.

This basically ensures that we are always accumulating gradients for acc_steps micro-batches. At the end of an epoch, when we have accumulated gradients for steps < acc_steps, we cycle through and wait until gradients for the remaining (acc_steps - steps) micro-batches have also been accumulated in the next epoch. Only after that do we update the parameters and zero out the gradients. Do let me know if you see any issues with these changes.
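A rough sketch of that accumulation logic, using made-up stand-ins for the model, data and optimizer (the actual training scripts in the repo differ in the details):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; only the accumulation bookkeeping is the point here.
acc_steps = 2
num_epochs = 3
model = nn.Linear(8, 8)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(10, 8))        # 10 samples
data_loader = DataLoader(dataset, batch_size=4)    # last micro-batch has 2

steps = 0  # micro-batches accumulated since the last optimizer step
for epoch in range(num_epochs):
    for (x,) in data_loader:
        loss = criterion(model(x), x)
        (loss / acc_steps).backward()  # scale each micro-batch for accumulation
        steps += 1
        if steps == acc_steps:
            optimizer.step()
            optimizer.zero_grad()
            steps = 0
    # No step/zero_grad at epoch end: if steps < acc_steps we keep the partial
    # gradient and finish accumulating with the first batches of the next epoch.
```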
Thank you for replying. I believe the code implementation is correct, but the issue arose when the effective batch size didn’t align properly. That’s when I noticed the problem. However, I think this wouldn’t affect the optimization since we are primarily interested in the argmin of the loss function.
Hi ExplainingAI,
Thank you for sharing these amazing projects; they have been incredibly helpful and inspiring. I noticed a potential issue related to gradient accumulation when training autoencoders. Specifically, the calculated losses at each epoch are accurate when total_batch_size = autoencoder_batch_size * autoencoder_acc_steps fits perfectly. However, in cases where total_batch_size doesn’t fit (e.g., on GPUs that can accommodate larger batch sizes) and the last batch contains fewer samples, the calculated losses seem to be incorrect. Does this implementation assume that total_batch_size always fits perfectly?
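For illustration, a quick check of whether that assumption holds (the dataset size and settings below are made up):

```python
# Made-up numbers, just to show the divisibility condition.
dataset_len = 10_000
autoencoder_batch_size = 4
autoencoder_acc_steps = 8

total_batch_size = autoencoder_batch_size * autoencoder_acc_steps
leftover = dataset_len % total_batch_size
if leftover:
    print(f"Last effective batch only covers {leftover} samples "
          f"instead of {total_batch_size}.")
else:
    print("total_batch_size divides the dataset evenly.")
```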
I would greatly appreciate your guidance on this matter. Thank you in advance for your time!