Thank you for sharing these amazing projects; they have been incredibly helpful and inspiring. I noticed a potential issue with gradient accumulation when training autoencoders. The losses calculated at each epoch are accurate when the number of training samples is an exact multiple of `total_batch_size = autoencoder_batch_size * autoencoder_acc_steps`. However, when it is not (e.g., on GPUs that can accommodate larger batch sizes), the last batch contains fewer samples and the calculated losses seem to be incorrect. Does this implementation assume that the data always divides evenly into `total_batch_size`?
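To make the concern concrete, here is a minimal sketch in plain Python, not taken from the repository. It assumes the epoch loss is computed as an unweighted average of per-batch mean losses; I have not confirmed that this is what the code does. The function names and the toy loss values are hypothetical. With a smaller final batch, the unweighted average differs from the true per-sample mean, while weighting each batch by its sample count recovers it.

```python
def batch_means(losses, batch_size):
    """Split per-sample losses into batches; return (mean_loss, n_samples) per batch."""
    out = []
    for i in range(0, len(losses), batch_size):
        chunk = losses[i:i + batch_size]
        out.append((sum(chunk) / len(chunk), len(chunk)))
    return out


def naive_epoch_loss(batches):
    """Unweighted average of batch means (assumed, hypothetical behaviour)."""
    return sum(mean for mean, _ in batches) / len(batches)


def weighted_epoch_loss(batches):
    """Average of batch means weighted by the number of samples in each batch."""
    total = sum(n for _, n in batches)
    return sum(mean * n for mean, n in batches) / total


# Toy data: 5 samples with batch size 4, so the last batch holds a single sample.
losses = [1.0, 1.0, 1.0, 1.0, 5.0]
batches = batch_means(losses, 4)

naive = naive_epoch_loss(batches)        # last sample counts as much as the first four
weighted = weighted_epoch_loss(batches)  # equals the plain mean over all 5 samples
true_mean = sum(losses) / len(losses)

print(naive, weighted, true_mean)
```

The same reasoning would apply to the gradient itself if each micro-batch loss is divided by a fixed `autoencoder_acc_steps` before `backward()`, since a short final accumulation window would then be scaled as if it were full.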
I would greatly appreciate your guidance on this matter. Thank you in advance for your time!