Open
Description
Hi ExplainingAI,
Thank you for sharing these amazing projects; they have been incredibly helpful and inspiring. I noticed a potential issue related to gradient accumulation when training autoencoders. Specifically, the per-epoch losses are accurate when total_batch_size = autoencoder_batch_size * autoencoder_acc_steps divides the dataset size evenly. However, when it does not (e.g., after raising the batch size on GPUs that can accommodate larger batches) and the last batch therefore contains fewer samples, the computed losses appear to be incorrect. Does this implementation assume that the dataset size is always a multiple of total_batch_size?
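To make the concern concrete, here is a minimal sketch (not the repository's code; `model`, `criterion`, and `acc_steps` are placeholder names) of the pattern I have in mind. Dividing each batch loss by a fixed `acc_steps` assumes every accumulation group contains exactly `acc_steps` full batches, so a trailing partial group or a short final batch gets mis-weighted; weighting the epoch loss by the actual number of samples keeps the reported value correct either way:

```python
import torch


def train_epoch(model, loader, optimizer, criterion, acc_steps, device="cpu"):
    model.train()
    optimizer.zero_grad()
    running_loss, seen_samples = 0.0, 0

    for step, (x,) in enumerate(loader):
        x = x.to(device)
        recon = model(x)
        loss = criterion(recon, x)  # mean loss over this (possibly smaller) batch

        # Fixed division by acc_steps: the trailing partial group of batches
        # contributes a gradient that is scaled too small.
        (loss / acc_steps).backward()

        # Weight the epoch loss by the true sample count so the reported
        # value stays correct even when the last batch is short.
        running_loss += loss.item() * x.size(0)
        seen_samples += x.size(0)

        last_step = (step + 1) == len(loader)
        if (step + 1) % acc_steps == 0 or last_step:
            optimizer.step()
            optimizer.zero_grad()

    return running_loss / seen_samples
```

The discrepancy is easy to reproduce with a toy dataset whose length is not a multiple of autoencoder_batch_size * autoencoder_acc_steps.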
I would greatly appreciate your guidance on this matter. Thank you in advance for your time!