Python tracing 1.5x slower in docker for Stable Diffusion #8947


Open
bhavya01 opened this issue Apr 7, 2025 · 1 comment


bhavya01 commented Apr 7, 2025

🐛 Bug

While running the SDXL model on v5p, I see the following differences in tracing times in Docker vs. native TPU:

Docker:

step:  47
dataloading time 0.44275736808776855
forward_time = 0.3831048011779785
backward time = 0.42351746559143066
optimizer step = 0.359987735748291
step:  48
dataloading time 0.38611793518066406
forward_time = 0.36144566535949707
backward time = 0.43265414237976074
optimizer step = 0.35251617431640625
step:  49
dataloading time 0.4026143550872803
forward_time = 0.35672855377197266
backward time = 0.42094969749450684
optimizer step = 0.35718798637390137

Native TPU:

step:  47
dataloading time 0.2109665870666504
forward_time = 0.2806096076965332
backward time = 0.3262217044830322
optimizer step = 0.2687253952026367
step:  48
dataloading time 0.24223923683166504
forward_time = 0.2692677974700928
backward time = 0.32848691940307617
optimizer step = 0.2610747814178467
step:  49
dataloading time 0.236525297164917
forward_time = 0.2631101608276367
backward time = 0.3343634605407715
optimizer step = 0.2612447738647461
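The per-phase numbers above are wall-clock deltas taken around each Python call; since torch_xla executes lazily, such deltas measure Python tracing time rather than device execution time. A minimal sketch of that measurement pattern (the function names here are hypothetical placeholders, not taken from the training script):

```python
import time

def timed(fn, *args, **kwargs):
    """Return (result, wall-clock seconds) for a single call."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start

# Hypothetical stand-in for the real forward pass. With torch_xla's lazy
# execution, the wall-clock delta around the Python call captures graph
# tracing time, not the device computation itself.
def forward_step():
    time.sleep(0.01)  # placeholder work

_, forward_time = timed(forward_step)
print(f"forward_time = {forward_time}")
```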

To Reproduce

The training script is at: https://github.com/entrpn/diffusers/blob/sdxl_training_bbahl/examples/research_projects/pytorch_xla/training/text_to_image/README_sdxl.md

I took an Ubuntu 22.04 docker image, installed Miniconda on it, built torch and torch_xla wheels with export _GLIBCXX_USE_CXX11_ABI=1, and ran the above training script.

Expected behavior

Docker shouldn't add any tracing overhead.

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: TPU
  • torch_xla version: nightly 03/21/2025
@bhavya01 bhavya01 self-assigned this Apr 7, 2025
@bhavya01 bhavya01 changed the title Python tracing 2x slower in docker for Stable Diffusion Python tracing 1.5x slower in docker for Stable Diffusion Apr 7, 2025

bhavya01 commented Apr 7, 2025

Adding another datapoint for the ResNet-50 example:
docker:

epoch: 1, step: 290, loss: 6.602956771850586, rate: 4010.4757487626957
forward_time: 0.008185625076293945
backward_time: 0.004477977752685547
optimizer_time: 0.006538867950439453

native:

epoch: 1, step: 290, loss: 6.609467029571533, rate: 3692.5472361850266
forward_time: 0.007588624954223633
backward_time: 0.004286527633666992
optimizer_time: 0.006213188171386719

For ResNet-50, tracing in Docker is about 6% slower than native.
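The ~6% figure can be sanity-checked directly from the logged component times, by averaging the per-component slowdowns of Docker relative to native:

```python
# Component times copied from the step-290 logs above (seconds).
docker = {"forward": 0.008185625076293945,
          "backward": 0.004477977752685547,
          "optimizer": 0.006538867950439453}
native = {"forward": 0.007588624954223633,
          "backward": 0.004286527633666992,
          "optimizer": 0.006213188171386719}

# Slowdown of Docker relative to native, per component, in percent.
slowdowns = {k: (docker[k] / native[k] - 1) * 100 for k in docker}
avg = sum(slowdowns.values()) / len(slowdowns)
print({k: round(v, 1) for k, v in slowdowns.items()}, round(avg, 1))
# → forward ≈ 7.9%, backward ≈ 4.5%, optimizer ≈ 5.2%, average ≈ 5.9%
```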
