Requires a ROCm image with PyTorch (a recent version is recommended, i.e. >= 2.5).
python setup.py develop
See the individual issues below for running instructions.
It relies on GPU peer access over unified memory: each rank allocates an array of flags that the other ranks write to as a notification mechanism.
Run using:
torchrun --nproc-per-node 8 amd_repro/multi_gpu_barrier.py
It hangs after a few iterations despite all attempts to invalidate and write back caches.
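The flag-array scheme above can be sketched with CPU threads standing in for ranks (a hypothetical illustration of the general protocol; the exact flag handling in amd_repro/multi_gpu_barrier.py may differ):

```python
import threading

NUM_RANKS = 4
ITERATIONS = 5

# Each rank owns one array with a slot per peer; peers write their
# generation number into it, and the owner spins until all slots match.
flags = [[0] * NUM_RANKS for _ in range(NUM_RANKS)]

def barrier(rank, gen):
    # Notify every rank (including ourselves) by writing into its array.
    for peer in range(NUM_RANKS):
        flags[peer][rank] = gen
    # Spin until every peer has announced the same generation.
    while any(f < gen for f in flags[rank]):
        pass

results = []
lock = threading.Lock()

def worker(rank):
    for it in range(1, ITERATIONS + 1):
        barrier(rank, it)
        with lock:
            results.append((it, rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

On the GPU the spin loop reads flags written by peer devices, which is why cache invalidation/write-back matters there; the CPU-thread version needs none of that.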
It takes over 60 seconds to reserve 256 GB of virtual address space:
python amd_repro/virtual_memory_slow.py
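For contrast, reserving address space without committing any memory is normally near-instant. A Linux-only CPU-side sketch of such a reservation (an analogy for the GPU-side reservation this repro exercises, not the repro itself; the mmap constants below are x86-64 Linux values):

```python
import ctypes
import ctypes.util
import time

# x86-64 Linux mmap constants.
PROT_NONE = 0x0
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20
MAP_NORESERVE = 0x4000
MAP_FAILED = ctypes.c_void_p(-1).value

libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

SIZE = 256 * 1024**3  # 256 GiB of address space, nothing committed

start = time.perf_counter()
# PROT_NONE + MAP_NORESERVE: pure address-space reservation, no backing pages.
addr = libc.mmap(None, SIZE, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0)
elapsed = time.perf_counter() - start
assert addr not in (None, MAP_FAILED)
print(f"reserved 256 GiB of address space in {elapsed:.6f} s")
libc.munmap(ctypes.c_void_p(addr), SIZE)
```

The CPU-side reservation completes in microseconds, which is the kind of cost one would expect from the GPU-side call as well.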
Exporting virtual memory segfaults:
python amd_repro/virtual_memory_export.py
Deallocating virtual memory segfaults:
python amd_repro/virtual_memory_dealloc.py
If IPC is enabled, the event's query() call always returns true.
torchrun --nproc-per-node 2 amd_repro/ipc_event.py
Output:
0 Waiting ...
1 Done
0 Waiting ...
0 Waiting ...
0 Waiting ...
0 Waiting ...
0 Done
Expected output (on NVIDIA H100):
0 Waiting ...
1 Waiting ...
0 Waiting ...
1 Waiting ...
0 Waiting ...
0 Done
1 Done