ACE-Step 1.5 on AMD RX 7900 XT (ROCm 7.2, Windows 11) - Working Setup + Fixes #404
tuckeranglemyer-pixel started this conversation in Show and tell
Got ACE-Step 1.5 running on an AMD GPU on Windows with ROCm 7.2. This documents the full setup and the four fixes required, since it doesn't work out of the box. Hopefully this saves someone else a few hours.
Hardware & Environment
Installation Steps
1. Clone the repo
2. Install ROCm SDK components
3. Install PyTorch for ROCm
4. Install ROCm-compatible dependencies
5. Verify GPU detection
Expected output:
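A minimal sketch of a detection check you can run yourself (an illustration, not necessarily the command used in this step). It assumes the ROCm build of PyTorch is installed: ROCm builds expose AMD GPUs through the `torch.cuda` API and report the HIP version in `torch.version.hip`. The helper names `format_gpu_line` and `describe_rocm_gpu` are hypothetical and not part of ACE-Step.

```python
def format_gpu_line(name: str, total_bytes: int, hip_version: str) -> str:
    """Format a summary such as 'AMD Radeon RX 7900 XT (20.0 GB, HIP 7.2)'."""
    # Convert bytes to binary gigabytes (1 GB = 1024**3 bytes here).
    gigabytes = total_bytes / 1024**3
    return f"{name} ({gigabytes:.1f} GB, HIP {hip_version})"


def describe_rocm_gpu(index: int = 0):
    """Return a summary of GPU `index`, or None if no ROCm GPU is visible."""
    # Imported here so the formatter above can be used without torch installed.
    import torch

    if torch.version.hip is None:
        # Not a ROCm build of PyTorch (CPU-only or CUDA wheel installed instead).
        return None
    if not torch.cuda.is_available():
        # ROCm build, but the runtime cannot see a usable GPU.
        return None
    props = torch.cuda.get_device_properties(index)
    return format_gpu_line(props.name, props.total_memory, torch.version.hip)
```

Running `print(describe_rocm_gpu())` should print your card's name, memory, and HIP version; `None` means either the wrong PyTorch wheel or no visible GPU.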
Required Fixes (4 total)
Fix 1: `torch.distributed.group` import error in `vector_quantize_pytorch`
ROCm on Windows doesn't ship `torch.distributed` with the full distributed training backend. The `vector_quantize_pytorch` package imports it unconditionally and crashes.
File:
C:\Users\<USER>\AppData\Local\Programs\Python\Python312\Lib\site-packages\vector_quantize_pytorch\lookup_free_quantization.py
Find (lines 14-15):
Replace with:
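As a general illustration of the pattern (not the exact replacement lines), an import that may be unavailable can be guarded so it fails soft instead of crashing at import time. This assumes the failing line imports a `torch.distributed` submodule such as `torch.distributed.nn`, which in turn does `from torch.distributed import group`; check your installed copy, since the module name is an assumption. `optional_import` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import module_name, or return None if it cannot be imported.

    ImportError also covers ModuleNotFoundError and errors such as
    "cannot import name 'group' from 'torch.distributed'" raised while a
    partially available package is being imported.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```

In the patched file, a line such as `dist_nn = optional_import("torch.distributed.nn")` would replace the unconditional import, and every later use of `dist_nn` would need a `None` check. Whether the library ever reaches those distributed code paths during single-GPU inference is worth confirming in your installed version.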
Fix 2: `torchao` is incompatible with ROCm on Windows
`torchao` tries to register distributed ops (`_c10d_functional.all_gather_into_tensor`) that don't exist in the ROCm Windows build. `requirements-rocm.txt` already excludes it, but if you installed from `requirements.txt` first, it's still there.
Fix:
Note: This disables quantization features (int8/int4 weight quantization). The model runs fine at full precision — you have 20 GB of VRAM.
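To confirm the removal took effect for the interpreter you actually launch ACE-Step with, the standard library's `importlib.util.find_spec` is enough. A minimal sketch; `installed` and `still_installed` are hypothetical helper names.

```python
import importlib.util


def installed(package: str) -> bool:
    """Return True if the top-level package is importable by this interpreter."""
    # find_spec locates a top-level package without executing it, so a broken
    # torchao install is reported without running its op registration.
    return importlib.util.find_spec(package) is not None


def still_installed(packages) -> list:
    """Return the names from packages that this interpreter can still import."""
    return [name for name in packages if installed(name)]
```

`print(still_installed(["torchao"]))` should print `[]` once it is gone. Run it with the same `py` interpreter you launch ACE-Step with, since Windows machines often have more than one Python install.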
Fix 3: Launch command — bypass the batch script
`start_gradio_ui_rocm.bat` looks for a `venv_rocm` virtual environment that doesn't exist when you install to system Python, so it refuses to launch even though everything is correctly installed.
Instead of:
Run directly:
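If you prefer a reusable launcher over typing the command, a small Python wrapper can run the pipeline script with whatever interpreter runs the wrapper, so nothing ever looks for `venv_rocm`. A minimal sketch: the script path `acestep\acestep_v15_pipeline.py` comes from the TL;DR, while `build_launch_command` and `launch` are hypothetical names and no extra command-line arguments are assumed.

```python
import subprocess
import sys
from pathlib import Path


def build_launch_command(repo_root, python_exe=None) -> list:
    """Return argv for running the ACE-Step pipeline script directly.

    Uses the interpreter running this file unless another one is given,
    which sidesteps the batch file's venv_rocm lookup.
    """
    script = Path(repo_root) / "acestep" / "acestep_v15_pipeline.py"
    return [python_exe or sys.executable, str(script)]


def launch(repo_root) -> int:
    """Run the pipeline from the repo root and return its exit code."""
    # cwd=repo_root mirrors running the command from the ACE-Step-1.5 folder.
    completed = subprocess.run(build_launch_command(repo_root), cwd=repo_root)
    return completed.returncode
```

For example, `launch(r"C:\path\to\ACE-Step-1.5")` blocks until the Gradio server exits and returns its exit code.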
Fix 4: `torchaudio` defaults to `torchcodec`, which is incompatible with ROCm on Windows
`torchaudio` 2.9.1 defaults to `torchcodec` for all audio loading/saving, but `torchcodec`'s DLLs aren't compatible with the ROCm PyTorch build. This breaks both audio export (MP3/FLAC saving) and dataset preprocessing for LoRA training.
For audio export: set the output format to WAV in the Gradio UI audio settings. MP3/FLAC export through `torchcodec` will fail.
For LoRA training preprocessing: replace the audio loading function so it uses `soundfile` instead of `torchaudio`.
File:
ACE-Step-1.5\acestep\training\dataset_builder_modules\preprocess_audio.py
Replace entire contents with:
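A minimal sketch of what such a loader needs to do (an illustration, not the author's replacement file): read with `soundfile` and return the channels-first layout that `torchaudio.load` produces. `soundfile.read(..., always_2d=True)` returns a `(frames, channels)` NumPy array plus the sample rate, so the array is transposed. `load_audio_channels_first` is a hypothetical name, and any resampling or mono conversion the real preprocessing performs is not covered.

```python
import numpy as np


def load_audio_channels_first(path, reader=None):
    """Load audio as a float32 (channels, frames) array plus its sample rate.

    `reader` defaults to soundfile.read; it is a parameter only so the
    function can be exercised without an audio file on disk.
    """
    if reader is None:
        import soundfile as sf  # decodes via libsndfile, no torchcodec involved

        reader = sf.read
    # always_2d=True makes mono files come back as (frames, 1) instead of (frames,).
    data, sample_rate = reader(path, dtype="float32", always_2d=True)
    # torchaudio.load returns (channels, frames); soundfile returns (frames, channels).
    # ascontiguousarray copies the transposed view into a standard C-ordered buffer.
    waveform = np.ascontiguousarray(np.asarray(data, dtype=np.float32).T)
    return waveform, sample_rate
```

`torch.from_numpy(waveform)` then yields a tensor in the layout that code written for `torchaudio.load` expects.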
Performance Numbers
First generation is slow (~10-15 min) because MIOpen needs to compile and cache GPU kernels. Every generation after that uses the cached kernels and is dramatically faster.
Notes
- The `ACESTEP_ROCM_DTYPE=float16` env var is supposed to reduce VRAM usage, but the model still loads at float32 regardless.
- Batch size 1 is stable. Batch size 4 caused an OOM crash (0xC0000005) at float32.
- … `IsEnoughWorkspace`) spam the terminal but are harmless; the computations still complete.
- … `torchcodec` can't find ffmpeg DLLs compatible with the ROCm PyTorch build.
- … `nano-vllm` isn't installed, but it falls back gracefully).
- … AMD Radeon RX 7900 XT (20.0 GB, HIP 7.2.26024-f6f897bd3d).

TL;DR
ACE-Step 1.5 runs on AMD GPUs on Windows via ROCm 7.2. Four fixes are needed: patch a distributed import in `vector_quantize_pytorch`, uninstall `torchao`, launch directly with `py acestep\acestep_v15_pipeline.py`, and replace the audio loader in `preprocess_audio.py` with soundfile for LoRA training. Generation goes from ~2.5 min on CPU to under 1 minute on GPU. RX 7000 series cards with 20 GB VRAM hit tier6a (the top-tier config).