ACE-Step 1.5 on AMD RX 7900 XT (ROCm 7.2, Windows 11) - Working Setup + Fixes #404

tuckeranglemyer-pixel · 2026-04-05T07:04:41Z


ACE-Step 1.5 on AMD RX 7900 XT (ROCm 7.2, Windows 11) — Working Setup + Fixes

Got ACE-Step 1.5 running on an AMD GPU on Windows with ROCm 7.2. Documenting the full setup and the four fixes required since this doesn't work out of the box. Hopefully this saves someone else a few hours.

Hardware & Environment

GPU: AMD Radeon RX 7900 XT (20 GB VRAM)
CPU: AMD Ryzen 9 7900X3D
RAM: 32 GB
OS: Windows 11
Python: 3.12.10
Detected Tier: tier6a (top tier, max batch size 4 with LM, 8 without)

Installation Steps

1. Clone the repo

git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5

2. Install ROCm SDK components

pip install --no-cache-dir ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_core-7.2.0.dev0-py3-none-win_amd64.whl ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_devel-7.2.0.dev0-py3-none-win_amd64.whl ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_libraries_custom-7.2.0.dev0-py3-none-win_amd64.whl ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm-7.2.0.dev0.tar.gz

3. Install PyTorch for ROCm

pip install --no-cache-dir ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torch-2.9.1+rocmsdk20260116-cp312-cp312-win_amd64.whl ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchaudio-2.9.1+rocmsdk20260116-cp312-cp312-win_amd64.whl ^
  https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchvision-0.24.1+rocmsdk20260116-cp312-cp312-win_amd64.whl

4. Install ROCm-compatible dependencies

pip install -r requirements-rocm.txt

5. Verify GPU detection

py -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

Expected output:

2.9.1+rocmsdk20260116
True
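For a little more detail than the one-liner, here is a minimal sketch that also reports the HIP version, device name, and VRAM. It relies only on PyTorch's public `torch.cuda` API, which ROCm builds reuse. The `describe_gpu` helper name is made up for this example and is not part of ACE-Step.

```python
def describe_gpu():
    """Collect basic PyTorch/GPU facts; returns defaults if torch cannot be imported."""
    info = {
        "torch_version": None,
        "hip_version": None,
        "gpu_available": False,
        "device_name": None,
        "vram_gb": None,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing, or its DLLs failed to load
        return info

    info["torch_version"] = torch.__version__
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    info["hip_version"] = getattr(torch.version, "hip", None)
    info["gpu_available"] = bool(torch.cuda.is_available())
    if info["gpu_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return info


for key, value in describe_gpu().items():
    print(f"{key}: {value}")
```

If `gpu_available` is False while `hip_version` is set, the ROCm build of PyTorch is installed but cannot see the card, which usually points at the driver or SDK install rather than at ACE-Step.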

Required Fixes (4 total)

Fix 1: `vector_quantize_pytorch` — `torch.distributed.group` import error

ROCm on Windows doesn't ship torch.distributed with the full distributed training backend. The vector_quantize_pytorch package imports it unconditionally and crashes.

File: C:\Users\<USER>\AppData\Local\Programs\Python\Python312\Lib\site-packages\vector_quantize_pytorch\lookup_free_quantization.py
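The path above assumes a per-user system Python install. If your Python lives elsewhere (or in a virtual environment), a small sketch like this can locate the file. `locate_package_file` is a hypothetical helper name; it uses only the standard library's `importlib.util.find_spec`, which does not import a top-level package when looking it up.

```python
import importlib.util
from pathlib import Path


def locate_package_file(package, filename):
    """Return the path of `filename` inside an installed top-level package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


# Prints None if vector_quantize_pytorch is not installed for this interpreter.
print(locate_package_file("vector_quantize_pytorch", "lookup_free_quantization.py"))
```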

Find (lines 14-15):

import torch.distributed as dist
from torch.distributed import nn as dist_nn

Replace with:

try:
    import torch.distributed as dist
    from torch.distributed import nn as dist_nn
except (ImportError, AttributeError):
    import types
    dist = types.SimpleNamespace(
        is_initialized=lambda: False,
        get_world_size=lambda: 1,
    )
    dist_nn = None
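To see what the fallback does and does not cover, here is a small standalone sketch using the same stand-in object. The `is_distributed` guard is an illustrative assumption about how library code typically checks for multi-GPU mode, not a copy of the package's actual logic.

```python
import types

# Same stand-in object as in the patch above.
dist = types.SimpleNamespace(
    is_initialized=lambda: False,
    get_world_size=lambda: 1,
)


def is_distributed():
    """Typical guard: take the multi-GPU path only when a process group exists."""
    return dist.is_initialized() and dist.get_world_size() > 1


print(is_distributed())              # False: single-process code path
print(hasattr(dist, "all_reduce"))   # False: the stub covers only the two calls above
```

Any code path that calls another `torch.distributed` function on the stub would raise `AttributeError`, so this patch is only suitable for single-GPU inference and training.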

Fix 2: `torchao` — incompatible with ROCm Windows

torchao tries to register distributed ops (_c10d_functional.all_gather_into_tensor) that don't exist in the ROCm Windows build. The requirements-rocm.txt already excludes it, but if you installed from requirements.txt first, it's still there.

Fix:

pip uninstall torchao -y

Note: This disables quantization features (int8/int4 weight quantization). With 20 GB of VRAM the model runs fine at full precision at batch size 1.
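If you are not sure whether torchao is still present, a quick check like the following may help. It is a sketch using the standard library's `importlib.metadata`; the `installed_version` helper name is made up for this example.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a pip distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("torchao")
if version is None:
    print("torchao not installed; nothing to do")
else:
    print(f"torchao {version} is installed; run: pip uninstall torchao -y")
```

Run it with the same interpreter you use to launch ACE-Step (`py check_torchao.py`, for example), since each Python install has its own site-packages.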

Fix 3: Launch command — bypass the batch script

start_gradio_ui_rocm.bat looks for a venv_rocm virtual environment that doesn't exist when you install to system Python. It will refuse to launch even though everything is correctly installed.

Instead of:

.\start_gradio_ui_rocm.bat

Run directly:

py acestep\acestep_v15_pipeline.py
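If you would rather have a reusable launcher than retype the command, here is a minimal sketch of one in Python. The `build_launch` and `launch` names are hypothetical, and the optional `extra_env` argument is just a convenient way to pass environment variables such as the `ACESTEP_ROCM_DTYPE` variable discussed in the Notes.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_launch(repo_root, extra_env=None):
    """Return (command, environment) for starting the pipeline with this interpreter."""
    script = Path(repo_root) / "acestep" / "acestep_v15_pipeline.py"
    command = [sys.executable, str(script)]
    env = dict(os.environ)
    env.update(extra_env or {})
    return command, env


def launch(repo_root=".", extra_env=None):
    """Start the Gradio pipeline from the repo root and return its exit code."""
    command, env = build_launch(repo_root, extra_env)
    if not Path(command[1]).is_file():
        raise FileNotFoundError(f"{command[1]} not found; is repo_root the ACE-Step-1.5 folder?")
    return subprocess.call(command, env=env, cwd=str(repo_root))
```

Calling `launch(".")` from the ACE-Step-1.5 folder is equivalent to the command above, except that it uses whichever interpreter runs the launcher instead of whatever `py` resolves to.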

Fix 4: `torchaudio` defaults to `torchcodec` which is incompatible with ROCm Windows

torchaudio 2.9.1 defaults to torchcodec for all audio loading/saving, but torchcodec's DLLs aren't compatible with the ROCm PyTorch build. This breaks both audio export (MP3/FLAC saving) and dataset preprocessing for LoRA training.

For audio export: Set output format to WAV in the Gradio UI audio settings. MP3/FLAC export through torchcodec will fail.

For LoRA training preprocessing: Replace the audio loading function to use soundfile instead of torchaudio.

File: ACE-Step-1.5\acestep\training\dataset_builder_modules\preprocess_audio.py

Replace entire contents with:

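import soundfile as sf
import torch


def load_audio_stereo(audio_path, target_sample_rate=48000, max_duration=240.0):
    """Load audio as a float32 (channels, samples) tensor at the target sample rate."""
    # soundfile returns shape (samples,) for mono and (samples, channels) otherwise.
    audio, sr = sf.read(audio_path, dtype='float32')
    audio = torch.from_numpy(audio)

    if audio.dim() == 1:
        audio = audio.unsqueeze(0)       # mono: (samples,) -> (1, samples)
    else:
        audio = audio.T.contiguous()     # (samples, channels) -> (channels, samples)

    # Duplicate a mono channel so the result is always at least stereo.
    if audio.shape[0] == 1:
        audio = audio.repeat(2, 1)

    if sr != target_sample_rate:
        # Only torchaudio's resampler is used here, not its torchcodec-based file I/O.
        import torchaudio.functional as F
        audio = F.resample(audio, sr, target_sample_rate)

    # Truncate clips longer than max_duration seconds.
    max_samples = int(max_duration * target_sample_rate)
    if audio.shape[1] > max_samples:
        audio = audio[:, :max_samples]

    return audio, target_sample_rate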
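To make the loader's branches easier to reason about, here is a small pure-Python sketch that predicts what it will do to a clip given only the clip's metadata. `plan_preprocessing` is a hypothetical helper, not part of ACE-Step, and the post-resampling length is an estimate (source length scaled by the rate ratio, rounded up).

```python
def plan_preprocessing(channels, sample_rate, num_frames,
                       target_sample_rate=48000, max_duration=240.0):
    """Predict the loader's actions for a clip without touching audio data."""
    upmix = channels == 1                         # mono is duplicated to two channels
    resample = sample_rate != target_sample_rate
    if resample:
        # Integer ceiling of num_frames * target / source.
        frames = -(-num_frames * target_sample_rate // sample_rate)
    else:
        frames = num_frames
    max_samples = int(max_duration * target_sample_rate)
    return {
        "upmix_to_stereo": upmix,
        "resample": resample,
        "truncate": frames > max_samples,
        "output_samples": min(frames, max_samples),
    }


# 10 s mono clip at 44.1 kHz: upmixed and resampled to 480000 samples per channel.
print(plan_preprocessing(channels=1, sample_rate=44100, num_frames=44100 * 10))
# 300 s stereo clip at 48 kHz: truncated to 240 s, i.e. 11520000 samples per channel.
print(plan_preprocessing(channels=2, sample_rate=48000, num_frames=48000 * 300))
```

With the defaults, the longest clip the loader returns is 240 s × 48,000 Hz = 11,520,000 samples per channel, which is about 92 MB as stereo float32.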

Performance Numbers

| Metric | CPU (Ryzen 9 7900X3D) | GPU (RX 7900 XT, ROCm) |
| --- | --- | --- |
| DiT diffusion (30s track) | ~62 seconds | 0.8–1.1 seconds |
| VAE decode (30s track) | ~80 seconds | 18–41 seconds |
| Total generation (30s, batch=1) | ~2.5 minutes | ~30–60 seconds |

First generation is slow (~10-15 min) because MIOpen needs to compile and cache GPU kernels. Every generation after that uses the cached kernels and is dramatically faster.
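As a quick worked calculation, the speedup per stage can be derived from the table above. This sketch uses only the numbers already reported here and treats ~2.5 minutes as 150 seconds.

```python
# Timings copied from the table above, in seconds: (cpu, gpu_best, gpu_worst).
timings = {
    "DiT diffusion": (62, 0.8, 1.1),
    "VAE decode": (80, 18, 41),
    "Total generation": (150, 30, 60),   # ~2.5 minutes on CPU
}


def speedup_range(cpu, gpu_best, gpu_worst):
    """Return (lowest, highest) speedup factor of GPU over CPU."""
    return cpu / gpu_worst, cpu / gpu_best


for stage, values in timings.items():
    low, high = speedup_range(*values)
    print(f"{stage}: {low:.1f}x to {high:.1f}x faster")
```

The DiT step comes out roughly 56x to 78x faster, while VAE decode improves only about 2x to 4x, so on this setup the VAE decode accounts for most of the remaining GPU generation time.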

Notes

The ACESTEP_ROCM_DTYPE=float16 env var is supposed to reduce VRAM usage, but the model still loads at float32 regardless. Batch size of 1 is stable. Batch size of 4 at float32 crashed the process with exit code 0xC0000005 (a Windows access violation), most likely from running out of VRAM.
MIOpen workspace warnings (IsEnoughWorkspace) spam the terminal but are harmless — the computations still complete.
Audio output format: set to WAV in the UI settings. MP3 export fails because torchcodec can't find ffmpeg DLLs compatible with the ROCm PyTorch build.
The 5Hz LM works on the PyTorch backend (nano-vllm isn't installed but it falls back gracefully).
GPU is detected as AMD Radeon RX 7900 XT (20.0 GB, HIP 7.2.26024-f6f897bd3d).

TL;DR

ACE-Step 1.5 runs on AMD GPUs on Windows via ROCm 7.2. Four fixes needed: patch a distributed import in vector_quantize_pytorch, uninstall torchao, launch directly with py acestep\acestep_v15_pipeline.py, and replace the audio loader in preprocess_audio.py with soundfile for LoRA training (and export WAV instead of MP3/FLAC). Generation of a 30s track goes from ~2.5 min on CPU to under 1 minute on GPU. The RX 7900 XT with 20 GB VRAM is detected as tier6a (top tier config).
