I'm getting very frequent crashes on Arch Linux with an AMD Radeon RX 9070 XT (Vulkan backend, RADV driver), regardless of model and settings, and always during the VAE decode step. Here's a log in case it helps:
[Server] Job 0b85561d79a2fd8a created (1 requests)
[Server] Loading synth: DiT=acestep-v15-xl-turbo-Q8_0.gguf
[GGUF] ./models/acestep-v15-xl-turbo-Q8_0.gguf: 830 tensors, data at offset 69088
[GGUF] ./models/acestep-v15-xl-turbo-Q8_0.gguf: 830 tensors, data at offset 69088
[Synth-Load] Ready: turbo=yes, fa=yes, batch_cfg=yes
[Resolve-Params] max audio codes across batch: 1315 (263.0s @ 5Hz)
[Resolve-T] T=6576, S=3288
[Resolve-T] seed=1514441185, steps=8, guidance=1.0, shift=3.0, duration=264.0s
[BPE] Loaded from GGUF: 151643 vocab, 151387 merges
[Load] TextEncoder backend: Vulkan0 (CPU threads: 16)
[GGUF] ./models/Qwen3-Embedding-0.6B-Q8_0.gguf: 310 tensors, data at offset 5337664
[Load] TextEncoder: 28L, H=1024, Nh=16/8
[Qwen3] Attn: Q+K+V fused
[Qwen3] MLP: gate+up fused
[WeightCtx] Loaded 310 tensors, 742.7 MB into backend
[Store] Load TextEnc: 250 ms
[Store] Unload TextEnc (742.7 MB)
[Load] CondEncoder backend: Vulkan0 (CPU threads: 16)
[GGUF] ./models/acestep-v15-xl-turbo-Q8_0.gguf: 830 tensors, data at offset 69088
[Load] LyricEncoder: 8L
[Qwen3] Attn: Q+K+V fused
[Qwen3] MLP: gate+up fused
[Load] TimbreEncoder: 4L
[Qwen3] Attn: Q+K+V fused
[Qwen3] MLP: gate+up fused
[WeightCtx] Loaded 141 tensors, 616.6 MB into backend
[Load] CondEncoder: lyric(8L), timbre(4L, CLS), text_proj, null_cond
[Store] Load CondEnc: 207 ms
[CondEnc] Lyric sliding mask: 262x262, window=128
[CondEnc] Timbre sliding mask: 2x2, window=128 (CLS)
[Encode] Packed: lyric=262 + timbre=1 + text=182 = 445 tokens
[Encode-Text Batch0] 182+262 tokens -> enc_S=445, 36.3 ms
[Store] Unload CondEnc (616.6 MB)
[Load] Detokenizer backend: Vulkan0 (CPU threads: 16)
[GGUF] ./models/acestep-v15-xl-turbo-Q8_0.gguf: 830 tensors, data at offset 69088
[WeightCtx] Loaded 30 tensors, 106.5 MB into backend
[Load] Detokenizer: FSQ(6->2048) + 2L encoder(S=5, 2048->64)
[Store] Load FSQ-Detok: 69 ms
[Context] Decoded: 1315 codes -> 6575 frames (263.0s @ 25Hz)
[Build-Context Batch0] Detokenizer: 561.9 ms, 1315 codes
[Store] Unload FSQ-Detok (106.5 MB)
[Init-Noise Batch0] Philox noise seed=1514441185, [6576, 64]
[Init-Noise] Starting: T=6576, S=3288, enc_S=445, steps=8, batch=1 (cover)
[Load] DiT backend: Vulkan0 (CPU threads: 16)
[GGUF] ./models/acestep-v15-xl-turbo-Q8_0.gguf: 830 tensors, data at offset 69088
[DiT] Self-attn: Q+K+V fused
[DiT] Cross-attn: Q+K+V fused
[DiT] MLP: gate+up fused
[Load] null_condition_emb found (CFG available)
[WeightCtx] Loaded 630 tensors, 4230.2 MB into backend
[Load] DiT: 32 layers, H=2560, Nh=32/8, D=128
[Store] Load DiT: 1341 ms
[DiT] Batch N=1, T=6576, S=3288, enc_S=445
[DiT] Graph: 2441 nodes
[DiT] Step 1/8 t=1.000
[DiT] Step 2/8 t=0.955
[DiT] Step 3/8 t=0.900
[DiT] Step 4/8 t=0.833
[DiT] Step 5/8 t=0.750
[DiT] Step 6/8 t=0.643
[DiT] Step 7/8 t=0.500
[DiT] Step 8/8 t=0.300
[DiT-Generate] Total: 6343.8 ms (6343.8 ms/sample)
[Store] Unload DiT (4230.2 MB)
[GGUF] ./models/vae-BF16.gguf: 365 tensors, data at offset 30048
[Load] VAE backend: Vulkan0 (CPU threads: 16)
[VAE] Backend: Vulkan0, Weight buffer: 161.1 MB
[VAE] Loaded: 5 blocks, upsample=1920x, F32 activations
[Store] Load VAE-Dec: 833 ms
[VAE] Tiled decode: 8 tiles (chunk=1024, overlap=64, stride=896)
[VAE] Graph: 335 nodes, T_latent=960
[VAE] Upsample factor: 1920.00 (expected ~1920)
[VAE] Graph: 335 nodes, T_latent=1024
radv/amdgpu: The CS has been cancelled because the context is lost. This context is innocent.
[New LWP 17144]
[New LWP 17141]
[New LWP 17139]
[New LWP 17094]
[New LWP 17093]
[New LWP 14465]
[New LWP 14464]
[New LWP 14463]
[New LWP 14462]
[New LWP 14461]
[New LWP 14460]
[New LWP 14459]
[New LWP 14458]
[New LWP 14457]
[New LWP 14456]
[New LWP 14455]
[New LWP 14454]
[New LWP 14453]
[New LWP 14452]
[New LWP 14451]
[New LWP 14450]
[New LWP 14449]
[New LWP 14448]
[New LWP 14447]
[New LWP 14446]
[New LWP 14445]
[New LWP 14444]
[New LWP 14443]
[New LWP 14442]
[New LWP 14441]
[New LWP 14440]
[New LWP 14439]
[New LWP 14438]
[New LWP 14437]
[New LWP 14436]
[New LWP 14435]
[New LWP 14434]
[New LWP 14429]
This GDB supports auto-downloading debuginfo from the following URLs:
</dev/null>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007ff7f998ff32 in ?? () from /usr/lib/libc.so.6
#0 0x00007ff7f998ff32 in ?? () from /usr/lib/libc.so.6
#1 0x00007ff7f998439c in ?? () from /usr/lib/libc.so.6
#2 0x00007ff7f99843e4 in ?? () from /usr/lib/libc.so.6
#3 0x00007ff7f9a0cb62 in accept4 () from /usr/lib/libc.so.6
#4 0x00005628a804e3a0 in httplib::Server::listen_internal() ()
#5 0x00005628a7f96658 in main ()
[Inferior 1 (process 14428) detached]
terminate called after throwing an instance of 'vk::DeviceLostError'
what(): vk::Queue::submit: ErrorDeviceLost
./server.sh: line 14: 14428 Aborted (core dumped) ./build/ace-server --host 0.0.0.0 --port 8085 --models ./models --adapters ./adapters --max-batch 1