UMA mode uses HIP runtime-managed memory, and runtime-managed memory is only efficient when the device is in xnack+ mode. That said, as the flag name suggests, using managed memory never makes sense for llama.cpp on a device with dedicated video memory.
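To see which mode the devices are in, one option is to read the target string that ggml_cuda_init prints at startup (for example `gfx906:sramecc+:xnack-`). The following is only a rough sketch: the helper name `xnack_modes` is made up for illustration, and it assumes the log lines have the same format as the ones posted in this thread.

```python
import re

# Matches the per-device lines printed by ggml_cuda_init, e.g.
#   Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
# Assumption: the log format is the one shown in this thread.
DEVICE_LINE = re.compile(r"Device (\d+): .*?\bxnack([+-])")


def xnack_modes(log_text):
    """Return {device_index: True for xnack+, False for xnack-} (hypothetical helper)."""
    modes = {}
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            modes[int(match.group(1))] = match.group(2) == "+"
    return modes


if __name__ == "__main__":
    sample = (
        "ggml_cuda_init: found 4 ROCm devices:\n"
        "Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64\n"
    )
    for index, enabled in xnack_modes(sample).items():
        print(f"device {index}: xnack{'+' if enabled else '-'}")
```

Any device reported as xnack- falls into the case described above, where managed memory is not expected to be efficient.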
-
Hi, I am trying out LLMs with multiple AMD MI50 cards. I compiled a few builds to compare their performance and ran into some unexpected behaviour.
First, with a ROCm build and varying numbers of offloaded layers, the results look roughly as expected, with performance increasing as more layers are offloaded:
bin/llama-bench -t 24 -m ~/models/DeepSeek-R1-Distill-Llama-8B-Q8_0/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf -ngl 0,10,20,30,40,50,60,70 -ts 1/1/1/1 --main-gpu 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Then, compiling with GGML_HIP_UMA=1, I get the following: pp512 increases with more offloaded layers, but tg128 decreases:
bin/llama-bench -t 24 -m ~/models/DeepSeek-R1-Distill-Llama-8B-Q8_0/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf -ngl 0,10,20,30,40,50,60,70 -ts 1/1/1/1 --main-gpu 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 ROCm devices:
Device 0: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 3: AMD Instinct MI50/MI60, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Is this effect expected? It looks odd to me.