MUSA: support ARM64 and enable dp4a etc. #11843
Conversation
Hi @JohannesGaessler, @ggerganov, @slaren, @yeahdongcn, can you please help review this PR? Thanks a lot.
The changes to the CUDA backend look fine to me other than the things I commented on. I don't know whether the changes for model support are correct.
Please run the functionality tests and the tests under the …
Hi @JohannesGaessler, the changes to model support are to enable the …
Hi @yeahdongcn, I see #11822 has been merged. When running …
FYI, the above …
Hi @yeahdongcn, the model running issue has been fixed on x86, …
Hi @slaren, the …
Performed some tests on …
Model list: ❯ ls -l ~/models/
total 12471020
-rw-r--r-- 1 xiaodongye xiaodongye 4683073184 1月 21 18:43 deepseek-r1_7b_q4_0.gguf
-rw-rw-r-- 1 xiaodongye xiaodongye 1321082688 9月 26 01:19 llama3.2_1b_q8_0.gguf
-rw-rw-r-- 1 xiaodongye xiaodongye 4661211424 5月 21 2024 llama3_8b_q4_0.gguf
-rw-rw-r-- 1 xiaodongye xiaodongye 2104932768 2月 18 10:51 qwen2.5-3b-instruct-q4_k_m.gguf
❯ ./test-backend-ops
...
FLASH_ATTN_EXT(hs=256,nh=32,kv=1024,nb=35,mask=0,max_bias=0.000000,logit_softcap=0.000000,type_KV=q8_0,permute=[0,1,2,3]): not supported [MUSA0]
FLASH_ATTN_EXT(hs=256,nh=32,kv=1024,nb=35,mask=0,max_bias=0.000000,logit_softcap=0.000000,type_KV=q4_0,permute=[0,1,2,3]): not supported [MUSA0]
CROSS_ENTROPY_LOSS(type=f32,ne=[10,5,4,3]): MUSA error: invalid argument
current device: 0, in function ggml_cuda_cross_entropy_loss at /home/xiaodongye/ws/ggml/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu:129
musaFuncSetAttribute(cross_entropy_loss_f32<true>, musaFuncAttributeMaxDynamicSharedMemorySize, smpbo)
/home/xiaodongye/ws/ggml/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:73: MUSA error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.

@slaren Could you please approve the workflow so I can verify if this PR works on other backends? Thanks!
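For context on the failure above: raising cudaFuncAttributeMaxDynamicSharedMemorySize is the standard CUDA opt-in for kernels that need more than the default 48 KiB of dynamic shared memory, and the MUSA runtime appears to reject that attribute, hence the "invalid argument". Below is a minimal sketch of the pattern, for illustration only; it is not code from this PR, and the kernel and helper names are made up.

#include <cuda_runtime.h>

// A kernel that uses dynamic shared memory.
__global__ void demo_kernel(float *out) {
    extern __shared__ float buf[];          // dynamic shared memory
    buf[threadIdx.x] = (float) threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x];
}

// Opt in to a larger per-kernel dynamic shared memory limit (up to the
// device's sharedMemPerBlockOptin). This returns cudaSuccess on CUDA;
// the equivalent musaFuncSetAttribute call fails on this device.
cudaError_t raise_shared_mem_limit(int nbytes) {
    return cudaFuncSetAttribute(demo_kernel,
                                cudaFuncAttributeMaxDynamicSharedMemorySize,
                                nbytes);
}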
diff --git a/ggml/src/ggml-cuda/cross-entropy-loss.cu b/ggml/src/ggml-cuda/cross-entropy-loss.cu
index 223576b2..0ce4afbb 100644
--- a/ggml/src/ggml-cuda/cross-entropy-loss.cu
+++ b/ggml/src/ggml-cuda/cross-entropy-loss.cu
@@ -123,13 +123,13 @@ void ggml_cuda_cross_entropy_loss(ggml_backend_cuda_context & ctx, ggml_tensor *
ggml_cuda_pool_alloc<float> dst_tmp(pool, blocks_num.x);
if (nbytes_shared <= smpbo) {
-#if !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__))
+#if !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__)) && !defined(GGML_USE_MUSA)
static bool shared_memory_limit_raised[GGML_CUDA_MAX_DEVICES] = {false};
if (!shared_memory_limit_raised[id]) {
CUDA_CHECK(cudaFuncSetAttribute(cross_entropy_loss_f32<true>, cudaFuncAttributeMaxDynamicSharedMemorySize, smpbo));
shared_memory_limit_raised[id] = true;
}
-#endif // !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__))
+#endif // !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__)) && !defined(GGML_USE_MUSA)
cross_entropy_loss_f32<true><<<blocks_num, blocks_dim, nbytes_shared, stream>>>(src0_d, src1_d, dst_tmp.ptr, ne00, nrows);
} else {
cross_entropy_loss_f32<false><<<blocks_num, blocks_dim, 0, stream>>>(src0_d, src1_d, dst_tmp.ptr, ne00, nrows);
@@ -175,13 +175,13 @@ void ggml_cuda_cross_entropy_loss_back(ggml_backend_cuda_context & ctx, ggml_ten
const size_t smpbo = ggml_cuda_info().devices[id].smpbo;
if (nbytes_shared <= smpbo) {
-#if !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__))
+#if !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__)) && !defined(GGML_USE_MUSA)
static bool shared_memory_limit_raised[GGML_CUDA_MAX_DEVICES] = {false};
if (!shared_memory_limit_raised[id]) {
CUDA_CHECK(cudaFuncSetAttribute(cross_entropy_loss_back_f32<true>, cudaFuncAttributeMaxDynamicSharedMemorySize, smpbo));
shared_memory_limit_raised[id] = true;
}
-#endif // !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__))
+#endif // !(defined(GGML_USE_HIP) && defined(__HIP_PLATFORM_AMD__)) && !defined(GGML_USE_MUSA)
cross_entropy_loss_back_f32<true><<<blocks_num, blocks_dim, nbytes_shared, stream>>>(grad_d, src0f_d, src1f_d, dst_d, ne00);
} else {
cross_entropy_loss_back_f32<false><<<blocks_num, blocks_dim, 0, stream>>>(grad_d, src0f_d, src1f_d, dst_d, ne00);

It seems we need to disable them similarly to …
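One note on why the log reports musaFuncSetAttribute even though the source calls cudaFuncSetAttribute: as I understand it, the MUSA build reuses the CUDA sources through a vendor-mapping header that redefines the cuda* symbols to their musa* counterparts, roughly like the sketch below (illustrative only, not the actual header contents; the include name is an assumption). Guarding the call with !defined(GGML_USE_MUSA) therefore skips the unsupported attribute on MUSA, the same way the HIP path already skips it.

#pragma once
// Illustrative sketch only, not the real vendor header: the MUSA build
// compiles the CUDA sources with mappings like these, so the cuda* calls in
// cross-entropy-loss.cu resolve to the musa* names seen in the error log.
#include <musa_runtime.h>   // assumption: MUSA runtime header name

#define cudaError_t                                   musaError_t
#define cudaSuccess                                   musaSuccess
#define cudaFuncSetAttribute                          musaFuncSetAttribute
#define cudaFuncAttributeMaxDynamicSharedMemorySize   musaFuncAttributeMaxDynamicSharedMemorySize
// ...and so on for every CUDA runtime symbol the backend uses.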
Hi @yeahdongcn, as we aligned, I have applied the above code changes.
Hi @JohannesGaessler, @slaren, @yeahdongcn, all the OP tests have now passed. Could you please help merge this PR?
LGTM
@ggerganov, @JohannesGaessler, thanks a lot.
Yes, we can merge after the CI workflows have passed. I just started them.
This PR will:
- support ARM64;
- enable dp4a, etc. on MUSA (a short sketch of what dp4a computes follows at the end of this description).

Build: …

Example run: …

Tested with the following models:
- ARM64: …
- x86: …
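Since the headline feature is dp4a, here is a minimal, self-contained sketch (not code from this PR) of what the intrinsic computes: __dp4a(a, b, c) treats a and b as four packed signed 8-bit lanes, multiplies them lane-wise, sums the four products, and adds c. This is what lets the quantized mat-mul kernels consume four int8 values per instruction once dp4a is available on the target.

// Build with e.g. nvcc -arch=sm_61 (dp4a needs compute capability 6.1+);
// error checks omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>

// Each int packs four signed 8-bit lanes; __dp4a does a 4-wide dot product
// with accumulate: out = a0*b0 + a1*b1 + a2*b2 + a3*b3 + c.
__global__ void dp4a_demo(const int *a, const int *b, int *out) {
    *out = __dp4a(*a, *b, 0);
}

int main() {
    int ha = 0x01020304;   // lanes (low to high byte): 4, 3, 2, 1
    int hb = 0x01010101;   // lanes: 1, 1, 1, 1
    int hout = 0;
    int *da, *db, *dout;
    cudaMalloc(&da, sizeof(int));
    cudaMalloc(&db, sizeof(int));
    cudaMalloc(&dout, sizeof(int));
    cudaMemcpy(da, &ha, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(int), cudaMemcpyHostToDevice);
    dp4a_demo<<<1, 1>>>(da, db, dout);
    cudaMemcpy(&hout, dout, sizeof(int), cudaMemcpyDeviceToHost);
    printf("dp4a: %d\n", hout);   // expected: 4 + 3 + 2 + 1 = 10
    cudaFree(da); cudaFree(db); cudaFree(dout);
    return 0;
}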