Replies: 1 comment 1 reply
The macros are defined in
-
I'm trying to replace the mixed-precision quantization GEMM CUDA kernel in llama.cpp with my own implementation. To do that, I need to understand the data layout and the computation logic of the mul_mat_vec_q kernel.
I tried to understand the code by reading it, but I could not follow it because of the large number of unfamiliar variables and the complex parallel computation.
How should I approach understanding this code? What do qk, qi, and vdr mean, and how does the kernel work?
I'm really frustrated. Can anyone help me?