kleidiai: add support for get_rows #14676

Open

wants to merge 3 commits into base: master
Conversation

@chaxu01 chaxu01 (Collaborator) commented Jul 14, 2025

This patch adds support for KleidiAI acceleration of the Q4_0 matrix multiplication operation in cases where the weight tensor is shared with the get_rows operator. A typical use case is in whisper.cpp, where such weight sharing occurs between get_rows and matmul.

@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Jul 14, 2025
Comment on lines 34 to 39
static inline float compute_fp16_to_fp32(ggml_fp16_t h) {
static_assert(sizeof(ggml_fp16_t) == sizeof(__fp16), "ggml_fp16_t and __fp16 must be the same size");
__fp16 tmp;
memcpy(&tmp, &h, sizeof(ggml_fp16_t));
return (float)tmp;
}
Member
Can't we use ggml_fp16_to_fp32() instead of introducing this function?

Collaborator Author

Yes, good point — I'll update the patch to use ggml_fp16_to_fp32() instead.

Member
Since this is in the CPU backend, it could also use the potentially more efficient ggml_cpu_fp16_to_fp32.

Member

@ggerganov ggerganov left a comment

General advice is to try to keep the implementation more generic - it seems to focus a lot on Q4_0. Adding more asserts for the current underlying assumptions will help long term in case we add support for other types.

Another important thing we should improve soon is to add support for testing extra buffer types in test-backend-ops (see ggml-org/whisper.cpp#3223 (comment)). Without such tests it is very difficult to verify that these changes do not break something.

Comment on lines 357 to 361
bool compute_forward_get_rows(struct ggml_compute_params * params, struct ggml_tensor * dst) {
GGML_ASSERT(ctx.kernels);

const ggml_tensor * src0 = dst->src[0];
const ggml_tensor * src1 = dst->src[1];
Member
It would be useful to add a comment or assert that this function currently only works with Q4_0 and nothing else. Otherwise, somebody can get confused and try to call it for other types.

Comment on lines 385 to 386
int32_t row_idx = ((const int32_t *)src1->data)[i];
GGML_ASSERT(row_idx >= 0 && row_idx < (int32_t)src0->ne[1]);
Member
This is a slightly more future-proof version:

Suggested change
-    int32_t row_idx = ((const int32_t *)src1->data)[i];
-    GGML_ASSERT(row_idx >= 0 && row_idx < (int32_t)src0->ne[1]);
+    GGML_ASSERT(src1->type == GGML_TYPE_I32);
+    int64_t row_idx = ((const int32_t *)src1->data)[i];
+    GGML_ASSERT(row_idx >= 0 && row_idx < src0->ne[1]);

At some point in the future we might consider changing the indices of ggml_get_rows() to become I64 so this assert will be helpful.

@@ -93,3 +96,4 @@ struct ggml_kleidiai_kernels {

ggml_kleidiai_kernels * ggml_kleidiai_select_kernels(cpu_feature cpu_features, const ggml_tensor * tensor);
ggml_kleidiai_kernels * ggml_kleidiai_select_kernels_q4_0(cpu_feature features);
const char* cpu_feature_to_string(cpu_feature features);
Member

Suggested change
-const char* cpu_feature_to_string(cpu_feature features);
+static const char* cpu_feature_to_string(cpu_feature features);

Comment on lines 471 to 472
size_t nr = ctx.kernels->gemm.get_nr();
size_t kr = ctx.kernels->gemm.get_kr();
Member
These can also be const:

const size_t nr = ctx.kernels->gemm.get_nr();
const size_t kr = ctx.kernels->gemm.get_kr();

return variant_call<size_t>(ctx.kernels->rhs_info.packed_size, n, k, nr, kr, QK4_0);
Member
It would be better to avoid this hardcoded QK4_0 here. Either assert the tensor type is Q4_0 or use ggml_blck_size().

Labels: ggml (changes relating to the ggml tensor library for machine learning)

3 participants