get_rows & dequantize function implementation for repacked weights of type q4_K (q4_Kx8) #3291

swetha097 · 2025-06-26T14:24:19Z

This PR implements the GGML_OP_GET_ROWS operation for the repacked (block-interleaved) 4-bit quantized format q4_K (q4_Kx8).

Gains observed with this PR: the changes allow increased use of the GEMM function ggml_gemm_q4_K_8x8_q8_K for the q4_K type.

Master branch commit: 17bece1885047dcf338c557a49f393677e265a91

Development (get_rows) branch commit: 70cf05ae71fbabb1f943e13e4d27f41f890197f7

The PR was tested on an AMD Raphael 7600X, which supports the following flags:

system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | REPACK = 1 |

The model used for the performance tests was downloaded from https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q4_0.


swetha097 added 16 commits June 3, 2025 02:56

Get_Rows & Dequantize implementation adapted to work for repacked weights of type q4_0 (d6cc466)

Resolve PR comments (994e02a)

Get_Rows & Dequantize implementation adapted to work for repacked weights of type q4_0 (ed1d3a2)

Resolve PR comments (6959d41)

Merge branch 'swe_pr/get_rows' of https://github.com/swetha097/whisper.cpp into swe_pr/get_rows (1a79d18)

Merge branch 'master' into swe_pr/get_rows (1e72e4b)

Add the Get_Rows & Dequantize implementation adapted to work for repacked weights of type q4_K (066b47a)

Get_Rows & Dequantize implementation adapted to work for repacked weights of type q4_0 (b9e152d)

Resolve PR comments (2705c08)

Add the Get_Rows & Dequantize implementation adapted to work for repacked weights of type q4_K (70cf05a)

Merge branch 'swe_pr/get_rows_q4_K' of https://github.com/swetha097/whisper.cpp into swe_pr/get_rows_q4_K (e53bb02)

Remove q4_0 code implementation for get_rows & dequantize (1446e64)

Fix warning (099aa24)

Resolve minor PR comments (37303ab)

Merge branch 'master' into swe_pr/q4_K_get_rows (a4e2602)

Merge remote-tracking branch 'origin/master' into swe_pr/q4_K_get_rows (74bc3c6)
