Skip to content

get_rows & dequantize function implementation for repacked weights of type q4_K (q4_Kx8) #3291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

swetha097
Copy link

  • This implements the GGML_OP_GET_ROWS operation specifically for repacked (block interleaved) 4-bit quantized format (q4_Kx8)

  • The following gains were observed by the changes made in the PR - The changes allow for increased usage of the GEMM function (ggml_gemm_q4_K_8x8_q8_K) for q4_K type

image

master branch commit - 17bece1885047dcf338c557a49f393677e265a91

development (get_rows) branch commit - 70cf05ae71fbabb1f943e13e4d27f41f890197f7

image

The PR was tested in AMD Raphael 7600X which supports the following flags :

system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | REPACK = 1 |

Model for performance tests Downloaded from : https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q4_0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant