llamafile_sgemm API says it may compute C = A^T x B. When is the A matrix transposed in llama.cpp? #14162
Unanswered · shalinisalom99 asked this question in Q&A · Replies: 0 comments
I would like to know whether A is the weights matrix and B is the activations matrix. Also, is the transposition of the A matrix done offline, during the PyTorch -> GGUF conversion? If so, is this the piece of code that does it?
llama.cpp/convert_hf_to_gguf.py
Line 3095 in 7d51644
If not, can you please point out at what phase of runtime this transposition happens?
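For readers of the question, a minimal NumPy sketch of what "C = A^T x B" means in terms of shapes and memory access (the shapes K, M, N here are arbitrary illustrations, not taken from llama.cpp):

```python
import numpy as np

# Illustrative shapes: A is (K, M), B is (K, N), so C = A^T @ B is (M, N).
K, M, N = 4, 3, 2
A = np.arange(K * M, dtype=np.float32).reshape(K, M)
B = np.arange(K * N, dtype=np.float32).reshape(K, N)

C = A.T @ B

# Each C[i, j] is the dot product of column i of A with column j of B.
# A kernel can therefore compute C = A^T x B without ever materializing
# a transposed copy of A, as long as it reads A with the right strides
# (or A was stored on disk in the layout the kernel expects).
assert C.shape == (M, N)
assert np.allclose(C, np.einsum('km,kn->mn', A, B))
```

This is only a sketch of the linear-algebra convention; whether the transposition is baked into the GGUF tensor layout at conversion time or handled by strides at runtime is exactly what the question asks.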