llamafile_sgemm API says it may compute C = A^T x B. When is the A matrix transposed in llama.cpp? #14162
Unanswered · shalinisalom99 asked this question in Q&A · Replies: 0 comments
I would like to know whether A is the weights matrix and B is the activations matrix. Also, is the transposition of the A matrix done offline, during the PyTorch -> GGUF conversion? If so, is this the piece of code that does it?
llama.cpp/convert_hf_to_gguf.py
Line 3095 in 7d51644
If not, can you please point out at what phase of runtime this transposition happens?
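For readers of the question, a minimal NumPy sketch of what "C = A^T x B" means in terms of shapes and memory access (the shapes K, M, N here are arbitrary illustrations, not taken from llama.cpp):

```python
import numpy as np

# Illustrative shapes: A is (K, M), B is (K, N), so C = A^T @ B is (M, N).
K, M, N = 4, 3, 2
A = np.arange(K * M, dtype=np.float32).reshape(K, M)
B = np.arange(K * N, dtype=np.float32).reshape(K, N)

C = A.T @ B

# Each C[i, j] is the dot product of column i of A with column j of B.
# A kernel can therefore compute C = A^T x B without ever materializing
# a transposed copy of A, as long as it reads A with the right strides
# (or A was stored on disk in the layout the kernel expects).
assert C.shape == (M, N)
assert np.allclose(C, np.einsum('km,kn->mn', A, B))
```

This is only a sketch of the linear-algebra convention; whether the transposition is baked into the GGUF tensor layout at conversion time or handled by strides at runtime is exactly what the question asks.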