Line 194 in e307d9f:
`rotate_ov_proj(layer, model_type, num_heads, head_dim)`
Thanks for your great work!
In the original QuaRot paper, rotate_ov_proj is designed to rotate the value states so that 4-bit quantization of the value cache becomes easier. In this repo, however, I believe we only want to rotate the weights and the linear outputs; we do not want to quantize the v cache. So, is rotate_ov_proj necessary here? If my understanding is incorrect, please point it out. I am looking forward to your reply. Thanks!
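For context on why the paired rotation is harmless even if the v cache is not quantized, here is a minimal numpy sketch of the identity behind rotate_ov_proj: folding an orthogonal Q into v_proj and its transpose into o_proj rotates the cached value states but leaves the attention output unchanged. All names and dimensions below are illustrative, not taken from this repo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # head_dim (illustrative)
T = 5          # sequence length

# Random orthogonal rotation (QuaRot uses Hadamard-style matrices)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

W_v = rng.standard_normal((d, d))   # v_proj weight (single head, for illustration)
W_o = rng.standard_normal((d, d))   # o_proj weight
x = rng.standard_normal((T, d))     # hidden states
A = rng.random((T, T))              # stand-in attention weights
A /= A.sum(-1, keepdims=True)

# Original path: v = x @ W_v, out = (A @ v) @ W_o
v_plain = x @ W_v
out_plain = (A @ v_plain) @ W_o

# Rotated path: fold Q into v_proj and Q^T into o_proj
v_rot = x @ (W_v @ Q)               # this is what would sit in the v cache
out_rot = (A @ v_rot) @ (Q.T @ W_o)

# The cached v states differ (they are rotated), but the output is identical
assert not np.allclose(v_plain, v_rot)
assert np.allclose(out_plain, out_rot)
print("outputs match")
```

So the transform is output-preserving regardless of whether the rotated v cache is quantized afterwards; the question of whether it is *needed* (rather than merely harmless) when the v cache stays in full precision still stands.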