Alternative quantization for models #43
Replies: 5 comments 6 replies
-
Hello! The flow-matching DiT is so robust that even at Q4_K_M the rendered output is indistinguishable from BF16; I could even suggest more aggressive quantization levels. But the small LLM is sensitive because of its 5 Hz audio codes, so I stopped at Q5_K_M: below that, the results start to hallucinate.
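For anyone wanting to reproduce this, a minimal sketch of the quantization step, assuming the models are exported as GGUF and quantized with llama.cpp's llama-quantize tool (the file names here are hypothetical):

```shell
# Hypothetical file names; assumes GGUF exports and a llama.cpp build.
# The flow-matching DiT tolerates aggressive quantization:
./llama-quantize dit-bf16.gguf dit-Q4_K_M.gguf Q4_K_M
# The small LLM is more fragile with its 5 Hz audio codes, so stop at Q5_K_M:
./llama-quantize llm-bf16.gguf llm-Q5_K_M.gguf Q5_K_M
```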
-
Random generation with a small visible difference on the amplitude graph of the final render at Q4, exactly as in image diffusion :) It does not affect quality; it is just a rounding error that steers the DiT toward a different sample. BF16_3519225289.mp3 Request :
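To put a number on that amplitude difference, here is a small self-contained sketch (plain Python, with toy data standing in for the real BF16 and Q4 renders):

```python
import math

def rms_diff(a, b):
    """Root-mean-square difference between two equal-length waveforms."""
    assert len(a) == len(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Toy waveforms standing in for the BF16 vs Q4 renders (hypothetical data):
bf16 = [math.sin(0.01 * i) for i in range(1000)]
q4   = [x + 1e-4 for x in bf16]  # simulate a tiny, uniform quantization offset
print(rms_diff(bf16, q4))  # small value, on the order of the injected offset
```

In practice you would load the two decoded waveforms instead of the toy arrays; an RMS difference several orders of magnitude below the signal amplitude is what "imperceptible" looks like numerically.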
-
Another example, cherry-picked for the largest divergence, even at Q8_0 vs BF16. BF16.json
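A quick way to quantify the divergence between two dumps like BF16.json and a quantized run (the file layout here is assumed to be a flat list of floats, which may not match the actual format):

```python
import json

def max_abs_divergence(ref, test):
    """Largest element-wise absolute difference between two flat float lists."""
    return max(abs(r - t) for r, t in zip(ref, test))

# Toy stand-ins for the real BF16 vs Q8_0 dumps:
ref = json.loads("[0.10, -0.25, 0.40]")
q8  = json.loads("[0.10, -0.26, 0.41]")
print(max_abs_divergence(ref, q8))  # ≈ 0.01
```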
-
ggml-org/llama.cpp#4782
-
@ServeurpersoCom
Hello. I've been wondering whether it makes sense to try alternative quantization methods for these models (especially the DiT model). I'm new to this, but maybe something like HQQ would work better. Have you considered it yet?