-
The convert functionality is already working. It's still missing weight quantization on model loading, but I suppose the file conversion would be more useful anyway. I'll send a PR later today.
-
@wbruna This was your message on issue #696, so I wanted to ask whether you were able to finish your work on this feature. I recently experienced the power of --override-tensor in llama.cpp: it let me load bigger models easily and noticeably improved inference performance. I was curious whether the same was possible for image-generation models like Flux, and while searching for it I came across this other great project.
I tried to find information about a similar feature here but couldn't, so I searched this project's issues and learned that someone was working on it. I just wanted to know whether you were able to achieve something like llama.cpp's --override-tensor in this project :-)
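For context, here is a hedged sketch of the llama.cpp usage I mean. The --override-tensor (-ot) flag takes regex=buffer-type pairs and pins matching tensors to a specific backend; the model path and the exact tensor-name pattern below are illustrative assumptions, not a recipe from this project:

```shell
# Sketch of llama.cpp's --override-tensor: keep the large MoE expert
# tensors (names matching .ffn_*_exps.) in CPU memory while offloading
# the remaining layers to the GPU (-ngl 99), so a model that would not
# otherwise fit in VRAM can still run.
# "model.gguf" is a placeholder path.
./llama-server -m model.gguf -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU"
```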