Hi, thanks for the great work! I've been experimenting with it and ran into a VRAM constraint.
The transformer is ~38 GB in bf16, which exceeds the 24 GB VRAM of common GPUs such as the RTX A5000/3090/4090. A community fp8 quantization exists (https://huggingface.co/1038lab/Qwen-Image-Edit-2511-FP8) that brings it down to ~20 GB, which could fit within 24 GB.
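For reference, a quick back-of-envelope check of those numbers (a sketch using the sizes quoted above, which I'm treating as approximate): bf16 stores 2 bytes per parameter and fp8 stores 1, so halving the weight footprint is expected, with the remainder up to ~20 GB coming from embeddings kept in higher precision, activations, and framework overhead.

```python
# Rough sanity check of the bf16 -> fp8 size figures quoted above.
# All numbers are approximate; GiB used for simplicity.
BYTES_PER_GIB = 1024**3

bf16_size_gb = 38                      # reported transformer size in bf16
params = bf16_size_gb * BYTES_PER_GIB // 2   # bf16 = 2 bytes/param -> ~19B params

fp8_weights_gb = params // BYTES_PER_GIB     # fp8 = 1 byte/param
print(fp8_weights_gb)  # ~19 GB for the weights alone, consistent with the ~20 GB checkpoint
```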
A few questions:
- Have you tested PixelSmile's LoRA weights with the fp8-quantized base model? Any noticeable quality degradation?
- Are you planning to release an officially validated quantized version (fp8/int8) of the base model or LoRA?
- Is there a recommended workaround for sub-24 GB inference in the meantime?
Thanks!