FP8 compatibility / quantized model weights for 24 GB GPUs? #4

@jeolpyeoni

Description

Hi, thanks for the great work! I've been experimenting with it and ran into a VRAM constraint.

The transformer is ~38 GB in bf16, which exceeds the 24 GB VRAM of common GPUs like the RTX A5000/3090/4090. A community fp8 quantization exists (https://huggingface.co/1038lab/Qwen-Image-Edit-2511-FP8) which brings it down to ~20 GB, potentially fitting within 24 GB.
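For context, a quick back-of-envelope check of those numbers, assuming the checkpoint size is dominated by parameter storage (2 bytes/param in bf16, 1 byte/param in fp8):

```python
# Rough footprint estimate: bf16 stores 2 bytes per parameter, fp8 stores 1.
BF16_GB = 38.0                # reported bf16 transformer size
BYTES_BF16, BYTES_FP8 = 2, 1

params_billions = BF16_GB / BYTES_BF16   # implied parameter count, in billions
fp8_gb = params_billions * BYTES_FP8     # weight storage alone in fp8

print(f"~{params_billions:.0f}B params, ~{fp8_gb:.0f} GB of fp8 weights")
```

That gives ~19 GB for the weights themselves, consistent with the ~20 GB community checkpoint once quantization scales and non-quantized layers are included.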

A few questions:

  1. Have you tested PixelSmile's LoRA weights with the fp8-quantized base model? Any noticeable quality degradation?
  2. Are you planning to release an officially validated quantized version (fp8/int8) of the base model or LoRA?
  3. Is there a recommended workaround for sub-24 GB inference in the meantime?
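On question 3, one stopgap I've been considering is diffusers' model CPU offload, which moves submodules to the GPU one at a time and can cut peak VRAM at the cost of speed. A minimal sketch, with the caveats that the base repo id here is my assumption (inferred from the fp8 checkpoint's name) and I haven't validated it with your LoRA:

```python
import torch
from diffusers import DiffusionPipeline

# Sketch only: "Qwen/Qwen-Image-Edit-2511" is an assumed base repo id,
# not confirmed by this project.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511",
    torch_dtype=torch.bfloat16,
)

# Keeps each submodule on CPU until it is needed on the GPU,
# trading inference speed for a lower peak VRAM footprint.
pipe.enable_model_cpu_offload()
```

Whether that alone gets a ~38 GB bf16 transformer under 24 GB in practice is exactly what I'm unsure about, hence the question.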

Thanks!
