Saves Quantized Model to disk for loading next time via "--int8" #61
Open
petermg wants to merge 5 commits into bytedance:main from
Conversation
Saves the quantized model to disk when using "python app.py --int8"; on the next launch it loads the quantized model from disk so that it no longer has to quantize every launch. You still need to pass "--int8" when running to tell it to use the quantized version.
Modified from original:
- Saves quantized models to disk.
- Loads quantized models from disk if found, so there is no need to quantize every run.
- Support for LoRAs.
- Added the ability to specify the number of images to generate per run.
- Exposed and/or added the following options in the UI: "Face Upscale Factor", "Face Crop Size", "resolution for ref image", "Neg Prompt", and some others that were previously hidden in the "Advanced Options" accordion.
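The quantize-once, cache-to-disk behavior described above can be sketched roughly as below. This is a minimal illustration of the caching pattern, not the PR's actual code: `CACHE_PATH`, `quantize_to_int8`, and `get_weights` are hypothetical names, and the toy rounding stands in for real int8 quantization of the model weights.

```python
import os
import pickle

# Assumed cache filename; the PR's actual path/format may differ.
CACHE_PATH = "quantized_model_int8.pkl"

def quantize_to_int8(weights):
    # Toy stand-in for real int8 quantization:
    # scale floats into the signed 8-bit range and round.
    return [max(-128, min(127, round(w * 127))) for w in weights]

def get_weights(use_int8, weights):
    if not use_int8:
        # Without --int8, use the full-precision weights as before.
        return weights
    if os.path.exists(CACHE_PATH):
        # Subsequent --int8 launches: load the cached quantized
        # weights from disk instead of re-quantizing.
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    # First --int8 launch: quantize once, then cache to disk.
    quantized = quantize_to_int8(weights)
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(quantized, f)
    return quantized
```

Note that the flag is still required on every run: the cache only skips the quantization step, not the decision of which variant to load.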
This was referenced May 24, 2025
I gave this a go on a hosted VM but the code still belligerently tries to download full Flux.1.Dev models from HuggingFace. I tried to remove the download. I tried to use a pretrained int8 model using
Also opened up a few other options in the UI.