
Commit 073de95

Merge pull request #203 from huggingface/main
Merge changes
2 parents 12981c5 + 733b44a commit 073de95

86 files changed (+7394, -769 lines)


.github/workflows/nightly_tests.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -418,6 +418,8 @@ jobs:
             test_location: "gguf"
           - backend: "torchao"
             test_location: "torchao"
+          - backend: "optimum_quanto"
+            test_location: "quanto"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:
```

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
```diff
@@ -81,6 +81,8 @@
       title: Overview
     - local: hybrid_inference/vae_decode
       title: VAE Decode
+    - local: hybrid_inference/vae_encode
+      title: VAE Encode
     - local: hybrid_inference/api_reference
       title: API Reference
     title: Hybrid Inference
@@ -173,6 +175,8 @@
       title: gguf
     - local: quantization/torchao
       title: torchao
+    - local: quantization/quanto
+      title: quanto
     title: Quantization Methods
   - sections:
     - local: optimization/fp16
```

docs/source/en/api/pipelines/wan.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -14,6 +14,10 @@
 
 # Wan
 
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
 [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.
 
 <!-- TODO(aryan): update abstract once paper is out -->
```
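The badge added above advertises LoRA support for the Wan pipeline. As a hedged illustration only (the checkpoint id and the LoRA weights path below are placeholders and are not part of this commit), loading a LoRA into `WanPipeline` would look roughly like this:

```python
import torch

from diffusers import WanPipeline

# Placeholder checkpoint id; substitute the Wan 2.1 variant you actually use.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)

# Placeholder LoRA weights; any Wan-compatible LoRA repo id or local file would go here.
pipe.load_lora_weights("path/to/wan-lora.safetensors")
```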

docs/source/en/api/quantization.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -31,6 +31,11 @@ Learn how to quantize models in the [Quantization](../quantization/overview) gui
 ## GGUFQuantizationConfig
 
 [[autodoc]] GGUFQuantizationConfig
+
+## QuantoConfig
+
+[[autodoc]] QuantoConfig
+
 ## TorchAoConfig
 
 [[autodoc]] TorchAoConfig
```
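For context, here is a minimal sketch of how the newly documented `QuantoConfig` is meant to be passed to a model loader. The `weights_dtype` value and the Flux checkpoint are illustrative assumptions, not taken from this commit:

```python
import torch

from diffusers import FluxTransformer2DModel, QuantoConfig

# Illustrative: quantize the transformer weights to int8 via optimum-quanto.
quant_config = QuantoConfig(weights_dtype="int8")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint, for illustration only
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```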

docs/source/en/hybrid_inference/api_reference.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -3,3 +3,7 @@
 ## Remote Decode
 
 [[autodoc]] utils.remote_utils.remote_decode
+
+## Remote Encode
+
+[[autodoc]] utils.remote_utils.remote_encode
```

docs/source/en/hybrid_inference/overview.md

Lines changed: 8 additions & 2 deletions
```diff
@@ -36,7 +36,7 @@ Hybrid Inference offers a fast and simple way to offload local generation requir
 ## Available Models
 
 * **VAE Decode 🖼️:** Quickly decode latent representations into high-quality images without compromising performance or workflow speed.
-* **VAE Encode 🔢 (coming soon):** Efficiently encode images into latent representations for generation and training.
+* **VAE Encode 🔢:** Efficiently encode images into latent representations for generation and training.
 * **Text Encoders 📃 (coming soon):** Compute text embeddings for your prompts quickly and accurately, ensuring a smooth and high-quality workflow.
 
 ---
@@ -46,9 +46,15 @@ Hybrid Inference offers a fast and simple way to offload local generation requir
 * **[SD.Next](https://github.com/vladmandic/sdnext):** All-in-one UI with direct support for Hybrid Inference.
 * **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** ComfyUI node for Hybrid Inference.
 
+## Changelog
+
+- March 10, 2025: Added VAE encode
+- March 2, 2025: Initial release with VAE decoding
+
 ## Contents
 
-The documentation is organized into two sections:
+The documentation is organized into three sections:
 
 * **VAE Decode** Learn the basics of how to use VAE Decode with Hybrid Inference.
+* **VAE Encode** Learn the basics of how to use VAE Encode with Hybrid Inference.
 * **API Reference** Dive into task-specific settings and parameters.
```
docs/source/en/hybrid_inference/vae_encode.md

Lines changed: 183 additions & 0 deletions
# Getting Started: VAE Encode with Hybrid Inference

VAE encode is used for training, image-to-image, and image-to-video: it turns images or videos into latent representations.

## Memory

These tables demonstrate the VRAM requirements for VAE encode with SD v1 and SD XL on different GPUs.

For the majority of these GPUs, the memory usage (%) means that other models (text encoders, UNet/Transformer) must be offloaded, or that tiled encoding must be used, which increases the time taken and impacts quality.
<details><summary>SD v1.5</summary>

| GPU | Resolution | Time (seconds) | Memory (%) | Tiled Time (seconds) | Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|-------------:|--------------------:|-------------------:|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.015 | 3.51901 | 0.015 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.004 | 1.3154 | 0.005 | 1.3154 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.402 | 47.1852 | 0.496 | 3.51901 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.078 | 12.2658 | 0.094 | 3.51901 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.023 | 5.30105 | 0.023 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.006 | 1.98152 | 0.006 | 1.98152 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 0.574 | 71.08 | 0.656 | 5.30105 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.111 | 18.4772 | 0.14 | 5.30105 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.032 | 3.52782 | 0.032 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.01 | 1.31869 | 0.009 | 1.31869 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 0.742 | 47.3033 | 0.954 | 3.52782 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.136 | 12.2965 | 0.207 | 3.52782 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.036 | 8.51761 | 0.036 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.01 | 3.18387 | 0.01 | 3.18387 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | 0.863 | 86.7424 | 1.191 | 8.51761 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.157 | 29.6888 | 0.227 | 8.51761 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.051 | 10.6941 | 0.051 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.015 | 3.99743 | 0.015 | 3.99743 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | 1.217 | 96.054 | 1.482 | 10.6941 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.223 | 37.2751 | 0.327 | 10.6941 |

</details>
<details><summary>SDXL</summary>

| GPU | Resolution | Time (seconds) | Memory Consumed (%) | Tiled Time (seconds) | Tiled Memory (%) |
|:------------------------------|:-------------|-----------------:|----------------------:|-----------------------:|-------------------:|
| NVIDIA GeForce RTX 4090 | 512x512 | 0.029 | 4.95707 | 0.029 | 4.95707 |
| NVIDIA GeForce RTX 4090 | 256x256 | 0.007 | 2.29666 | 0.007 | 2.29666 |
| NVIDIA GeForce RTX 4090 | 2048x2048 | 0.873 | 66.3452 | 0.863 | 15.5649 |
| NVIDIA GeForce RTX 4090 | 1024x1024 | 0.142 | 15.5479 | 0.143 | 15.5479 |
| NVIDIA GeForce RTX 4080 SUPER | 512x512 | 0.044 | 7.46735 | 0.044 | 7.46735 |
| NVIDIA GeForce RTX 4080 SUPER | 256x256 | 0.01 | 3.4597 | 0.01 | 3.4597 |
| NVIDIA GeForce RTX 4080 SUPER | 2048x2048 | 1.317 | 87.1615 | 1.291 | 23.447 |
| NVIDIA GeForce RTX 4080 SUPER | 1024x1024 | 0.213 | 23.4215 | 0.214 | 23.4215 |
| NVIDIA GeForce RTX 3090 | 512x512 | 0.058 | 5.65638 | 0.058 | 5.65638 |
| NVIDIA GeForce RTX 3090 | 256x256 | 0.016 | 2.45081 | 0.016 | 2.45081 |
| NVIDIA GeForce RTX 3090 | 2048x2048 | 1.755 | 77.8239 | 1.614 | 18.4193 |
| NVIDIA GeForce RTX 3090 | 1024x1024 | 0.265 | 18.4023 | 0.265 | 18.4023 |
| NVIDIA GeForce RTX 3080 | 512x512 | 0.064 | 13.6568 | 0.064 | 13.6568 |
| NVIDIA GeForce RTX 3080 | 256x256 | 0.018 | 5.91728 | 0.018 | 5.91728 |
| NVIDIA GeForce RTX 3080 | 2048x2048 | OOM | OOM | 1.866 | 44.4717 |
| NVIDIA GeForce RTX 3080 | 1024x1024 | 0.302 | 44.4308 | 0.302 | 44.4308 |
| NVIDIA GeForce RTX 3070 | 512x512 | 0.093 | 17.1465 | 0.093 | 17.1465 |
| NVIDIA GeForce RTX 3070 | 256x256 | 0.025 | 7.42931 | 0.026 | 7.42931 |
| NVIDIA GeForce RTX 3070 | 2048x2048 | OOM | OOM | 2.674 | 55.8355 |
| NVIDIA GeForce RTX 3070 | 1024x1024 | 0.443 | 55.7841 | 0.443 | 55.7841 |

</details>
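Where the tables above report tiled figures, the corresponding local setup is VAE tiling plus model offloading. A minimal sketch of those two workarounds, assuming a standard SD v1.5 img2img pipeline purely for illustration (`enable_model_cpu_offload` and `enable_tiling` are the usual diffusers switches for offloading and tiled encode/decode):

```python
import torch

from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Offload submodules to CPU between forward passes to free VRAM for the VAE.
pipe.enable_model_cpu_offload()

# Encode/decode in tiles so large resolutions fit in memory (slower, can affect quality).
pipe.vae.enable_tiling()
```

Hybrid Inference avoids both trade-offs by moving the encode to a remote endpoint instead.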
## Available VAEs
| | **Endpoint** | **Model** |
|:-:|:-----------:|:--------:|
| **Stable Diffusion v1** | [https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud](https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud) | [`stabilityai/sd-vae-ft-mse`](https://hf.co/stabilityai/sd-vae-ft-mse) |
| **Stable Diffusion XL** | [https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud](https://xjqqhmyn62rog84g.us-east-1.aws.endpoints.huggingface.cloud) | [`madebyollin/sdxl-vae-fp16-fix`](https://hf.co/madebyollin/sdxl-vae-fp16-fix) |
| **Flux** | [https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud](https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud) | [`black-forest-labs/FLUX.1-schnell`](https://hf.co/black-forest-labs/FLUX.1-schnell) |

> [!TIP]
> Model support can be requested [here](https://github.com/huggingface/diffusers/issues/new?template=remote-vae-pilot-feedback.yml).

## Code

> [!TIP]
> Install `diffusers` from `main` to run the code: `pip install git+https://github.com/huggingface/diffusers@main`

A helper method simplifies interacting with Hybrid Inference.

```python
from diffusers.utils.remote_utils import remote_encode
```
### Basic example

Let's encode an image, then decode it to demonstrate.

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"/>
</figure>

<details><summary>Code</summary>

```python
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg?download=true")

# Encode the image into a latent on the remote Flux VAE endpoint.
latent = remote_encode(
    endpoint="https://ptccx55jz97f9zgo.us-east-1.aws.endpoints.huggingface.cloud/",
    image=image,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)

# Decode the latent back into an image to verify the round trip.
decoded = remote_decode(
    endpoint="https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.3611,
    shift_factor=0.1159,
)
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/decoded.png"/>
</figure>
### Generation

Now let's look at a generation example: we'll encode the image, generate, and then remotely decode too!

<details><summary>Code</summary>

```python
import torch

from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
from diffusers.utils.remote_utils import remote_decode, remote_encode

# Load the pipeline without a local VAE; encoding and decoding happen remotely.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=None,
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((768, 512))

# Encode the input image remotely with the SD v1 endpoint.
init_latent = remote_encode(
    endpoint="https://qc6479g0aac6qwy9.us-east-1.aws.endpoints.huggingface.cloud/",
    image=init_image,
    scaling_factor=0.18215,
)

prompt = "A fantasy landscape, trending on artstation"
latent = pipe(
    prompt=prompt,
    image=init_latent,
    strength=0.75,
    output_type="latent",
).images

# Decode the generated latent remotely and save the result.
image = remote_decode(
    endpoint="https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    scaling_factor=0.18215,
)
image.save("fantasy_landscape.jpg")
```

</details>

<figure class="image flex flex-col items-center justify-center text-center m-0 w-full">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/remote_vae/fantasy_landscape.png"/>
</figure>
## Integrations

* **[SD.Next](https://github.com/vladmandic/sdnext):** All-in-one UI with direct support for Hybrid Inference.
* **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** ComfyUI node for Hybrid Inference.

docs/source/en/quantization/overview.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -36,5 +36,6 @@ Diffusers currently supports the following quantization methods.
 - [BitsandBytes](./bitsandbytes)
 - [TorchAO](./torchao)
 - [GGUF](./gguf)
+- [Quanto](./quanto)
 
 [This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.
```
