
Commit 1044348

Merge pull request #197 from huggingface/main
Merge changes
2 parents 0ed3049 + 1b202c5 commit 1044348

107 files changed, with 2040 additions and 708 deletions

.github/workflows/nightly_tests.yml

Lines changed: 2 additions & 0 deletions
@@ -359,6 +359,8 @@ jobs:
             test_location: "bnb"
           - backend: "gguf"
             test_location: "gguf"
+          - backend: "torchao"
+            test_location: "torchao"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:

.github/workflows/pypi_publish.yaml

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ jobs:
       - name: Test installing diffusers and importing
         run: |
           pip install diffusers && pip uninstall diffusers -y
-          pip install -i https://testpypi.python.org/pypi diffusers
+          pip install -i https://test.pypi.org/simple/ diffusers
           python -c "from diffusers import __version__; print(__version__)"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"

docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion
@@ -429,7 +429,7 @@
       - local: api/pipelines/ledits_pp
         title: LEDITS++
       - local: api/pipelines/ltx_video
-        title: LTX
+        title: LTXVideo
       - local: api/pipelines/lumina
         title: Lumina-T2X
       - local: api/pipelines/marigold

docs/source/en/api/models/autoencoder_kl_hunyuan_video.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLHunyuanVideo
 
-vae = AutoencoderKLHunyuanVideo.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.float16)
+vae = AutoencoderKLHunyuanVideo.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="vae", torch_dtype=torch.float16)
 ```
 
 ## AutoencoderKLHunyuanVideo

docs/source/en/api/models/autoencoderkl_ltx_video.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLLTXVideo
 
-vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
 ```
 
 ## AutoencoderKLLTXVideo

docs/source/en/api/models/hunyuan_video_transformer_3d.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import HunyuanVideoTransformer3DModel
 
-transformer = HunyuanVideoTransformer3DModel.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.bfloat16)
+transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
 ```
 
 ## HunyuanVideoTransformer3DModel

docs/source/en/api/models/ltx_video_transformer3d.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import LTXVideoTransformer3DModel
 
-transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
 ```
 
 ## LTXVideoTransformer3DModel

docs/source/en/api/models/sana_transformer2d.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import SanaTransformer2DModel
 
-transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers", subfolder="transformer", torch_dtype=torch.float16)
+transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
 ```
 
 ## SanaTransformer2DModel

docs/source/en/api/pipelines/hunyuan_video.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Recommendations for inference:
 - Transformer should be in `torch.bfloat16`.
 - VAE should be in `torch.float16`.
 - `num_frames` should be of the form `4 * k + 1`, for example `49` or `129`.
-- For smaller resolution images, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution images, try higher values (between `7.0` and `12.0`). The default value is `7.0` for HunyuanVideo.
+- For smaller resolution videos, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution images, try higher values (between `7.0` and `12.0`). The default value is `7.0` for HunyuanVideo.
 - For more information about supported resolutions and other details, please refer to the original repository [here](https://github.com/Tencent/HunyuanVideo/).
 
 ## HunyuanVideoPipeline
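
Note on the `num_frames` recommendation in the hunk above: the constraint `4 * k + 1` is easy to check before calling the pipeline. The following is a minimal sketch, not part of this commit or of diffusers; the helper names `is_valid_num_frames` and `round_to_valid_num_frames` are hypothetical and exist only for illustration.

```python
def is_valid_num_frames(num_frames: int) -> bool:
    """Return True if num_frames has the form 4 * k + 1 for some integer k >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 4 == 0


def round_to_valid_num_frames(num_frames: int) -> int:
    """Round down to the nearest value of the form 4 * k + 1 (minimum 1)."""
    if num_frames < 1:
        return 1
    return ((num_frames - 1) // 4) * 4 + 1


# Hypothetical helpers: check a few candidate frame counts.
for n in (49, 129, 50, 128):
    print(n, is_valid_num_frames(n), round_to_valid_num_frames(n))
```

Rounding down rather than up is an arbitrary choice here; it keeps the request within the frame budget the caller originally asked for.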

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 40 additions & 2 deletions
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License. -->
 
-# LTX
+# LTX Video
 
 [LTX Video](https://huggingface.co/Lightricks/LTX-Video) is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image + text-to-video usecases.
 
@@ -22,14 +22,24 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
 
 </Tip>
 
+Available models:
+
+| Model name | Recommended dtype |
+|:-------------:|:-----------------:|
+| [`LTX Video 0.9.0`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors) | `torch.bfloat16` |
+| [`LTX Video 0.9.1`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) | `torch.bfloat16` |
+
+Note: The recommended dtype is for the transformer component. The VAE and text encoders can be either `torch.float32`, `torch.bfloat16` or `torch.float16` but the recommended dtype is `torch.bfloat16` as used in the original repository.
+
 ## Loading Single Files
 
-Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`].
+Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`]. We recommend using `from_single_file` for the Lightricks series of models, as they plan to release multiple models in the future in the single file format.
 
 ```python
 import torch
 from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
 
+# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
 single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
 transformer = LTXVideoTransformer3DModel.from_single_file(
   single_file_url, torch_dtype=torch.bfloat16
@@ -99,6 +109,34 @@ export_to_video(video, "output_gguf_ltx.mp4", fps=24)
 
 Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn more about our GGUF support.
 
+<!-- TODO(aryan): Update this when official weights are supported -->
+
+Loading and running inference with [LTX Video 0.9.1](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) weights.
+
+```python
+import torch
+from diffusers import LTXPipeline
+from diffusers.utils import export_to_video
+
+pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
+
+video = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=768,
+    height=512,
+    num_frames=161,
+    decode_timestep=0.03,
+    decode_noise_scale=0.025,
+    num_inference_steps=50,
+).frames[0]
+export_to_video(video, "output.mp4", fps=24)
+```
+
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
 ## LTXPipeline
