feat(diffusers/pipelines): add pipelines and required modules of QwenImage in Diffusers Master #1288

Dong1017 · 2025-09-17T06:53:22Z

What does this PR do?

Adds

1. QwenImage Pipelines and Required Modules

(Comparable with Diffusers Master)

a. Pipelines

mindone.diffusers.QwenImagePipeline
mindone.diffusers.QwenImageImg2ImgPipeline
mindone.diffusers.QwenImageInpaintPipeline
mindone.diffusers.QwenImageEditPipeline
mindone.diffusers.QwenImageEditInpaintPipeline

b. Modules

mindone.diffusers.models.AutoencoderQwenImage
mindone.diffusers.models.QwenImageTransformer2DModel
mindone.diffusers.loaders.QwenImageLoraLoaderMixin

2. UTs of Pipelines

All UTs were set up according to Diffusers Master, accessed on Sep 17, 2025.
- tests/diffusers_tests/pipelines/qwenimage/test_qwenimage.py
- tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_img2img.py
- tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_inpaint.py
- tests/diffusers_tests/pipelines/qwenimage/test_qwenimage_edit.py
With MindSpore 2.7.0, both the fp32 and bf16 UTs pass.
With MindSpore 2.6.0, the bf16 UTs pass, while the fp32 UTs raise a TypeError.

Fix

Fixes a TypeError raised in mindone/diffusers/loaders/lora_conversion_utils.py. The scaled LoRA weights are now wrapped in Parameter before they are stored in the converted state dict.

Before:
converted_state_dict[diffusers_down_key] = down_weight * scale_down
converted_state_dict[diffusers_up_key] = up_weight * scale_up

After:
converted_state_dict[diffusers_down_key] = Parameter(down_weight * scale_down)
converted_state_dict[diffusers_up_key] = Parameter(up_weight * scale_up)

Usage

QwenImagePipeline

import mindspore as ms 
from mindone.diffusers import QwenImagePipeline 

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "A cat holding a sign that says hello world" 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage.png")

QwenImageImg2ImgPipeline

import mindspore as ms 
from mindone.diffusers import QwenImageImg2ImgPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageImg2ImgPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16)
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
images = pipe(prompt=prompt, negative_prompt=" ", image=init_image, strength=0.95)[0][0]
images.save("qwenimage_img2img.png")

QwenImageInpaintPipeline

import mindspore as ms 
from mindone.diffusers import QwenImageInpaintPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageInpaintPipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=ms.bfloat16) 
prompt = "Face of a yellow cat, high resolution, sitting on a park bench" 
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" 
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" 
source = load_image(img_url) 
mask = load_image(mask_url) 
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=0.85)[0][0] 
image.save("qwenimage_inpainting.png")

QwenImageEditPipeline

import mindspore as ms 
from PIL import Image 
from mindone.diffusers import QwenImageEditPipeline 
from mindone.diffusers.utils import load_image 

pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16) 
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/yarn-art-pikachu.png").convert("RGB") 
prompt = ("Make Pikachu hold a sign that says 'Qwen Edit is awesome', yarn art style, detailed, vibrant colors") 
# Depending on the variant being used, the pipeline call will slightly vary. 
# Refer to the pipeline documentation for more details. 
image = pipe(image, prompt, num_inference_steps=50)[0][0] 
image.save("qwenimage_edit.png")

QwenImageEditInpaintPipeline

import mindspore as ms 
from PIL import Image
from mindone.diffusers import QwenImageEditInpaintPipeline
from mindone.diffusers.utils import load_image

pipe = QwenImageEditInpaintPipeline.from_pretrained("Qwen/Qwen-Image-Edit", mindspore_dtype=ms.bfloat16)
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
source = load_image(img_url)
mask = load_image(mask_url)
image = pipe(prompt=prompt, negative_prompt=" ", image=source, mask_image=mask, strength=1.0, num_inference_steps=50)[0][0]
image.save("qwenimage_inpainting.png")

LoRA Infer

from mindone.diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler
import mindspore
import math

scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),  # We use shift=3 in distillation
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),  # We use shift=3 in distillation
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,  # set shift_terminal to None
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, mindspore_dtype=mindspore.bfloat16
)
pipe.load_lora_weights(
    "Qwen/lightx2v/Qwen-Image-Lightning", 
    weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors", 
    adapter_name="qwenimage-lora"
)
pipe.fuse_lora()
pipe.unload_lora_weights()

prompt = "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K, cinematic composition."
negative_prompt = " "
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=None,
)[0][0]
image.save("lora_pic/qwen_fewsteps_lora.png")
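
The scheduler config above sets base_shift and max_shift to the same value, so the dynamic shift no longer depends on the image sequence length. The following is a minimal pure-Python sketch of that arithmetic. The linear interpolation and the exponential time-shift formula are written as understood from upstream Diffusers and should be treated as assumptions; `calculate_mu` and `exponential_time_shift` are illustrative names, not mindone APIs.

```python
import math


def calculate_mu(image_seq_len, base_seq_len=256, max_seq_len=8192,
                 base_shift=math.log(3), max_shift=math.log(3)):
    """Linearly interpolate the shift parameter mu from the image sequence length."""
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


def exponential_time_shift(mu, sigma, power=1.0):
    """Assumed exponential shift: exp(mu) / (exp(mu) + (1 / sigma - 1) ** power)."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0) ** power)


# Arbitrary sequence lengths, chosen only to show that mu stays constant.
for seq_len in (256, 4096, 8192):
    mu = calculate_mu(seq_len)
    print(seq_len, round(math.exp(mu), 6), round(exponential_time_shift(mu, 0.5), 6))
```

With mu = log(3), the mapping reduces to 3σ / (1 + 2σ), which is the same curve as a fixed shift of 3.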

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0.

Pipeline	Weight Loading Time	Mode	Speed (s/it)
QwenImagePipeline	13m7s	Pynative / jit	9.93 / 8.63
QwenImageImg2ImgPipeline	12m53s	Pynative / jit	9.56 / 11.47
QwenImageInpaintPipeline	13m10s	Pynative / jit	4.80 / 4.35
QwenImageEditPipeline	14m22s	Pynative / jit	13.25 / 13.93
QwenImageEditInpaintPipeline	13m40s	Pynative / jit	13.98 / 13.77
QwenImagePipeline + LoRA	13m27s	Pynative / jit	2.96 / 4.47
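
To make the Pynative and jit columns easier to compare, the ratio between them can be computed per pipeline. This is a small sketch that only reuses the numbers from the table above; `jit_speedup` and `estimated_denoising_seconds` are hypothetical helper names, and the estimate covers the denoising loop only, not weight loading, text encoding, or VAE decoding.

```python
# (Pynative s/it, jit s/it), copied from the performance table.
SPEEDS = {
    "QwenImagePipeline": (9.93, 8.63),
    "QwenImageImg2ImgPipeline": (9.56, 11.47),
    "QwenImageInpaintPipeline": (4.80, 4.35),
    "QwenImageEditPipeline": (13.25, 13.93),
    "QwenImageEditInpaintPipeline": (13.98, 13.77),
    "QwenImagePipeline + LoRA": (2.96, 4.47),
}


def jit_speedup(pynative, jit):
    """Return Pynative time divided by jit time; a value above 1 means jit is faster."""
    return pynative / jit


def estimated_denoising_seconds(seconds_per_it, num_inference_steps):
    """Rough denoising-loop time, assuming a constant per-step cost."""
    return seconds_per_it * num_inference_steps


for name, (pynative, jit) in SPEEDS.items():
    ratio = jit_speedup(pynative, jit)
    faster = "jit" if ratio > 1 else "Pynative"
    print(f"{name}: {ratio:.2f}x ({faster} is faster)")
```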

Limitation

QwenImageEditPipeline and QwenImageEditInpaintPipeline load their modules from Qwen-Image-Edit. Before using either pipeline, image_processor_type must be changed manually from Qwen2VLImageProcessorFast to Qwen2VLImageProcessor in Qwen-Image-Edit/processor/preprocessor_config.json.
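
The manual change can also be scripted. The sketch below uses only the standard library and assumes the checkpoint has been downloaded to a local Qwen-Image-Edit directory; `use_slow_image_processor` is a hypothetical helper name, not part of mindone.

```python
import json
from pathlib import Path


def use_slow_image_processor(config_path):
    """Switch image_processor_type in place; return True if the file was changed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("image_processor_type") != "Qwen2VLImageProcessorFast":
        return False  # Already patched, or an unexpected value: leave the file alone.
    config["image_processor_type"] = "Qwen2VLImageProcessor"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    # Assumed local path; adjust it to wherever Qwen-Image-Edit was downloaded.
    target = Path("Qwen-Image-Edit/processor/preprocessor_config.json")
    if target.exists():
        print("patched" if use_slow_image_processor(target) else "left unchanged")
```

The function is idempotent, so running it a second time leaves the file untouched.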

Notes

Requires transformers==4.52.1.
The generated images are nearly identical to those produced by Torch when the random seed and the text encoder hidden states are kept consistent.
TODO: jit mode; LoRA test; UTs of modules

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines
Did you build and run the code without any errors?
Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx

mindone/diffusers/pipelines/qwenimage/pipeline_qwenimage.py

SamitHuang · 2025-09-21T11:57:31Z

Could you add an inference example and a LoRA fine-tuning example to the examples folder? That would help introduce QwenImage.

Dong1017 · 2025-09-26T09:03:38Z

How to fix the requirement of transformers==4.52.1?

transformers==4.52.1 is used instead of transformers==4.50.0 mainly to avoid an AttributeError and to stay consistent with the requirements of Qwen-Image.
With transformers==4.50.0, the following AttributeError is raised:

../../transformers/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py:1517: in __init__
    super().__init__(config)
../../transformers/src/transformers/modeling_utils.py:1898: in __init__
    self.generation_config = GenerationConfig.from_model_config(config) if self.can_generate() else None
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        decoder_config = model_config.get_text_config(decoder=True)
        if decoder_config is not model_config:
            default_generation_config = GenerationConfig()
>           decoder_config_dict = decoder_config.to_dict()
                                  ^^^^^^^^^^^^^^^^^^^^^^
E           AttributeError: 'dict' object has no attribute 'to_dict'

../../transformers/src/transformers/generation/configuration_utils.py:1287: AttributeError

Upgrading transformers from 4.50.0 to 4.52.1 or a higher version resolves this error.
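
A quick way to check the installed version before loading the pipelines is sketched below. It uses only the standard library; `parse_release`, `meets_minimum`, and `check_transformers` are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as `.dev0`, which is a simplification.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = (4, 52, 1)


def parse_release(version):
    """Return the leading numeric release segment, e.g. '4.53.0.dev0' -> (4, 53, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_TRANSFORMERS):
    """Compare release tuples; suffixes after the numeric part are ignored."""
    return parse_release(version) >= minimum


def check_transformers():
    """Describe whether the installed transformers satisfies the minimum version."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is too old; upgrade to 4.52.1 or later"


print(check_transformers())
```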

Dong1017 added 30 commits August 15, 2025 14:14

2025/08/15

d3dec44

2025/8/15 17:18 revised

6006960

2025/8/18 10:22 revised

15bc8ae

2025/8/18 17:00 revised

103db50

2025/8/18 19:08 revised

77779b5

2025/8/18 19:13 revised

0cab22b

2025/8/19 9:02 revised

2b7b4c9

2025/8/19 9:04 revised

d7eaa37

2025/8/19 9:12 revised

dddd8f2

2025/8/19 10:27 revised

3117bdc

2025/8/20 9:22 revised

e19c2e3

2025/8/20 9:247 revised

0fb127a

2025/8/20 9:48 revised

5d317bc

2025/8/20 9:52 revised

e8043d8

2025/8/20 10:15 revised

b78ef0a

2025/8/20 10:50 revised

656acce

2025/8/20 11:11 revised

9a33d83

2025/8/20 11:27 revised

c2f972c

2025/8/20 11:47 revised

9e2cccf

2025/8/20 14:25 revised

9b5be21

2025/8/20 14:26 revised

1906919

2025/8/21 15:20 revised

c3055ba

2025/8/21 15:24 revised

e025800

2025/8/21 17:08 revised

436ebf3

2025/8/21 17:57 revised

dafec1a

2025/8/21 19:13 revised

e573be1

2025/8/22 11:32 revised

d549ab2

2025/8/22 17:40 revised

09ac0bd

2025/8/25 10:40 revised

fb5877b

2025/8/26 10:30 revised

358b20b

SamitHuang reviewed Sep 21, 2025

mindone/diffusers/pipelines/qwenimage/pipeline_qwenimage.py (outdated, resolved)

mindone/diffusers/pipelines/qwenimage/pipeline_qwenimage.py (resolved)

SamitHuang mentioned this pull request Sep 22, 2025

Update readme #1298

Open

6 tasks

Dong1017 added 2 commits September 26, 2025 16:29

fix a bug of qwen2_5_vl, some revisions suggested from Cui-yshoho and SamitHuang

c70d315

Resolved the conflict regarding qwen2_5_vl masked_scatter-bf16-bug

735e6af

Dong1017 and others added 9 commits September 28, 2025 17:20

Add UTs of transformer, supplement MDs, delete unused code comments

237183e

update md to notice the use of transformers==4.52.1

a498371

fix ci problem

e73acef

Merge branch 'master' into qwenimage

07b10a7

fix ci problem

dbb8ac2

Merge branch 'qwenimage' of https://github.com/Dong1017/mindone into qwenimage

d88f7e4

fix ci problem

427961a

fix ci problem

54af7f1

Merge branch 'master' into qwenimage

4eb6673

vigo999 assigned Dong1017 Sep 29, 2025

vigo999 added the new model add new model to mindone label Sep 29, 2025

vigo999 added this to mindone Sep 29, 2025

vigo999 moved this to In Progress in mindone Sep 29, 2025

Dong1017 and others added 8 commits September 30, 2025 08:56

CHECK: pre-commit run --all-files

0524331

fix ci problem - strange format?

3102629

Trigger CI

faca606

fix ci problem - modeling_reformer

c2508b9

Merge branch 'master' into qwenimage

ad179cd

Merge branch 'master' into qwenimage

3dd544d

fix: lora infer - lora_conversion_utils.py

2ccd046

revise format of some strings

dd55624

vigo999 approved these changes Oct 18, 2025

vigo999 merged commit 6ec66e8 into mindspore-lab:master Oct 18, 2025
3 checks passed

github-project-automation bot moved this from In Progress to Done in mindone Oct 18, 2025
