Merged
1 change: 1 addition & 0 deletions docs/.vuepress/notes/zh/mm_guide.ts
@@ -54,6 +54,7 @@ export const MMGuide: ThemeNote = defineNoteConfig({
items: [
'install_image_video_generation',
'image_generation',
'image_editing',
],
},
]
@@ -0,0 +1,5 @@
---
title: image_editing
createTime: 2025/11/16 12:46:49
permalink: /en/mm_guide/bafw70jq/
---
370 changes: 370 additions & 0 deletions docs/zh/notes/mm_guide/image_video_generation/image_editing.md
@@ -0,0 +1,370 @@
---
title: AIGC Image Editing
createTime: 2025/07/15 22:38:45
permalink: /zh/mm_guide/image_editing/
icon: basil:lightning-alt-outline
---

# Image-Editing Data Synthesis Pipeline

## 1. Overview
The core goal of the **image-editing data synthesis pipeline** is to produce image-editing data from existing images and instructions, generating for each sample a high-quality triple of input image, editing instruction, and output image. It currently supports editing with a local model, editing with the online model nano-banana, and editing tasks that take multiple images as input:

### Supported application scenarios

- **Local detail editing**
  - Replace, remove, or add objects in the input image
  - Characteristics: the overall content of the image changes little; only a particular feature or object is modified

- **Subject-driven image generation**
  - Generate an image from the subject features of a single input image
  - Characteristics: the overall content changes substantially, but the subject in the result must stay consistent with the subject of the input image

- **Multi-subject-driven image generation**
  - Generate an image from the subject features of multiple input images
  - Characteristics: every subject in the generated image must stay consistent with its counterpart in the input images, while the interaction between subjects must look natural

### Processing flow
1. **Prompt generation**
   - Use an LLM to generate image descriptions and editing instructions for the configured category
   - Ensure that the editing instructions generalize well and the image descriptions are accurate
2. **Image condition generation**
   - Use a text-to-image model (e.g. FLUX, Qwen-Image) to generate the corresponding images from the descriptions
3. **Edited image generation**
   - Use a local or online editing model to produce the target image from the editing instruction and the input image
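Conceptually, the three stages above chain together as sketched below. The function names are hypothetical stand-ins for the DataFlow operators, not the real API; the sketch only illustrates how each stage's output feeds the next:

```python
# Hypothetical stand-ins for the three pipeline stages.
def generate_prompts(category):
    # Stage 1: an LLM turns the category into an image description + editing instruction.
    return {"description": f"a photo for {category}",
            "instruction": "change the background to a beach"}

def text_to_image(description):
    # Stage 2: a text-to-image model (e.g. FLUX) renders the description.
    return f"<image generated from: {description}>"

def edit_image(image, instruction):
    # Stage 3: an editing model applies the instruction to the condition image.
    return f"<{image} edited by: {instruction}>"

def synthesize_sample(category):
    prompts = generate_prompts(category)
    condition = text_to_image(prompts["description"])
    edited = edit_image(condition, prompts["instruction"])
    return {"images": [condition],
            "instruction": prompts["instruction"],
            "edited_images": [edited]}

sample = synthesize_sample("portrait")
```

Each sample thus carries the condition image, the instruction, and the edited result, matching the output schema described later in this guide.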

## 2. Quick Start
### Step 1: Install the Dataflow-MM environment
```bash
cd ./Dataflow-MM
conda create -n Dataflow-MM python=3.12
conda activate Dataflow-MM
pip install -e .
```

### Step 2: Prepare the editing data
We use a `jsonl` file to store the data. A minimal input sample looks like this:
```jsonl
{"conversations": [{"content": "Change the woman's clothes to a white dress.", "role": "user"}], "images": ["./dataflow/example/test_image_editing/images/image1.png"]}
{"conversations": [{"content": "Change the vase to red.", "role": "user"}], "images": ["./dataflow/example/test_image_editing/images/image2.png"]}
{"conversations": [{"content": "The woman is dancing with the prince in a sacred ballroom.", "role": "user"}], "images": ["./dataflow/example/test_image_editing/images/image3.png"]}
```
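Records in this shape can be produced and round-tripped with the standard library alone; below is a small sketch (the file name and image paths are illustrative):

```python
import json

def make_record(instruction, image_paths):
    # One line of the input jsonl: a single user turn plus the image paths.
    return {
        "conversations": [{"content": instruction, "role": "user"}],
        "images": list(image_paths),
    }

def write_jsonl(path, records):
    # One JSON object per line, non-ASCII kept readable.
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

def read_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

records = [
    make_record("Change the woman's clothes to a white dress.", ["./images/image1.png"]),
    make_record("Change the vase to red.", ["./images/image2.png"]),
]
write_jsonl("prompts.jsonl", records)
```

Reading the file back with `read_jsonl("prompts.jsonl")` returns the same list of dictionaries, which is a quick way to verify the file before feeding it to the pipeline.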

Loading the data requires defining a `FileStorage`:
```python
from dataflow.utils.storage import FileStorage

storage = FileStorage(
    first_entry_file_name="<Your jsonl file path>",
    cache_path="./cache_local/<Your task name>",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl"
)
```
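The `cache_path`/`file_name_prefix` arguments imply a step-wise jsonl cache: each pipeline step reads the previous cache file and writes the next one. The toy class below mimics that behavior to make the naming scheme concrete; it is **not** the real `FileStorage`, and the exact file-name pattern is an assumption:

```python
import os

class ToyStorage:
    """Toy stand-in: each step() advances to the next numbered jsonl cache file."""
    def __init__(self, cache_path, file_name_prefix):
        self.cache_path = cache_path
        self.file_name_prefix = file_name_prefix
        self.step_idx = -1

    def step(self):
        self.step_idx += 1
        return self  # returning self lets callers chain reads/writes per step

    def current_file(self):
        # Assumed naming scheme: <prefix>_<step>.jsonl under the cache directory.
        return os.path.join(self.cache_path,
                            f"{self.file_name_prefix}_{self.step_idx}.jsonl")

storage = ToyStorage("./cache_local/demo", "dataflow_cache_step")
storage.step()  # first step -> ..._0.jsonl
```

Passing `storage.step()` to each operator, as the snippets below do, therefore gives every stage its own intermediate cache file.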

### Step 3: Run the pipeline
Run the image-editing pipeline with one of the following commands, depending on whether you use a local editing model or an online one.
```bash
# local editing model
python /path/to/DataFlow-MM/test/test_image_editing.py --serving_type 'local'
```
```bash
# online editing model
python /path/to/DataFlow-MM/test/test_image_editing.py --api_key <your_key> --api_url <your_url>
```

## 3. Pipeline Logic
### 3.1 Prompt design
Initialize the prompt generator with an online LLM (e.g. gpt-4o):
```python
import os

from dataflow.operators.image_generation import PromptedT2ITextGenerator
from dataflow.serving.api_llm_serving_request import APILLMServing_request

os.environ["DF_API_KEY"] = api_key
serving = APILLMServing_request(
    api_url=api_url,
    model_name="gpt-4o",
    max_workers=5,
)

text_to_image_sample_generator = PromptedT2ITextGenerator(
    llm_serving=serving,
)
```
Run the prompt generator:
```python
text_to_image_sample_generator.run(
    storage=storage.step(),
    input_style_key="input_style",
    input_prompt_key="input_text",
    output_prompt_key="instruction",
    output_prompt_key_2="output_img_discript",
)
```

### 3.2 Obtaining the image condition
We use a text-to-image model; the text-to-image serving is initialized as follows:
```python
import os

from dataflow.io import ImageIO
from dataflow.serving.local_image_gen_serving import LocalImageGenServing

t2i_serving = LocalImageGenServing(
    image_io=ImageIO(save_path=os.path.join(storage.cache_path, "condition_images")),
    batch_size=8,
    hf_model_name_or_path="black-forest-labs/FLUX.1-dev",
    hf_cache_dir="./cache_local",
    hf_local_dir="./ckpt/models/"
)
```
Build and run the text-to-image generator on top of this serving:
```python
from dataflow.operators.core_vision import PromptedImageGenerator

text_to_image_generator = PromptedImageGenerator(t2i_serving=t2i_serving)
text_to_image_generator.run(
    storage=storage.step(),
    input_conversation_key="input_text",
    output_image_key="input_image",
)
```

### 3.3 Obtaining the editing result
The local model is invoked as follows (the snippet is taken from inside a pipeline class, hence the `self.` references):
```python
import os

from dataflow.io import ImageIO
from dataflow.serving.local_image_gen_serving import LocalImageGenServing

self.serving = LocalImageGenServing(
    image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "images")),
    hf_model_name_or_path="black-forest-labs/FLUX.1-Kontext-dev",
    hf_cache_dir="./cache_local",
    hf_local_dir="./ckpt/models/",
    Image_gen_task="imageedit",
    batch_size=4,
    diffuser_model_name="FLUX-Kontext",
    diffuser_num_inference_steps=28,
    diffuser_guidance_scale=3.5,
)
```

We have also integrated nano-banana for image editing. Starting from the image-editing setup above, simply swap in the corresponding serving to run nano-banana. The model is invoked as follows:
```python
import os

from dataflow.io import ImageIO
from dataflow.serving.api_vlm_serving_openai import APIVLMServing_openai

os.environ['DF_API_KEY'] = args.api_key

self.serving = APIVLMServing_openai(
    api_url=api_url,
    model_name="gemini-2.5-flash-image-preview",  # try nano-banana
    image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "images")),
    send_request_stream=True,
)
```
Note that the API we use comes from [huiyun](http://123.129.219.111:3000).

The generation script is adjusted as follows:
```python
from dataflow.operators.core_vision import PromptedImageEditGenerator

self.generator = PromptedImageEditGenerator(image_edit_serving=self.serving)

self.generator.run(
    storage=self.storage.step(),
    input_image_key="images",
    input_conversation_key="conversations",
    output_image_key="edited_images",
)
```


## 4. Output Data
- **Format**: `jsonl`
- **Fields**:
  - `conversations`: the image-editing instruction
  - `images`: the image to edit and/or the generated input image condition
  - `edited_images`: the generated editing result
- **Example** (pretty-printed here for readability; each record is a single line in the actual file):
```jsonl
{
"conversations": [{"content": "The woman is dancing with the prince in a sacred ballroom.", "role": "user"}],
"images": ["./dataflow/example/test_image_editing/images/image3.png"],
"edited_images": [""]
}
```
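As the example above shows, a failed edit can leave an empty path in `edited_images`, so a small post-filter over the output jsonl is useful before training. A standard-library sketch (the file name is illustrative):

```python
import json

def load_complete_records(path):
    """Keep only records whose edited_images all point at non-empty paths."""
    kept, dropped = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            edited = rec.get("edited_images", [])
            if edited and all(edited):  # non-empty list, no empty-string paths
                kept.append(rec)
            else:
                dropped += 1
    return kept, dropped

# Demo with a two-record file: one finished edit, one failed (empty path).
with open("edits.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"images": ["a.png"], "edited_images": ["a_edit.png"]}) + "\n")
    f.write(json.dumps({"images": ["b.png"], "edited_images": [""]}) + "\n")

kept, dropped = load_complete_records("edits.jsonl")
```

Here the second record is dropped, leaving only samples whose edit actually produced an output image.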

## 5. Running the Pipelines
Two generation modes are supported, a local editing model and an online editing model, as well as image editing with multiple input images.
- Image-editing pipeline with a local editing model
```bash
python /path/to/DataFlow-MM/test/test_image_editing.py --serving_type 'local'
```
- Image-editing pipeline with an online editing model
```bash
python /path/to/DataFlow-MM/test/test_image_editing.py --api_key <your_key> --api_url <your_url>
```
- Image-editing pipeline with multiple input images
```bash
python /path/to/DataFlow-MM/test/test_multi_images_to_image_generation.py --api_key <your_key> --api_url <your_url>
```
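The flags these test scripts accept (`--serving_type`, `--api_key`, `--api_url`) can be wired up with `argparse` roughly as below. This is a hypothetical sketch of the scripts' command-line interface, not their actual code; defaults are assumptions:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Image-editing pipeline entry point")
    parser.add_argument("--serving_type", choices=["local", "api"], default="api",
                        help="use a local editing model or an online API")
    parser.add_argument("--api_key", default="",
                        help="key for the online editing model")
    parser.add_argument("--api_url", default="http://123.129.219.111:3000/v1/",
                        help="base URL of the online editing model")
    return parser

# e.g. `python test_image_editing.py --serving_type local`
args = build_parser().parse_args(["--serving_type", "local"])
```

The parsed `args` can then be passed straight into a pipeline constructor such as `ImageGenerationPipeline(serving_type=args.serving_type, api_key=args.api_key, api_url=args.api_url)`.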

## 6. Pipeline Examples
The example pipelines below show how to combine multiple operators for image-editing data generation, covering the full process from obtaining the image condition to generating the edited result.
- **Local-model image-editing data synthesis pipeline**:
```python
import os

from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO
from dataflow.operators.core_vision import PromptedImageEditGenerator
from dataflow.serving.local_image_gen_serving import LocalImageGenServing

class ImageGenerationPipeline():
    def __init__(self, serving_type="local", api_key="", api_url="http://123.129.219.111:3000/v1/"):
        self.storage = FileStorage(
            first_entry_file_name="./dataflow/example/image_gen/image_edit/prompts.jsonl",
            cache_path="./cache_local/multi2single_image_gen",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl"
        )

        self.serving = LocalImageGenServing(
            image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "target_images")),
            hf_model_name_or_path="black-forest-labs/FLUX.1-Kontext-dev",
            hf_cache_dir="./cache_local",
            hf_local_dir="./ckpt/models/",
            Image_gen_task="imageedit",
            batch_size=4,
            diffuser_model_name="FLUX-Kontext",
            diffuser_num_inference_steps=28,
            diffuser_guidance_scale=3.5,
        )

        self.text_to_image_generator = PromptedImageEditGenerator(
            image_edit_serving=self.serving
        )

    def forward(self):
        self.text_to_image_generator.run(
            storage=self.storage.step(),
            input_image_key="images",
            input_conversation_key="conversations",
            output_image_key="edited_images",
        )
```

- **Online-model image-editing data synthesis pipeline**:
```python
import os

from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO
from dataflow.operators.core_vision import PromptedImageEditGenerator
from dataflow.serving.api_vlm_serving_openai import APIVLMServing_openai

class ImageGenerationPipeline():
    def __init__(self, serving_type="local", api_key="", api_url="http://123.129.219.111:3000/v1/"):
        os.environ['DF_API_KEY'] = api_key
        self.storage = FileStorage(
            first_entry_file_name="./dataflow/example/image_gen/image_edit/prompts.jsonl",
            cache_path="./cache_local/multi2single_image_gen",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl"
        )

        self.serving = APIVLMServing_openai(
            api_url=api_url,
            model_name="gemini-2.5-flash-image-preview",
            image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "target_images")),
            # send_request_stream=True,  # add this line when using the ip http://123.129.219.111:3000/
        )

        self.text_to_image_generator = PromptedImageEditGenerator(
            image_edit_serving=self.serving,
            save_interval=10
        )

    def forward(self):
        self.text_to_image_generator.run(
            storage=self.storage.step(),
            input_image_key="images",
            input_conversation_key="conversations",
            output_image_key="edited_images",
        )
```

- **Multi-image-input image-editing data synthesis pipeline**:
```python
import os

from dataflow.operators.image_generation import PromptedT2ITextGenerator
from dataflow.operators.core_vision import PromptedImageGenerator
from dataflow.operators.core_vision import PromptedImageEditGenerator
from dataflow.serving.api_llm_serving_request import APILLMServing_request
from dataflow.serving.api_vlm_serving_openai import APIVLMServing_openai
from dataflow.serving.local_image_gen_serving import LocalImageGenServing
from dataflow.prompts.image_gen_prompt_generator import MultiImagesToImagePromptGenerator
from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO


class MultiImages2ImagePipeline():
    def __init__(
        self,
        serving_type="api",
        api_key="",
        api_url="https://api.openai.com/v1/",
        api_vlm_url="https://api.openai.com/v1/",
        ip_condition_num=1,
        repeat_times=1
    ):
        self.storage = FileStorage(
            first_entry_file_name="./dataflow/example/image_gen/multi_image_input_gen/prompts.jsonl",
            cache_path="./cache_local/multi_images_to_image_gen",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl"
        )

        os.environ["DF_API_KEY"] = api_key
        self.serving = APILLMServing_request(
            api_url=api_url,
            model_name="gpt-4o",
            max_workers=5,
        )

        self.t2i_serving = LocalImageGenServing(
            image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "condition_images")),
            batch_size=8,
            hf_model_name_or_path="black-forest-labs/FLUX.1-dev",
            hf_cache_dir="./cache_local",
            hf_local_dir="./ckpt/models/"
        )

        self.vlm_serving = APIVLMServing_openai(
            api_url=api_vlm_url,
            model_name="gemini-2.5-flash-image-preview",
            image_io=ImageIO(save_path=os.path.join(self.storage.cache_path, "target_images")),
            # send_request_stream=True,  # add this line when using the ip http://123.129.219.111:3000/
        )

        self.t2i_text_prompt_generator = MultiImagesToImagePromptGenerator()

        self.text_to_image_sample_generator = PromptedT2ITextGenerator(
            llm_serving=self.serving,
        )

        self.text_to_image_generator = PromptedImageGenerator(
            t2i_serving=self.t2i_serving,
        )

        self.image_editing_generator = PromptedImageEditGenerator(
            image_edit_serving=self.vlm_serving,
        )

    def forward(self):
        self.text_to_image_sample_generator.run(
            storage=self.storage.step(),
            prompt_generator=self.t2i_text_prompt_generator,
            input_style_key="input_style",
            input_prompt_key="input_text",
            output_prompt_key="instruction",
            output_prompt_key_2="output_img_discript",
        )

        self.text_to_image_generator.run(
            storage=self.storage.step(),
            input_conversation_key="input_text",
            output_image_key="input_image",
        )

        self.image_editing_generator.run(
            storage=self.storage.step(),
            input_image_key="input_image",
            input_conversation_key="output_img_discript",
            output_image_key="output_image",
        )
```
@@ -5,6 +5,12 @@ permalink: /zh/mm_guide/5ub4phag/
icon: basil:lightning-alt-outline
---

# Text-to-Image Data Synthesis Pipeline

## 1. Overview
The core goal of the Text-to-Image data synthesis pipeline is to provide the most basic way of obtaining images.

## 2. Quick Start
To let DataFlow support image generation, we implement large-scale image generation and editing on top of the latest generation methods in [diffusers](https://github.com/huggingface/diffusers); in addition, we support the nano banana (gemini-2.5-flash-image) API for image editing.
