@@ -22,6 +22,17 @@ __Stable Diffusion web UI now seems to support LoRA trained by ``sd-scripts``.__
The feature of SDXL training is now available in sdxl branch as an experimental feature.

Aug 4, 2023: The feature will be merged into the main branch soon. Following are the changes from the previous version.

- `bitsandbytes` is now optional. Please install it if you want to use it. The instructions are in the later section.
- `albumentations` is not required anymore.
- An issue with the pooled output for Textual Inversion training is fixed.
- `--v_pred_like_loss ratio` option is added. This option adds a loss similar to the v-prediction loss in SDXL training. `0.1` means that 10% of the v-prediction loss is added. The default value is None (disabled).
  - In v-prediction, the loss is higher in the early timesteps (near the noise). This option can be used to increase the loss in the early timesteps.
- Arbitrary options can be passed to Diffusers' schedulers. For example `--lr_scheduler_args "lr_end=1e-8"`; see the command sketch after this list.
- `sdxl_gen_imgs.py` supports batch size > 1.
- Fix ControlNet to work with attention couple and regional LoRA in `gen_img_diffusers.py`.
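
For illustration, here is a hedged sketch of how the two new options might be combined on an SDXL training command line. The script name `sdxl_train.py`, the placeholder paths, and the surrounding arguments are assumptions for this example, not settings prescribed by the changes above:

```
# Sketch only: paths are placeholders; verify every option with --help first.
# (bash-style line continuations; join into a single line on Windows.)
accelerate launch sdxl_train.py \
  --pretrained_model_name_or_path /path/to/sd_xl_base_1.0.safetensors \
  --dataset_config /path/to/dataset_config.toml \
  --output_dir /path/to/output \
  --mixed_precision bf16 \
  --lr_scheduler polynomial \
  --lr_scheduler_args "lr_end=1e-8" \
  --v_pred_like_loss 0.1
```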

Summary of the feature:

- `tools/cache_latents.py` is added. This script can be used to cache the latents to disk in advance.
@@ -65,12 +76,17 @@ Summary of the feature:
### Tips for SDXL training

- The default resolution of SDXL is 1024x1024.
- The fine-tuning can be done with 24GB GPU memory with the batch size of 1. The following options are recommended __for the fine-tuning with 24GB GPU memory__:
  - Train U-Net only.
  - Use gradient checkpointing.
  - Use the `--cache_text_encoder_outputs` option and cache latents.
  - Use Adafactor optimizer. RMSprop 8bit or Adagrad 8bit may work. AdamW 8bit doesn't seem to work.
- The LoRA training can be done with 8GB GPU memory (10GB recommended). For reducing the GPU memory usage, the following options are recommended (see the example command after this list):
  - Train U-Net only.
  - Use gradient checkpointing.
  - Use the `--cache_text_encoder_outputs` option and cache latents.
  - Use one of the 8bit optimizers or the Adafactor optimizer.
  - Use lower dim (4 to 8 for 8GB GPU).
- `--network_train_unet_only` option is highly recommended for SDXL LoRA. Because SDXL has two text encoders, training them as well can lead to unexpected results.
- PyTorch 2 seems to use slightly less GPU memory than PyTorch 1.
- `--bucket_reso_steps` can be set to 32 instead of the default value 64. Smaller values than 32 will not work for SDXL training.
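
As referenced in the list above, here is a hedged sketch of a low-VRAM SDXL LoRA run that applies these recommendations. The script name `sdxl_train_network.py`, the placeholder paths, and the exact values are illustrative assumptions, not settings taken from this README:

```
# Sketch only: paths and values are placeholders; check each option with --help.
# (bash-style line continuations; join into a single line on Windows.)
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path /path/to/sd_xl_base_1.0.safetensors \
  --dataset_config /path/to/dataset_config.toml \
  --output_dir /path/to/output --output_name my_sdxl_lora \
  --network_module networks.lora --network_dim 8 \
  --network_train_unet_only \
  --gradient_checkpointing \
  --cache_latents --cache_text_encoder_outputs \
  --optimizer_type AdamW8bit \
  --mixed_precision bf16 \
  --save_model_as safetensors
```

The low dim and the 8bit optimizer are what keep memory usage down; Adafactor can be substituted (e.g. `--optimizer_type adafactor`) as suggested in the list above.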

- [ ] Change `--output_config` option to continue the training.
- [ ] Extend `--full_bf16` for all the scripts.
- [x] Support Textual Inversion training.

## About requirements.txt
These files do not contain requirements for PyTorch, because the required version depends on your environment. Please install PyTorch first (see the installation guide below).

The scripts are tested with PyTorch 1.12.1 and 2.0.1, and Diffusers 0.18.2.
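
As one concrete example (an assumption about your setup, not the only supported configuration), PyTorch 2.0.1 with CUDA 11.8 can be installed from the official index before installing the requirements:

```
# Example only: pick the PyTorch build matching your CUDA version and environment.
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
```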
@@ -204,26 +211,43 @@ Answers to accelerate config should be the same as above.
Other versions of PyTorch and xformers seem to have problems with training.
If there is no other reason, please install the specified version.

### Optional: Use `bitsandbytes` (8bit optimizer)

To use an 8bit optimizer, you need to install `bitsandbytes`. For Linux, install `bitsandbytes` as usual (0.41.1 or later is recommended).
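
For example, on Linux the recommended version can be installed directly from PyPI:

```
# Linux: install the recommended (or a later) version.
pip install bitsandbytes==0.41.1
```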

For Windows, there are several versions of `bitsandbytes`:

- `bitsandbytes` 0.35.0: Stable version. AdamW8bit is available. `full_bf16` is not available.
- `bitsandbytes` 0.39.1: Lion8bit, PagedAdamW8bit and PagedLion8bit are available. `full_bf16` is available.

Note: `bitsandbytes` versions above 0.35.0 up to 0.41.0 seem to have an issue: https://github.com/TimDettmers/bitsandbytes/issues/659

Follow the instructions below to install `bitsandbytes` for Windows.

### bitsandbytes 0.35.0 for Windows

Open a regular PowerShell terminal and type the following inside:
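
A hedged sketch of the typical commands, assuming the repository's bundled `bitsandbytes_windows` files and a `venv` created at the repository root (the paths are assumptions; adjust them to your setup):

```
# Sketch only: run from the repository root; assumes .\venv and the bundled bitsandbytes_windows files.
.\venv\Scripts\activate
pip install bitsandbytes==0.35.0
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
```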

This will install `bitsandbytes` 0.35.0 and copy the necessary files to the `bitsandbytes` directory.

For upgrading, upgrade this repo with `pip install .` and upgrade the necessary packages manually.

### bitsandbytes 0.39.1 for Windows

Install the Windows version whl file from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or other sources, like:
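
For example (a hedged sketch; the wheel index URL below is assumed to be the one published by the linked repository, so verify it there before installing):

```
# Example only: confirm the wheel source against the linked repository.
python -m pip install bitsandbytes==0.39.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
```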