__SDXL is now supported. The sdxl branch has been merged into the main branch. If you update the repository, please follow the upgrade instructions. Also, the version of accelerate has been updated, so please run `accelerate config` again.__ The documentation for SDXL training is [here](./README.md#sdxl-training).
This repository contains training, generation and utility scripts for Stable Diffusion.
[__Change History__](#change-history) is moved to the bottom of the page.
## About requirements.txt
The file does not include requirements for PyTorch. Because the appropriate version of PyTorch depends on your environment, it is not listed here; please install PyTorch first, following the installation instructions below.

The scripts are tested with PyTorch 2.1.2. PyTorch 2.0.1 and 1.12.1 are not tested but should work.
## Links to usage documentation
Most of the documents are written in Japanese.
* [Training guide - common](./docs/train_README-ja.md) : data preparation, options, etc.
__Note:__ bitsandbytes is now optional. Please install any version of bitsandbytes as needed. Installation instructions are in the following section.

If `python -m venv` shows only `python`, change `python` to `py`.

__Note:__ `bitsandbytes==0.43.0`, `prodigyopt==1.0` and `lion-pytorch==0.0.6` are now included in requirements.txt. If you'd like to use another version, please install it manually.

This installation is for CUDA 11.8. If you use a different version of CUDA, please install the appropriate versions of PyTorch and xformers. For example, for CUDA 12.1, run `pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121` and `pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu121`.
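As a hedged illustration of how the CUDA version maps onto the pip commands above (the cu118/cu121 pins are the ones from this README; any further table entries would be your own assumptions), a minimal Python sketch:

```python
# Map a CUDA version to the two pip install commands from the note above.
# The cu118/cu121 version pins are taken from this README.
PINS = {
    "11.8": "cu118",
    "12.1": "cu121",
}
TORCH = "torch==2.1.2"
TORCHVISION = "torchvision==0.16.2"
XFORMERS = "xformers==0.0.23.post1"

def pip_commands(cuda_version: str) -> list[str]:
    """Return the two pip install commands for the given CUDA version."""
    tag = PINS[cuda_version]
    index = f"--index-url https://download.pytorch.org/whl/{tag}"
    return [
        f"pip install {TORCH} {TORCHVISION} {index}",
        f"pip install {XFORMERS} {index}",
    ]
```

Run the two returned commands inside your activated venv.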
The documentation in this section will be moved to a separate document later.
### Training scripts for SDXL
- `sdxl_train.py` is a script for SDXL fine-tuning. The usage is almost the same as `fine_tune.py`, but it also supports the DreamBooth dataset.
  - The `--full_bf16` option is added. Thanks to KohakuBlueleaf!
    - This option enables full bfloat16 training (including gradients), which is useful for reducing GPU memory usage.
    - Full bfloat16 training might be unstable. Please use it at your own risk.
  - Different learning rates for each U-Net block are now supported in `sdxl_train.py`. Specify them with the `--block_lr` option, as 23 comma-separated values, like `--block_lr 1e-3,1e-3 ... 1e-3`.
- `prepare_buckets_latents.py` now supports SDXL fine-tuning.
- `sdxl_train_network.py` is a script for LoRA training for SDXL. The usage is almost the same as `train_network.py`.
- Both scripts have the following additional options:
  - `--cache_text_encoder_outputs` and `--cache_text_encoder_outputs_to_disk`: Cache the outputs of the text encoders. This is useful for reducing GPU memory usage, but cannot be combined with the options for shuffling or dropping captions.
  - `--no_half_vae`: Disable the half-precision (mixed-precision) VAE. The SDXL VAE seems to produce NaNs in some cases; this option helps avoid them.
- The `--weighted_captions` option is not supported yet for either script.
- `sdxl_train_textual_inversion.py` is a script for Textual Inversion training for SDXL. The usage is almost the same as `train_textual_inversion.py`.
  - `--cache_text_encoder_outputs` is not supported.
  - There are two options for captions:
    1. Training with captions. All captions must include the token string; the token string is replaced with multiple tokens.
    2. Use the `--use_object_template` or `--use_style_template` option. The captions are generated from the template, and any existing captions are ignored.
  - See below for the format of the embeddings.
- `--min_timestep` and `--max_timestep` options are added to each training script. These options can be used to train the U-Net with a restricted range of timesteps. The default values are 0 and 1000.
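Since `--block_lr` takes exactly 23 comma-separated values, it is easy to miscount by hand. A minimal sketch that builds and validates the argument string (the helper name is illustrative, not part of the scripts):

```python
def make_block_lr(lrs):
    """Build a --block_lr argument string: exactly 23 comma-separated learning rates."""
    if len(lrs) != 23:
        raise ValueError(f"--block_lr needs exactly 23 values, got {len(lrs)}")
    return ",".join(f"{lr:g}" for lr in lrs)

# One learning rate per U-Net block, e.g. a uniform 1e-3 for all 23 blocks.
block_lr_arg = make_block_lr([1e-3] * 23)
```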
### Utility scripts for SDXL
- `tools/cache_latents.py` is added. This script can be used to cache the latents to disk in advance.
  - The options are almost the same as `sdxl_train.py`. See the help message for the usage.
  - This script should work with multi-GPU, but it is not tested in my environment.
- `tools/cache_text_encoder_outputs.py` is added. This script can be used to cache the text encoder outputs to disk in advance.
  - The options are almost the same as `cache_latents.py` and `sdxl_train.py`. See the help message for the usage.
- `sdxl_gen_img.py` is added. This script can be used to generate images with SDXL, including LoRA, Textual Inversion, and ControlNet-LLLite. See the help message for the usage.
### Tips for SDXL training
- The default resolution of SDXL is 1024x1024.
- Fine-tuning can be done with 24GB of GPU memory with a batch size of 1. The following options are recommended for fine-tuning with 24GB of GPU memory:
  - Train U-Net only.
  - Use gradient checkpointing.
  - Use the `--cache_text_encoder_outputs` option and cache the latents.
  - Use the Adafactor optimizer. RMSprop 8bit or Adagrad 8bit may work. AdamW 8bit doesn't seem to work.
- LoRA training can be done with 8GB of GPU memory (10GB recommended). To reduce GPU memory usage, the following options are recommended:
  - Train U-Net only.
  - Use gradient checkpointing.
  - Use the `--cache_text_encoder_outputs` option and cache the latents.
  - Use one of the 8bit optimizers or the Adafactor optimizer.
  - Use a lower dim (4 to 8 for an 8GB GPU).
- The `--network_train_unet_only` option is highly recommended for SDXL LoRA. Because SDXL has two text encoders, the result of training them may be unexpected.
- PyTorch 2 seems to use slightly less GPU memory than PyTorch 1.
- `--bucket_reso_steps` can be set to 32 instead of the default value 64. Values smaller than 32 will not work for SDXL training.
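To illustrate the divisibility constraint that `--bucket_reso_steps` imposes, here is a hedged sketch (the real bucketing logic lives in the library; this helper is only illustrative) that enumerates bucket resolutions whose sides are multiples of the step and whose area stays within 1024x1024:

```python
def bucket_resolutions(max_area=1024 * 1024, steps=32, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets whose sides are multiples of `steps`
    and whose area does not exceed `max_area`."""
    resos = set()
    w = min_side
    while w <= max_side:
        # Largest height that is a multiple of `steps` and keeps w*h <= max_area.
        h = (max_area // w) // steps * steps
        if min_side <= h <= max_side:
            resos.add((w, h))
            resos.add((h, w))
        w += steps
    return sorted(resos)
```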
Example of the optimizer settings for Adafactor with the fixed learning rate:
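A sketch of such settings in the `.toml` config format used by the training scripts; the scheduler, warmup, and learning-rate values here are illustrative assumptions, so adjust them for your setup:

```toml
optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7  # illustrative; pick a fixed value for your run
```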
`train_network.py`, `sdxl_train_network.py`, and `sdxl_train.py` now support masked loss. The `--masked_loss` option is added.
ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [documentation](./docs/train_lllite_README.md) for details.
NOTE: `train_network.py` and `sdxl_train.py` are not tested yet.
The ControlNet dataset is used to specify the mask. The mask images should be RGB images. A pixel value of 255 in the R channel is treated as masked (the loss is calculated only for the masked pixels), and 0 is treated as unmasked. See the [LLLite documentation](./docs/train_lllite_README.md#preparing-the-dataset) for details of the dataset specification.
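As a minimal sketch of that rule (the helper below is hypothetical, not part of the scripts), with the mask image represented as nested H x W lists of RGB tuples:

```python
def mask_from_rgb(mask_img):
    """Return a boolean H x W mask: True where the R channel is 255 (loss is
    calculated there), False where it is 0 (pixel ignored)."""
    return [[pixel[0] == 255 for pixel in row] for row in mask_img]

# 2x2 example image: only the top-left pixel is masked.
img = [[(255, 0, 0), (0, 0, 0)],
       [(0, 0, 0), (0, 0, 0)]]
mask = mask_from_rgb(img)
```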
## Change History
### Working in progress
We would like to express our deep gratitude to Mark Saint (cacoe) from leonardo.
Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
The LoRA supported by `train_network.py` has been named to avoid confusion. The documentation has been updated. The following are the names of LoRA types in this repository.

1. __LoRA-LierLa__ : LoRA for __Li__n__e__a__r__ layers and Conv2d layers with a 1x1 kernel.
2. __LoRA-C3Lier__ : LoRA for __C__onv2d layers with a __3__x3 kernel, in addition to the layers covered by LoRA-LierLa.