
Commit c86e356

Merge branch 'dev' into dataset-cache

2 parents 0253472 + 5a2afb3

22 files changed: +530 −273 lines

README-ja.md

Lines changed: 7 additions & 48 deletions
@@ -1,7 +1,3 @@
-SDXL is now supported. The sdxl branch has been merged into the main branch. When you update the repository, please follow the Upgrade instructions. The accelerate version has also been raised, so run accelerate config again.
-
-For SDXL training, see [here](./README.md#sdxl-training) (in English).
-
## About this repository
This repository contains scripts for training Stable Diffusion, generating images, and other utilities.

@@ -21,6 +17,7 @@ Features that make it easier to use, such as a GUI and PowerShell scripts, are available in [bma

* [Training guide, common](./docs/train_README-ja.md): data preparation, options, etc.
* [Dataset config](./docs/config_README-ja.md)
+* [SDXL training](./docs/train_SDXL-en.md) (English)
* [DreamBooth training](./docs/train_db_README-ja.md)
* [Fine-tuning guide](./docs/fine_tune_README_ja.md):
* [LoRA training](./docs/train_network_README-ja.md)
@@ -44,9 +41,7 @@ When using PowerShell, to enable venv, run the following

## Installation on Windows

-The scripts are tested with PyTorch 2.0.1. They should also work with PyTorch 1.12.1.
-
-The example below installs the PyTorch 2.0.1 / CUDA 11.8 build. If you use the CUDA 11.6 build or PyTorch 1.12.1, rewrite the commands accordingly.
+The scripts are tested with PyTorch 2.1.2. They should also work with PyTorch 2.0.1 and 1.12.1.

(If the python -m venv~ line prints only "python", change python to py, as in py -m venv~.)

@@ -59,20 +54,20 @@ cd sd-scripts
python -m venv venv
.\venv\Scripts\activate

-pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
+pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
-pip install xformers==0.0.20
+pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118

accelerate config
```

The same commands work in the command prompt.

-(Note: this was changed because ``python -m venv venv`` seems safer than ``python -m venv --system-site-packages venv``; the latter causes various problems when packages are installed in the global Python.)
+Note: `bitsandbytes==0.43.0`, `prodigyopt==1.0`, and `lion-pytorch==0.0.6` are now included in `requirements.txt`. If you want to use other versions, install them manually.

-Answer the accelerate config questions as follows. (If you train with bf16, answer bf16 to the last question.)
+This example installs the 2.1.2 / CUDA 11.8 builds of PyTorch and xformers. To use the CUDA 12.1 build or PyTorch 1.12.1, rewrite the commands accordingly; for CUDA 12.1, run `pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121` and `pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu121`.

-Note: From 0.15.0, pressing the cursor keys to make a selection crashes in Japanese environments (...). Select with the number keys 0, 1, 2, ... instead.
+Answer the accelerate config questions as follows. (If you train with bf16, answer bf16 to the last question.)

```txt
- This machine
@@ -87,41 +82,6 @@ Answer the accelerate config questions as follows. (bf1
In some cases, the error ``ValueError: fp16 mixed precision requires a GPU`` may appear. If so, answer "0" to the 6th question
(``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``). (The GPU with id `0` will be used.)

-### Optional: Using `bitsandbytes` (8bit optimizer)
-
-`bitsandbytes` is now optional. On Linux it can be installed normally with pip (0.41.1 or later is recommended).
-
-On Windows, 0.35.0 or 0.41.1 is recommended.
-
-- `bitsandbytes` 0.35.0: Considered a stable version. AdamW8bit can be used, but several other 8bit optimizers and the `full_bf16` training option cannot.
-- `bitsandbytes` 0.41.1: Supports Lion8bit, PagedAdamW8bit, and PagedLion8bit. `full_bf16` can be used.
-
-Note: `bitsandbytes` versions from 0.35.0 through 0.41.0 seem to have an issue: https://github.com/TimDettmers/bitsandbytes/issues/659
-
-Follow the steps below to install `bitsandbytes`.
-
-### Using 0.35.0
-
-This is a PowerShell example. In the command prompt, use copy instead of cp.
-
-```powershell
-cd sd-scripts
-.\venv\Scripts\activate
-pip install bitsandbytes==0.35.0
-
-cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
-cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
-cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
-```
-
-### Using 0.41.1
-
-Install the Windows whl file distributed by jllllll from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or another source.
-
-```powershell
-python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
-```
-
## Upgrade

When there is a new release, you can update with the following commands:
@@ -151,4 +111,3 @@ The expansion to Conv2d 3x3 is by [cloneofsimo](https://github.com/cloneofsimo/lora)

[BLIP](https://github.com/salesforce/BLIP): BSD-3-Clause

-
README.md

Lines changed: 22 additions & 128 deletions
@@ -1,5 +1,3 @@
-__SDXL is now supported. The sdxl branch has been merged into the main branch. If you update the repository, please follow the upgrade instructions. Also, the version of accelerate has been updated, so please run accelerate config again.__ The documentation for SDXL training is [here](./README.md#sdxl-training).
-
This repository contains training, generation and utility scripts for Stable Diffusion.

[__Change History__](#change-history) is moved to the bottom of the page.
@@ -20,9 +18,9 @@ This repository contains the scripts for:

## About requirements.txt

-These files do not contain requirements for PyTorch, because the required version depends on your environment. Please install PyTorch first (see the installation guide below).
+The file does not contain requirements for PyTorch, because the required version depends on your environment. Please install PyTorch first, following the installation instructions below.

-The scripts are tested with PyTorch 2.0.1. 1.12.1 is not tested but should work.
+The scripts are tested with PyTorch 2.1.2. 2.0.1 and 1.12.1 are not tested but should work.

## Links to usage documentation

@@ -32,12 +30,13 @@ Most of the documents are written in Japanese.

* [Training guide - common](./docs/train_README-ja.md) : data preparation, options etc.
* [Chinese version](./docs/train_README-zh.md)
+* [SDXL training](./docs/train_SDXL-en.md) (English version)
* [Dataset config](./docs/config_README-ja.md)
* [English version](./docs/config_README-en.md)
* [DreamBooth training guide](./docs/train_db_README-ja.md)
* [Step by Step fine-tuning guide](./docs/fine_tune_README_ja.md):
-* [training LoRA](./docs/train_network_README-ja.md)
-* [training Textual Inversion](./docs/train_ti_README-ja.md)
+* [Training LoRA](./docs/train_network_README-ja.md)
+* [Training Textual Inversion](./docs/train_ti_README-ja.md)
* [Image generation](./docs/gen_img_README-ja.md)
* note.com [Model conversion](https://note.com/kohya_ss/n/n374f316fe4ad)

@@ -65,14 +64,18 @@ cd sd-scripts
python -m venv venv
.\venv\Scripts\activate

-pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
+pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
-pip install xformers==0.0.20
+pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118

accelerate config
```

-__Note:__ bitsandbytes is now optional. Please install any version of bitsandbytes as needed. Installation instructions are in the following section.
+If `python -m venv` shows only `python`, change `python` to `py`.
+
+__Note:__ `bitsandbytes==0.43.0`, `prodigyopt==1.0` and `lion-pytorch==0.0.6` are now included in requirements.txt. If you'd like to use another version, please install it manually.
+
+This installation is for CUDA 11.8. If you use a different CUDA version, install the matching builds of PyTorch and xformers. For example, for CUDA 12.1, run `pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121` and `pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu121`.

<!--
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
@@ -91,48 +94,13 @@ Answers to accelerate config:
- fp16
```

-note: Some users report that ``ValueError: fp16 mixed precision requires a GPU`` occurs during training. In this case, answer `0` for the 6th question:
+If you'd like to use bf16, please answer `bf16` to the last question.
+
+Note: Some users report that ``ValueError: fp16 mixed precision requires a GPU`` occurs during training. In this case, answer `0` for the 6th question:
``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``

(Single GPU with id `0` will be used.)

-### Optional: Use `bitsandbytes` (8bit optimizer)
-
-For the 8bit optimizers, you need to install `bitsandbytes`. For Linux, please install `bitsandbytes` as usual (0.41.1 or later is recommended).
-
-For Windows, there are several versions of `bitsandbytes`:
-
-- `bitsandbytes` 0.35.0: Stable version. AdamW8bit is available. `full_bf16` is not available.
-- `bitsandbytes` 0.41.1: Lion8bit, PagedAdamW8bit and PagedLion8bit are available. `full_bf16` is available.
-
-Note: `bitsandbytes` versions above 0.35.0 up to 0.41.0 seem to have an issue: https://github.com/TimDettmers/bitsandbytes/issues/659
-
-Follow the instructions below to install `bitsandbytes` for Windows.
-
-### bitsandbytes 0.35.0 for Windows
-
-Open a regular PowerShell terminal and type the following:
-
-```powershell
-cd sd-scripts
-.\venv\Scripts\activate
-pip install bitsandbytes==0.35.0
-
-cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
-cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
-cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
-```
-
-This will install `bitsandbytes` 0.35.0 and copy the necessary files to the `bitsandbytes` directory.
-
-### bitsandbytes 0.41.1 for Windows
-
-Install the Windows whl file from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or other sources, like:
-
-```powershell
-python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
-```
-
## Upgrade

When a new release comes out you can upgrade your repo with the following command:
@@ -163,92 +131,16 @@ The majority of scripts are licensed under ASL 2.0 (including code from Diffuser
[BLIP](https://github.com/salesforce/BLIP): BSD-3-Clause


-## SDXL training
-
-The documentation in this section will be moved to a separate document later.
-
-### Training scripts for SDXL
-
-- `sdxl_train.py` is a script for SDXL fine-tuning. The usage is almost the same as `fine_tune.py`, but it also supports DreamBooth datasets.
-  - The `--full_bf16` option is added. Thanks to KohakuBlueleaf!
-    - This option enables full bfloat16 training (including gradients), which is useful to reduce GPU memory usage.
-    - Full bfloat16 training might be unstable. Please use it at your own risk.
-  - Different learning rates for each U-Net block are now supported in sdxl_train.py. Specify 23 comma-separated values with the `--block_lr` option, like `--block_lr 1e-3,1e-3 ... 1e-3`.
-    - The 23 values correspond to `0: time/label embed, 1-9: input blocks 0-8, 10-12: mid blocks 0-2, 13-21: output blocks 0-8, 22: out`.
-- `prepare_buckets_latents.py` now supports SDXL fine-tuning.
-
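The 23-value `--block_lr` mapping described above can be sketched as follows. This is an illustrative helper, not the repository's actual parser; the block names are made up here to mirror the stated mapping.

```python
# Sketch: parse a --block_lr string into the 23 per-block learning rates
# described above (0: time/label embed, 1-9: input blocks, 10-12: mid
# blocks, 13-21: output blocks, 22: out). Illustrative only.
BLOCK_NAMES = (
    ["time_label_embed"]
    + [f"input_block_{i}" for i in range(9)]
    + [f"mid_block_{i}" for i in range(3)]
    + [f"output_block_{i}" for i in range(9)]
    + ["out"]
)  # 1 + 9 + 3 + 9 + 1 = 23 entries

def parse_block_lr(arg: str) -> dict:
    values = [float(v) for v in arg.split(",")]
    if len(values) != len(BLOCK_NAMES):
        raise ValueError(f"--block_lr expects {len(BLOCK_NAMES)} values, got {len(values)}")
    return dict(zip(BLOCK_NAMES, values))

lrs = parse_block_lr(",".join(["1e-3"] * 23))
print(len(lrs))  # 23
```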
-- `sdxl_train_network.py` is a script for LoRA training for SDXL. The usage is almost the same as `train_network.py`.
-
-- Both scripts have the following additional options:
-  - `--cache_text_encoder_outputs` and `--cache_text_encoder_outputs_to_disk`: Cache the outputs of the text encoders. This is useful to reduce GPU memory usage. These options cannot be combined with options that shuffle or drop the captions.
-  - `--no_half_vae`: Disable the half-precision (mixed-precision) VAE. The SDXL VAE seems to produce NaNs in some cases; this option helps avoid them.
-
-- The `--weighted_captions` option is not supported yet for either script.
-
-- `sdxl_train_textual_inversion.py` is a script for Textual Inversion training for SDXL. The usage is almost the same as `train_textual_inversion.py`.
-  - `--cache_text_encoder_outputs` is not supported.
-  - There are two options for captions:
-    1. Training with captions. All captions must include the token string. The token string is replaced with multiple tokens.
-    2. Use the `--use_object_template` or `--use_style_template` option. The captions are generated from a template; existing captions are ignored.
-  - See below for the format of the embeddings.
-
-- `--min_timestep` and `--max_timestep` options are added to each training script. They can be used to train the U-Net on a restricted range of timesteps. The default values are 0 and 1000.
-
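The effect of `--min_timestep`/`--max_timestep` can be illustrated with a minimal sketch, assuming uniform sampling over the restricted range (the actual scripts sample per batch element and may differ):

```python
import random

def sample_timestep(min_timestep: int = 0, max_timestep: int = 1000) -> int:
    # Draw a diffusion timestep restricted to [min_timestep, max_timestep).
    # Illustrative only; defaults mirror the documented 0 and 1000.
    return random.randrange(min_timestep, max_timestep)

random.seed(0)
ts = [sample_timestep(100, 200) for _ in range(1000)]
print(min(ts) >= 100 and max(ts) < 200)  # True
```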
-### Utility scripts for SDXL
-
-- `tools/cache_latents.py` is added. This script can be used to cache the latents to disk in advance.
-  - The options are almost the same as `sdxl_train.py`. See the help message for the usage.
-  - Please launch the script as follows:
-    `accelerate launch --num_cpu_threads_per_process 1 tools/cache_latents.py ...`
-  - This script should work with multi-GPU, but it is not tested in my environment.
-
-- `tools/cache_text_encoder_outputs.py` is added. This script can be used to cache the text encoder outputs to disk in advance.
-  - The options are almost the same as `cache_latents.py` and `sdxl_train.py`. See the help message for the usage.
-
-- `sdxl_gen_img.py` is added. This script can be used to generate images with SDXL, including LoRA, Textual Inversion and ControlNet-LLLite. See the help message for the usage.
-
-### Tips for SDXL training
-
-- The default resolution of SDXL is 1024x1024.
-- Fine-tuning can be done with 24GB GPU memory with a batch size of 1. The following options are recommended for fine-tuning with 24GB GPU memory:
-  - Train U-Net only.
-  - Use gradient checkpointing.
-  - Use the `--cache_text_encoder_outputs` option and cache the latents.
-  - Use the Adafactor optimizer. RMSprop 8bit or Adagrad 8bit may work. AdamW 8bit doesn't seem to work.
-- LoRA training can be done with 8GB GPU memory (10GB recommended). To reduce GPU memory usage, the following options are recommended:
-  - Train U-Net only.
-  - Use gradient checkpointing.
-  - Use the `--cache_text_encoder_outputs` option and cache the latents.
-  - Use one of the 8bit optimizers or the Adafactor optimizer.
-  - Use a lower dim (4 to 8 for an 8GB GPU).
-- The `--network_train_unet_only` option is highly recommended for SDXL LoRA, because SDXL has two text encoders and training them can produce unexpected results.
-- PyTorch 2 seems to use slightly less GPU memory than PyTorch 1.
-- `--bucket_reso_steps` can be set to 32 instead of the default value 64. Values smaller than 32 will not work for SDXL training.
-
-Example of the optimizer settings for Adafactor with a fixed learning rate:
-```toml
-optimizer_type = "adafactor"
-optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
-lr_scheduler = "constant_with_warmup"
-lr_warmup_steps = 100
-learning_rate = 4e-7 # SDXL original learning rate
-```
-
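The `optimizer_args` entries in that TOML example are plain `key=value` strings. How such entries become keyword arguments for the optimizer constructor can be sketched like this (an illustration, not the repository's actual parsing code):

```python
# Sketch: turn "key=value" strings, as in the optimizer_args example above,
# into a kwargs dict. Only booleans are interpreted here; anything else is
# kept as a string. Illustrative only.
optimizer_args = ["scale_parameter=False", "relative_step=False", "warmup_init=False"]

def parse_optimizer_args(args):
    out = {}
    for entry in args:
        key, _, value = entry.partition("=")
        out[key] = {"True": True, "False": False}.get(value, value)
    return out

kwargs = parse_optimizer_args(optimizer_args)
print(kwargs)  # {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
```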
-### Format of Textual Inversion embeddings for SDXL
-
-```python
-from safetensors.torch import save_file
+## Change History

-state_dict = {"clip_g": embs_for_text_encoder_1280, "clip_l": embs_for_text_encoder_768}
-save_file(state_dict, file)
-```
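The embedding layout in that removed snippet (`clip_g` for the width-1280 text encoder, `clip_l` for the width-768 one) can be sketched without `safetensors`, using plain lists in place of tensors. A shape check only, with a hypothetical vector count:

```python
# Sketch: the SDXL Textual Inversion state-dict layout described above.
# "clip_g" holds vectors for text encoder 2 (width 1280), "clip_l" for
# text encoder 1 (width 768). n_vectors is a made-up example value.
n_vectors = 4

state_dict = {
    "clip_g": [[0.0] * 1280 for _ in range(n_vectors)],
    "clip_l": [[0.0] * 768 for _ in range(n_vectors)],
}

# safetensors.torch.save_file(state_dict, path) would persist real tensors.
print(len(state_dict["clip_g"][0]), len(state_dict["clip_l"][0]))  # 1280 768
```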
+### Masked loss

-### ControlNet-LLLite
+`train_network.py`, `sdxl_train_network.py` and `sdxl_train.py` now support masked loss. The `--masked_loss` option is added.

-ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [documentation](./docs/train_lllite_README.md) for details.
+NOTE: `train_network.py` and `sdxl_train.py` are not tested yet.

+A ControlNet dataset is used to specify the mask. The mask images should be RGB images. A pixel value of 255 in the R channel is treated as the mask (the loss is calculated only for masked pixels), and 0 is treated as non-mask. See the dataset specification in the [LLLite documentation](./docs/train_lllite_README.md#preparing-the-dataset) for details.
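The masked-loss rule above (R channel 255 means the pixel counts, 0 means it is ignored) can be sketched with NumPy. This is an illustration of the averaging convention, not the scripts' actual implementation:

```python
import numpy as np

def masked_loss(per_pixel_loss: np.ndarray, mask_r_channel: np.ndarray) -> float:
    # R channel value 255 -> pixel contributes to the loss; 0 -> ignored.
    mask = (mask_r_channel == 255).astype(per_pixel_loss.dtype)
    denom = mask.sum()
    if denom == 0:
        return 0.0  # nothing masked in; avoid division by zero
    return float((per_pixel_loss * mask).sum() / denom)

loss = np.ones((4, 4))          # uniform per-pixel loss of 1.0
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :] = 255               # only the top half is masked in
print(masked_loss(loss, mask))  # 1.0
```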

-## Change History

### Working in progress

@@ -362,6 +254,8 @@ We would like to express our deep gratitude to Mark Saint (cacoe) from leonardo.
Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
For recent updates, please see [Release](https://github.com/kohya-ss/sd-scripts/releases).

+## Additional Information
+
### Naming of LoRA

