Commit 0d9a3a4

merge dev, replace resume and lora+ with upstream (untested)
Squashed commit of the following:

- 56bb81c (Kohya S, Wed Jun 12 2024): add grad_hook after restore state; closes kohya-ss#1344
- 22413a5 (merge 3259928 + 18d7597; Kohya S, Tue Jun 11 2024): Merge pull request kohya-ss#1359 from kohya-ss/train_resume_step (Train resume step)
- 18d7597 (Kohya S, Tue Jun 11 2024): update README
- 4a44188 (merge 4dbcef4 + 3259928; Kohya S, Tue Jun 11 2024): Merge branch 'dev' into train_resume_step
- 3259928 (merge 1a104dc + 5bfe5e4; Kohya S, Sun Jun 9 2024): Merge branch 'dev' of https://github.com/kohya-ss/sd-scripts into dev
- 1a104dc (Kohya S, Sun Jun 9 2024): make forward/backward paths same; ref kohya-ss#1363
- 58fb648 (Kohya S, Sun Jun 9 2024): set static graph flag when DDP; ref kohya-ss#1363
- 5bfe5e4 (merge e5bab69 + 4ecbac1; Kohya S, Thu Jun 6 2024): Merge pull request kohya-ss#1361 from shirayu/update/github_actions/crate-ci/typos-1.21.0 (Bump crate-ci/typos from 1.19.0 to 1.21.0, fix typos, and updated _typos.toml; close kohya-ss#1307)
- 4ecbac1 (Yuta Hayashibe, Wed Jun 5 2024): Bump crate-ci/typos from 1.19.0 to 1.21.0, fix typos, and updated _typos.toml (close kohya-ss#1307)
- 4dbcef4 (Kohya S, Tue Jun 4 2024): update for corner cases
- 321e24d (merge e5bab69 + 3eb27ce; Kohya S, Tue Jun 4 2024): Merge pull request kohya-ss#1353 from KohakuBlueleaf/train_resume_step (Resume correct step for "resume from state" feature)
- e5bab69 (Kohya S, Sun Jun 2 2024): fix alpha mask without disk cache; closes kohya-ss#1351, ref kohya-ss#1339
- 3eb27ce (Kohaku-Blueleaf, Fri May 31 2024): Skip the final 1 step
- b2363f1 (Kohaku-Blueleaf, Fri May 31 2024): Final implementation
- 0d96e10 (merge ffce3b5 + fc85496; Kohya S, Mon May 27 2024): Merge pull request kohya-ss#1339 from kohya-ss/alpha-masked-loss (Alpha masked loss)
- fc85496 (Kohya S, Mon May 27 2024): update docs for masked loss
- 2870be9 (merge 71ad3c0 + ffce3b5; Kohya S, Mon May 27 2024): Merge branch 'dev' into alpha-masked-loss
- 71ad3c0 (Kohya S, Mon May 27 2024): Update masked_loss_README-ja.md; add sample images
- ffce3b5 (merge fb12b6d + d50c1b3; Kohya S, Mon May 27 2024): Merge pull request kohya-ss#1349 from rockerBOO/patch-4 (Update issue link)
- a4c3155 (Kohya S, Mon May 27 2024): add doc for mask loss
- 58cadf4 (merge e8cfd4b + fb12b6d; Kohya S, Mon May 27 2024): Merge branch 'dev' into alpha-masked-loss
- d50c1b3 (Dave Lage, Mon May 27 2024): Update issue link
- e8cfd4b (Kohya S, Sun May 26 2024): fix to work cond mask and alpha mask
- fb12b6d (merge febc5c5 + 00513b9; Kohya S, Sun May 26 2024): Merge pull request kohya-ss#1347 from rockerBOO/lora-plus-log-info (Add LoRA+ LR Ratio info message to logger)
- 00513b9 (rockerBOO, Thu May 23 2024): Add LoRA+ LR Ratio info message to logger
- da6fea3 (Kohya S, Sun May 19 2024): simplify and update alpha mask to work with various cases
- f2dd43e (Kohya S, Sun May 19 2024): revert kwargs to explicit declaration
- db67529 (u-haru, Sun May 19 2024): Add an option to use the image's alpha channel as a mask for the loss (kohya-ss#1223): add alpha_mask parameter and apply masked loss; fix type hint in trim_and_resize_if_required; refactor train_util.py to use keyword arguments; fix alpha mask flipping, initialization, and transformation; cache alpha_mask; keep alpha_masks on CPU; set flipped_alpha_masks to None if the option is disabled; check if alpha_mask is None; set alpha_mask to None if the option is disabled; add a description of the alpha_mask option to the docs
- febc5c5 (Kohya S, Sun May 19 2024): update README
- 4c79812 (Kohya S, Sun May 19 2024): update README
- 38e4c60 (merge e4d9e3c + fc37437; Kohya S, Sun May 19 2024): Merge pull request kohya-ss#1277 from Cauldrath/negative_learning (Allow negative learning rate)
- e4d9e3c (Kohya S, Sun May 19 2024): remove dependency for omegaconf; ref kohya-ss#1284
- de0e0b9 (merge c68baae + 5cb145d; Kohya S, Sun May 19 2024): Merge pull request kohya-ss#1284 from sdbds/fix_traincontrolnet (Fix train controlnet)
- c68baae (Kohya S, Sun May 19 2024): add `--log_config` option to enable/disable output of the training config
- 47187f7 (merge e3ddd1f + b886d0a; Kohya S, Sun May 19 2024): Merge pull request kohya-ss#1285 from ccharest93/main (Hyperparameter tracking)
- e3ddd1f (Kohya S, Sun May 19 2024): update README and format code
- 0640f01 (merge 2f19175 + 793aeb9; Kohya S, Sun May 19 2024): Merge pull request kohya-ss#1322 from aria1th/patch-1 (Accelerate: fix get_trainable_params in controlnet-llite training)
- 2f19175 (Kohya S, Sun May 19 2024): update README
- 146edce (Kohya S, Sat May 18 2024): support Diffusers-based SDXL LoRA keys for inference
- 153764a (Kohya S, Wed May 15 2024): add prompt option '--f' for filename
- 589c2aa (Kohya S, Mon May 13 2024): update README
- 16677da (Kohya S, Sun May 12 2024): fix create_network_from_weights doesn't work
- a384bf2 (merge 1c296f7 + 8db0cad; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1313 from rockerBOO/patch-3 (Add caption_separator to output for subset)
- 1c296f7 (merge e96a521 + dbb7bb2; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1312 from rockerBOO/patch-2 (Fix caption_separator missing in subset schema)
- e96a521 (merge 39b82f2 + fdbb03c; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1291 from frodo821/patch-1 (removed unnecessary `torch` import on line 115)
- 39b82f2 (Kohya S, Sun May 12 2024): update readme
- 3701507 (Kohya S, Sun May 12 2024): raise original error if an error occurred in checking latents
- 7802093 (merge 9ddb4d7 + 040e26f; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1278 from Cauldrath/catch_latent_error_file (Display name of error latent file)
- 9ddb4d7 (Kohya S, Sun May 12 2024): update readme and help message etc.
- 8d1b1ac (merge 02298e3 + 64916a3; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1266 from Zovjsra/feature/disable-mmap (Add "--disable_mmap_load_safetensors" parameter)
- 02298e3 (merge 1ffc0b3 + 4419041; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1331 from kohya-ss/lora-plus (Lora plus)
- 4419041 (Kohya S, Sun May 12 2024): update docs etc.
- 3c8193f (Kohya S, Sun May 12 2024): revert lora+ for lora_fa
- c6a4370 (merge e01e148 + 1ffc0b3; Kohya S, Sun May 12 2024): Merge branch 'dev' into lora-plus
- 1ffc0b3 (Kohya S, Sun May 12 2024): fix typo
- e01e148 (merge e9f3a62 + 7983d3d; Kohya S, Sun May 12 2024): Merge branch 'dev' into lora-plus
- e9f3a62 (merge 3fd8cdc + c1ba0b4; Kohya S, Sun May 12 2024): Merge branch 'dev' into lora-plus
- 7983d3d (merge c1ba0b4 + bee8cee; Kohya S, Sun May 12 2024): Merge pull request kohya-ss#1319 from kohya-ss/fused-backward-pass (Fused backward pass)
- bee8cee (Kohya S, Sun May 12 2024): update README for fused optimizer
- f3d2cf2 (Kohya S, Sun May 12 2024): update README for fused optimizer
- 6dbc23c (merge 607e041 + c1ba0b4; Kohya S, Sun May 12 2024): Merge branch 'dev' into fused-backward-pass
- c1ba0b4 (Kohya S, Sun May 12 2024): update readme
- 607e041 (Kohya S, Sun May 12 2024): chore: Refactor optimizer group
- 793aeb9 (AngelBottomless, Tue May 7 2024): fix get_trainable_params in controlnet-llite training
- b56d5f7 (Kohya S, Mon May 6 2024): add experimental option to fuse params to optimizer groups
- 017b82e (Kohya S, Mon May 6 2024): update help message for fused_backward_pass
- 2a359e0 (merge 0540c33 + 4f203ce; Kohya S, Mon May 6 2024): Merge pull request kohya-ss#1259 from 2kpr/fused_backward_pass (Adafactor fused backward pass and optimizer step; lowers SDXL VRAM usage at 1024 resolution to 10 GB for BF16 / 16.4 GB for FP32)
- 3fd8cdc (Kohya S, Mon May 6 2024): fix dylora loraplus
- 7fe8150 (Kohya S, Mon May 6 2024): update loraplus on dylora/lofa_fa
- 52e64c6 (Kohya S, Sat May 4 2024): add debug log
- 58c2d85 (Kohya S, Fri May 3 2024): support block dim/lr for sdxl
- 8db0cad (Dave Lage, Thu May 2 2024): Add caption_separator to output for subset
- dbb7bb2 (Dave Lage, Thu May 2 2024): Fix caption_separator missing in subset schema
- 969f82a (Kohya S, Mon Apr 29 2024): move loraplus args from args to network_args, simplify log lr desc
- 834445a (merge 0540c33 + 68467bd; Kohya S, Mon Apr 29 2024): Merge pull request kohya-ss#1233 from rockerBOO/lora-plus (Add LoRA+ support)
- fdbb03c (frodo821, Tue Apr 23 2024): removed unnecessary `torch` import on line 115, as per kohya-ss#1290
- 040e26f (Cauldrath, Sun Apr 21 2024): Regenerate failed file (if a latent file fails to load, print the path and the error, then return false to regenerate it)
- 5cb145d (青龍聖者@bdsqlsz, Sat Apr 20 2024): Update train_util.py
- b886d0a (Maatra, Sat Apr 20 2024): Cleaned typing to be in line with accelerate hyperparameters type restrictions
- 4477116 (青龍聖者@bdsqlsz, Sat Apr 20 2024): fix train controlnet
- 2c9db5d (Maatra, Sat Apr 20 2024): passing filtered hyperparameters to accelerate
- fc37437 (Cauldrath, Thu Apr 18 2024): Allow negative learning rate (this can be used to train away from a group of images you don't want; as this moves the model away from a point instead of towards it, the change in the model is unbounded, so don't set it too low; -4e-7 seemed to work well)
- feefcf2 (Cauldrath, Thu Apr 18 2024): Display name of error latent file (when trying to load stored latents, this change reports which file failed to load; previously it only reported that something failed)
- 64916a3 (Zovjsra, Tue Apr 16 2024): add disable_mmap to args
- 4f203ce (2kpr, Sun Apr 14 2024): Fused backward pass
- 68467bd (rockerBOO, Thu Apr 11 2024): Fix unset or invalid LR from making a param_group
- 75833e8 (rockerBOO, Mon Apr 8 2024): Fix default LR, add overall LoRA+ ratio, add log (`--loraplus_ratio` added for both TE and UNet)
- 1933ab4 (rockerBOO, Wed Apr 3 2024): Fix default_lr being applied
- c769160 (rockerBOO, Mon Apr 1 2024): Add LoRA-FA for LoRA+
- f99fe28 (rockerBOO, Mon Apr 1 2024): Add LoRA+ support
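Of the features merged above, LoRA+ is configured through `--network_args` after commit 969f82a moved the loraplus arguments there. An illustrative config-file fragment, assuming the `loraplus_lr_ratio` / `loraplus_text_encoder_lr_ratio` key names documented in the upstream README (the ratio values are arbitrary examples):

```toml
network_args = [
  "loraplus_lr_ratio=16",             # LR multiplier for the LoRA up (B) weights
  "loraplus_text_encoder_lr_ratio=4", # optional separate ratio for the Text Encoder
]
```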
1 parent d53b530 · commit 0d9a3a4

30 files changed: +1412 −341 lines

.github/workflows/typos.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,4 +18,4 @@ jobs:
       - uses: actions/checkout@v4

       - name: typos-action
-        uses: crate-ci/typos@v1.19.0
+        uses: crate-ci/typos@v1.21.0
```

README.md

Lines changed: 128 additions & 0 deletions
Large diffs are not rendered by default.

_typos.toml

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,7 @@
 # Instruction: https://github.com/marketplace/actions/typos-action#getting-started

 [default.extend-identifiers]
+ddPn08="ddPn08"

 [default.extend-words]
 NIN="NIN"
@@ -27,6 +28,7 @@ rik="rik"
 koo="koo"
 yos="yos"
 wn="wn"
+hime="hime"


 [files]
```

docs/masked_loss_README-ja.md

Lines changed: 57 additions & 0 deletions
```diff
@@ -0,0 +1,57 @@
+## About Masked Loss
+
+Masked loss is a feature that allows training on only part of an image by computing the loss only for the region specified by the input image's mask.
+For example, if you want to train a character, masking only the character region lets you train it while ignoring the background.
+
+There are two ways to specify the mask for masked loss.
+
+- Using a mask image
+- Using the transparency (alpha channel) of the image
+
+The samples use the "training data for AI image models" from [ZunZunPJ Illustration/3D Data](https://zunko.jp/con_illust.html).
+
+### Using a mask image
+
+This method prepares a mask image corresponding to each training image. Prepare mask images with the same file names as the training images, and save them in a directory separate from the training images.
+
+- Training image
+  ![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/607c5116-5f62-47de-8b66-9c4a597f0441)
+- Mask image
+  ![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/53e9b0f8-a4bf-49ed-882d-4026f84e8450)
+
+```.toml
+[[datasets.subsets]]
+image_dir = "/path/to/a_zundamon"
+caption_extension = ".txt"
+conditioning_data_dir = "/path/to/a_zundamon_mask"
+num_repeats = 8
+```
+
+The mask image must be the same size as the training image, with the region to train drawn in white and the region to ignore in black. Grayscale is also supported (127 gives a loss weight of 0.5). To be precise, the R channel of the mask image is used.
+
+Use a DreamBooth-style dataset and save the mask images in the directory specified by `conditioning_data_dir`. This is the same as the ControlNet dataset, so see [ControlNet-LLLite](train_lllite_README-ja.md#データセットの準備) for details.
+
+### Using the transparency (alpha channel) of the image
+
+The transparency (alpha channel) of the training image is used as the mask. Regions with transparency 0 are ignored, and regions with 255 are trained. For semi-transparent regions, the loss weight varies with the transparency (roughly 0.5 at 127).
+
+![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/0baa129b-446a-4aac-b98c-7208efb0e75e)
+
+Note: each image is a transparent PNG
+
+Specify `--alpha_mask` in the training script's options, or specify `alpha_mask` in a subset of the dataset configuration file. For example:
+
+```toml
+[[datasets.subsets]]
+image_dir = "/path/to/image/dir"
+caption_extension = ".txt"
+num_repeats = 8
+alpha_mask = true
+```
+
+## Notes on training
+
+- At the moment, only DreamBooth-style datasets are supported.
+- The mask is applied after being downscaled to the size of the latents, i.e. 1/8. Fine details (such as ahoge or earrings) may therefore not be learned well; some tricks such as slightly dilating the mask may be needed.
+- When using masked loss, it may be unnecessary to include regions outside the training target in the caption. (To be verified)
+- With `alpha_mask`, the latents cache is automatically regenerated when the mask setting is toggled.
```

docs/masked_loss_README.md

Lines changed: 56 additions & 0 deletions
```diff
@@ -0,0 +1,56 @@
+## Masked Loss
+
+Masked loss is a feature that allows you to train only part of an image by calculating the loss only for the part specified by the mask of the input image. For example, if you want to train a character, you can train only the character part by masking it, ignoring the background.
+
+There are two ways to specify the mask for masked loss.
+
+- Using a mask image
+- Using transparency (alpha channel) of the image
+
+The sample uses the "AI image model training data" from [ZunZunPJ Illustration/3D Data](https://zunko.jp/con_illust.html).
+
+### Using a mask image
+
+This is a method of preparing a mask image corresponding to each training image. Prepare a mask image with the same file name as the training image and save it in a different directory from the training image.
+
+- Training image
+  ![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/607c5116-5f62-47de-8b66-9c4a597f0441)
+- Mask image
+  ![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/53e9b0f8-a4bf-49ed-882d-4026f84e8450)
+
+```.toml
+[[datasets.subsets]]
+image_dir = "/path/to/a_zundamon"
+caption_extension = ".txt"
+conditioning_data_dir = "/path/to/a_zundamon_mask"
+num_repeats = 8
+```
+
+The mask image is the same size as the training image, with the part to be trained drawn in white and the part to be ignored in black. It also supports grayscale (127 gives a loss weight of 0.5). The R channel of the mask image is used currently.
+
+Use the dataset in the DreamBooth method, and save the mask image in the directory specified by `conditioning_data_dir`. It is the same as the ControlNet dataset, so please refer to [ControlNet-LLLite](train_lllite_README.md#Preparing-the-dataset) for details.
+
+### Using transparency (alpha channel) of the image
+
+The transparency (alpha channel) of the training image is used as a mask. The part with transparency 0 is ignored, the part with transparency 255 is trained. For semi-transparent parts, the loss weight changes according to the transparency (127 gives a weight of about 0.5).
+
+![image](https://github.com/kohya-ss/sd-scripts/assets/52813779/0baa129b-446a-4aac-b98c-7208efb0e75e)
+
+Note: each image is a transparent PNG
+
+Specify `--alpha_mask` in the training script options or specify `alpha_mask` in the subset of the dataset configuration file. For example, it will look like this.
+
+```toml
+[[datasets.subsets]]
+image_dir = "/path/to/image/dir"
+caption_extension = ".txt"
+num_repeats = 8
+alpha_mask = true
+```
+
+## Notes on training
+
+- At the moment, only the dataset in the DreamBooth method is supported.
+- The mask is applied after the size is reduced to 1/8, which is the size of the latents. Therefore, fine details (such as ahoge or earrings) may not be learned well. Some dilations of the mask may be necessary.
+- If using masked loss, it may not be necessary to include parts that are not to be trained in the caption. (To be verified)
+- In the case of `alpha_mask`, the latents cache is automatically regenerated when the enable/disable state of the mask is switched.
```
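Both README versions describe the same mechanism: the mask (R channel or alpha, scaled to 0–1) is shrunk to the latent resolution and multiplies the unreduced loss. A minimal PyTorch sketch of that weighting, with hypothetical function and tensor names rather than the repository's actual implementation:

```python
import torch
import torch.nn.functional as F


def apply_mask_weighted_loss(loss: torch.Tensor, mask_image: torch.Tensor) -> torch.Tensor:
    """Sketch: weight an unreduced latent-space loss with an image-space mask.

    loss:       (B, C, h, w) loss computed with reduction="none" at latent size
    mask_image: (B, 1, 8*h, 8*w) mask in [0, 1], taken from the mask image's
                R channel or the training image's alpha (127/255 -> ~0.5)
    """
    # The docs note the mask is applied at latent size, i.e. downscaled to 1/8
    # of the image, so fine details (ahoge, earrings) can be lost at this step.
    mask = F.interpolate(mask_image, size=loss.shape[-2:], mode="area")
    return loss * mask  # per-element weighting; reduce (e.g. .mean()) afterwards
```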

docs/train_network_README-ja.md

Lines changed: 9 additions & 4 deletions
```diff
@@ -102,6 +102,8 @@ accelerate launch --num_cpu_threads_per_process 1 train_network.py
 * Specify this when you want the LoRA modules related to the Text Encoder to use a learning rate different from the normal one (given with the --learning_rate option). It is sometimes said that a slightly lower learning rate (such as 5e-5) works better for the Text Encoder.
 * `--network_args`
   * Multiple arguments can be specified. Described below.
+* `--alpha_mask`
+  * Uses the image's alpha values as the mask. Used when training on transparent images. [PR #1223](https://github.com/kohya-ss/sd-scripts/pull/1223)

 When neither `--network_train_unet_only` nor `--network_train_text_encoder_only` is specified (the default), both the Text Encoder and U-Net LoRA modules are enabled.

@@ -181,16 +183,16 @@ python networks\extract_lora_from_dylora.py --model "foldername/dylora-model.saf

 See [PR #355](https://github.com/kohya-ss/sd-scripts/pull/355) for details.

-SDXL is not supported at this time.
-
 You can specify the weights of the 25 blocks of the full model. There is no LoRA corresponding to the first block, but 25 values are used for compatibility with per-block LoRA application and the like. Even when not extending to conv2d3x3, some blocks have no LoRA, but always specify 25 values for a consistent description.

+For SDXL, specify 9 values each for down/up and 3 values for middle.
+
 Specify the following arguments with `--network_args`.

 - `down_lr_weight` : Specifies the learning-rate weights for the U-Net down blocks. The following can be specified.
-  - Per-block weights : specify 12 numbers, e.g. `"down_lr_weight=0,0,0,0,0,0,1,1,1,1,1,1"`
+  - Per-block weights : specify 12 numbers (9 for SDXL), e.g. `"down_lr_weight=0,0,0,0,0,0,1,1,1,1,1,1"`
   - Preset specification : specify like `"down_lr_weight=sine"` (the weights follow a sine curve). sine, cosine, linear, reverse_linear, zeros are available. Appending `+number`, e.g. `"down_lr_weight=cosine+.25"`, adds that number to the weights (giving 0.25 to 1.25).
-- `mid_lr_weight` : Specifies the learning-rate weight for the U-Net mid block. Specify a single number, e.g. `"down_lr_weight=0.5"`.
+- `mid_lr_weight` : Specifies the learning-rate weight for the U-Net mid block. Specify a single number, e.g. `"down_lr_weight=0.5"` (3 values in the case of SDXL).
 - `up_lr_weight` : Specifies the learning-rate weights for the U-Net up blocks. Same as down_lr_weight.
 - Omitted parts are treated as 1.0. If a weight is set to 0, no LoRA module is created for that block.
 - `block_lr_zero_threshold` : If a weight is at or below this value, the LoRA module is not created. The default is 0.

@@ -215,6 +217,9 @@ network_args = [ "block_lr_zero_threshold=0.1", "down_lr_weight=sine+.5", "mid_l

 You can specify the dim (rank) of each of the 25 blocks of the full model. As with the per-block learning rates, some blocks may have no LoRA, but always specify 25 values.

+For SDXL, specify 23 values. Some blocks have no LoRA, but this is for compatibility with the [per-block learning rates](./train_SDXL-en.md) of `sdxl_train.py`.
+The correspondence is `0: time/label embed, 1-9: input blocks 0-8, 10-12: mid blocks 0-2, 13-21: output blocks 0-8, 22: out`.
+
 Specify the following arguments with `--network_args`.

 - `block_dims` : Specifies the dim (rank) of each block. Specify 25 numbers, e.g. `"block_dims=2,2,2,2,4,4,4,4,6,6,6,6,8,6,6,6,6,4,4,4,4,2,2,2,2"`.
```
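To make the SDXL `block_dims` correspondence above concrete, here is an illustrative `--network_args` fragment with 23 values, in the same config style as the README's own examples; the dims themselves are arbitrary example numbers, not recommendations:

```toml
# 23 values: 0 = time/label embed, 1-9 = input blocks 0-8,
# 10-12 = mid blocks 0-2, 13-21 = output blocks 0-8, 22 = out
network_args = [ "block_dims=4,4,4,4,4,4,4,4,4,4,8,8,8,4,4,4,4,4,4,4,4,4,4" ]
```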

docs/train_network_README-zh.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -101,6 +101,8 @@ The LoRA model will be saved in the folder specified by the `--output_dir` option
 * Specify this option when you want the Text Encoder-related LoRA modules to use a learning rate different from the regular one (given by the `--learning_rate` option). It may be best to lower the Text Encoder learning rate slightly (e.g. 5e-5).
 * `--network_args`
   * Multiple parameters can be specified; they are explained in detail below.
+* `--alpha_mask`
+  * Uses the image's alpha values as the mask. Used when training on transparent images. [PR #1223](https://github.com/kohya-ss/sd-scripts/pull/1223)

 When neither `--network_train_unet_only` nor `--network_train_text_encoder_only` is specified (the default), both the Text Encoder and U-Net LoRA modules are enabled.

```

fine_tune.py

Lines changed: 15 additions & 5 deletions
```diff
@@ -310,7 +310,11 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):
             init_kwargs["wandb"] = {"name": args.wandb_run_name}
         if args.log_tracker_config is not None:
             init_kwargs = toml.load(args.log_tracker_config)
-        accelerator.init_trackers("finetuning" if args.log_tracker_name is None else args.log_tracker_name, init_kwargs=init_kwargs)
+        accelerator.init_trackers(
+            "finetuning" if args.log_tracker_name is None else args.log_tracker_name,
+            config=train_util.get_sanitized_config_or_none(args),
+            init_kwargs=init_kwargs,
+        )

     # For --sample_at_first
     train_util.sample_images(accelerator, args, 0, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)
@@ -354,7 +358,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):

             # Sample noise, sample a random timestep for each image, and add noise to the latents,
             # with noise offset and/or multires noise if specified
-            noise, noisy_latents, timesteps, huber_c = train_util.get_noise_noisy_latents_and_timesteps(args, noise_scheduler, latents)
+            noise, noisy_latents, timesteps, huber_c = train_util.get_noise_noisy_latents_and_timesteps(
+                args, noise_scheduler, latents
+            )

             # Predict the noise residual
             with accelerator.autocast():
@@ -368,7 +374,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):

             if args.min_snr_gamma or args.scale_v_pred_loss_like_noise_pred or args.debiased_estimation_loss:
                 # do not mean over batch dimension for snr weight or scale v-pred loss
-                loss = train_util.conditional_loss(noise_pred.float(), target.float(), reduction="none", loss_type=args.loss_type, huber_c=huber_c)
+                loss = train_util.conditional_loss(
+                    noise_pred.float(), target.float(), reduction="none", loss_type=args.loss_type, huber_c=huber_c
+                )
                 loss = loss.mean([1, 2, 3])

                 if args.min_snr_gamma:
@@ -380,7 +388,9 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):

                 loss = loss.mean()  # mean over batch dimension
             else:
-                loss = train_util.conditional_loss(noise_pred.float(), target.float(), reduction="mean", loss_type=args.loss_type, huber_c=huber_c)
+                loss = train_util.conditional_loss(
+                    noise_pred.float(), target.float(), reduction="mean", loss_type=args.loss_type, huber_c=huber_c
+                )

             accelerator.backward(loss)
             if accelerator.sync_gradients and args.max_grad_norm != 0.0:
@@ -471,7 +481,7 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module):

     accelerator.end_training()

-    if is_main_process and (args.save_state or args.save_state_on_train_end):
+    if is_main_process and (args.save_state or args.save_state_on_train_end):
         train_util.save_state_on_train_end(args, accelerator)

     del accelerator  # この後メモリを使うのでこれは消す
```
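The new `config=train_util.get_sanitized_config_or_none(args)` argument ties into the "Hyperparameter tracking", `--log_config`, and "passing filtered hyperparameters to accelerate" commits above: tracker backends only accept plain scalar config values, so non-scalar argparse entries must be filtered or stringified. The following is a sketch of that idea under stated assumptions, not the actual `train_util` implementation:

```python
import argparse
from typing import Optional


def get_sanitized_config_or_none(args: argparse.Namespace) -> Optional[dict]:
    """Sketch only: the real train_util.get_sanitized_config_or_none may differ."""
    # Assumed gate: the `--log_config` option from the commit log enables config output.
    if not getattr(args, "log_config", False):
        return None
    sanitized = {}
    for key, value in vars(args).items():
        if value is None or isinstance(value, (int, float, bool, str)):
            sanitized[key] = value  # scalar types pass through unchanged
        else:
            sanitized[key] = str(value)  # lists, paths, enums -> strings
    return sanitized
```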

finetune/prepare_buckets_latents.py

Lines changed: 27 additions & 6 deletions
```diff
@@ -11,15 +11,18 @@

 import torch
 from library.device_utils import init_ipex, get_preferred_device
+
 init_ipex()

 from torchvision import transforms

 import library.model_util as model_util
 import library.train_util as train_util
 from library.utils import setup_logging
+
 setup_logging()
 import logging
+
 logger = logging.getLogger(__name__)

 DEVICE = get_preferred_device()
@@ -89,7 +92,9 @@ def main(args):

     # bucketのサイズを計算する
     max_reso = tuple([int(t) for t in args.max_resolution.split(",")])
-    assert len(max_reso) == 2, f"illegal resolution (not 'width,height') / 画像サイズに誤りがあります。'幅,高さ'で指定してください: {args.max_resolution}"
+    assert (
+        len(max_reso) == 2
+    ), f"illegal resolution (not 'width,height') / 画像サイズに誤りがあります。'幅,高さ'で指定してください: {args.max_resolution}"

     bucket_manager = train_util.BucketManager(
         args.bucket_no_upscale, max_reso, args.min_bucket_reso, args.max_bucket_reso, args.bucket_reso_steps
@@ -107,7 +112,7 @@ def main(args):
     def process_batch(is_last):
         for bucket in bucket_manager.buckets:
             if (is_last and len(bucket) > 0) or len(bucket) >= args.batch_size:
-                train_util.cache_batch_latents(vae, True, bucket, args.flip_aug, False)
+                train_util.cache_batch_latents(vae, True, bucket, args.flip_aug, args.alpha_mask, False)
                 bucket.clear()

     # 読み込みの高速化のためにDataLoaderを使うオプション
@@ -208,7 +213,9 @@ def setup_parser() -> argparse.ArgumentParser:
     parser.add_argument("in_json", type=str, help="metadata file to input / 読み込むメタデータファイル")
     parser.add_argument("out_json", type=str, help="metadata file to output / メタデータファイル書き出し先")
     parser.add_argument("model_name_or_path", type=str, help="model name or path to encode latents / latentを取得するためのモデル")
-    parser.add_argument("--v2", action="store_true", help="not used (for backward compatibility) / 使用されません(互換性のため残してあります)")
+    parser.add_argument(
+        "--v2", action="store_true", help="not used (for backward compatibility) / 使用されません(互換性のため残してあります)"
+    )
     parser.add_argument("--batch_size", type=int, default=1, help="batch size in inference / 推論時のバッチサイズ")
     parser.add_argument(
         "--max_data_loader_n_workers",
@@ -231,18 +238,32 @@ def setup_parser() -> argparse.ArgumentParser:
         help="steps of resolution for buckets, divisible by 8 is recommended / bucketの解像度の単位、8で割り切れる値を推奨します",
     )
     parser.add_argument(
-        "--bucket_no_upscale", action="store_true", help="make bucket for each image without upscaling / 画像を拡大せずbucketを作成します"
+        "--bucket_no_upscale",
+        action="store_true",
+        help="make bucket for each image without upscaling / 画像を拡大せずbucketを作成します",
     )
     parser.add_argument(
-        "--mixed_precision", type=str, default="no", choices=["no", "fp16", "bf16"], help="use mixed precision / 混合精度を使う場合、その精度"
+        "--mixed_precision",
+        type=str,
+        default="no",
+        choices=["no", "fp16", "bf16"],
+        help="use mixed precision / 混合精度を使う場合、その精度",
     )
     parser.add_argument(
         "--full_path",
         action="store_true",
         help="use full path as image-key in metadata (supports multiple directories) / メタデータで画像キーをフルパスにする(複数の学習画像ディレクトリに対応)",
     )
     parser.add_argument(
-        "--flip_aug", action="store_true", help="flip augmentation, save latents for flipped images / 左右反転した画像もlatentを取得、保存する"
+        "--flip_aug",
+        action="store_true",
+        help="flip augmentation, save latents for flipped images / 左右反転した画像もlatentを取得、保存する",
+    )
+    parser.add_argument(
+        "--alpha_mask",
+        type=str,
+        default="",
+        help="save alpha mask for images for loss calculation / 損失計算用に画像のアルファマスクを保存する",
     )
     parser.add_argument(
         "--skip_existing",
```

finetune/tag_images_by_wd14_tagger.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -112,7 +112,6 @@ def main(args):

     # モデルを読み込む
     if args.onnx:
-        import torch
         import onnx
         import onnxruntime as ort

```
