Description
Currently, the load_image function in train_util uses the PIL image.convert("RGB") method to convert loaded 4-channel RGBA images to the expected 3-channel RGB format.
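This is incorrect for some edge cases and causes the background to become distorted: convert("RGB") simply drops the alpha channel, so whatever color values are stored in transparent pixels become visible.
As an example, loading this RGBA image results in heavy background artifacts when used with the test code below: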
from PIL import Image
from library.train_util import load_image
img = load_image("test.png")
Image.fromarray(img, 'RGB').save("test_convert.png")
To verify this issue, a LoRA was trained using that single image as the dataset, with a caption that includes the words "white background". The seed and all other settings were held fixed between the two tests.
The top row is the latest code using a fresh VENV. The bottom row is the proposed fix, which I will create a PR for shortly. It uses the Image.alpha_composite function with a new blank white image to handle the alpha channel.
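For reference, here is a minimal sketch of the kind of change described above. It is not the code from the PR: the helper name composite_on_white is hypothetical, and it returns a PIL image rather than the array that load_image returns. It only shows the idea of flattening the alpha channel onto a white background with Image.alpha_composite instead of dropping it.

```python
from PIL import Image


def composite_on_white(img: Image.Image) -> Image.Image:
    """Flatten an image onto an opaque white background and return RGB.

    Hypothetical helper for illustration; not part of train_util.
    """
    # alpha_composite requires both inputs to be RGBA and the same size.
    if img.mode != "RGBA":
        img = img.convert("RGBA")
    background = Image.new("RGBA", img.size, (255, 255, 255, 255))
    # Composite the image over the white background, then drop the
    # (now fully opaque) alpha channel.
    return Image.alpha_composite(background, img).convert("RGB")
```

With this approach a fully transparent pixel comes out white regardless of the color stored underneath it, whereas convert("RGB") alone would expose that hidden color.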