-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Closed
Labels
Description
While looking at the performance of font rendering, and investigating the possible use of the FreeType cache code - something caught my eye in ImagingFill2 - fill_mask_L - A loop-invariant is calculated in each loop.
https://github.com/python-pillow/Pillow/blob/main/src/libImaging/Paste.c#L441
When pasting into an RGBA buffer, this code appears to call strcmp() 5 times for each pixel.
Looking at callgrind I get:
3,338,238,474 (26.97%) ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2 [/lib/x86_64-linux-gnu/libc.so.6]
2,880,392,864 (23.27%) /tmp/pip-install-qq5i177l/pillow_285763a52f8c48c0885d3a94de619b1e/src/libImaging/Paste.c:ImagingFill2 [/home/richja/dev/gopro-graphics/venv/lib/python3.10/site-packages/PIL/_imaging.cpython-310-x86_64-linux-gnu.so]
(you don't see fill_mask_L in this output as is is inline )
Something like this might be more efficient:
int special_mode = (strcmp(imOut->mode, "RGBa") == 0 || strcmp(imOut->mode, "RGBA") == 0 || strcmp(imOut->mode, "La") == 0 || strcmp(imOut->mode, "LA") == 0 || strcmp(imOut->mode, "PA") == 0);
for (y = 0; y < ysize; y++) {
UINT8 *out = (UINT8 *)imOut->image[y + dy] + dx * pixelsize;
UINT8 *mask = (UINT8 *)imMask->image[y + sy] + sx;
for (x = 0; x < xsize; x++) {
for (i = 0; i < pixelsize; i++) {
UINT8 channel_mask = *mask;
if ( special_mode && i != 3 && channel_mask != 0) {
channel_mask = 255 - (255 - channel_mask) * (1 - (255 - out[3]) / 255);
}
out[i] = BLEND(channel_mask, out[i], ink[i], tmp1);
}
out += pixelsize;
mask++;
}
}