Lazy loading images when creating a pdf #4067

@ghost

Description

What did you do?

I'm trying to create a PDF from a large number of images. To avoid high memory usage I decided to load them lazily, so I built a generator that loads the images and yields them one at a time. Here is a minimal reproducible example:

import os
import sys
from random import choice

import psutil
from PIL import Image

process = psutil.Process(os.getpid())
rotations = [i * i for i in range(12)]
names = ["pdf%i.pdf" % i for i in range(20)]

def image_generator(image_paths):
    """Yield fully loaded images one at a time, skipping invalid files."""
    print("running generator")
    for img_p in image_paths:
        print("mem usage in bytes: %s" % process.memory_info().rss)
        # verify() leaves the image object unusable, so the file is
        # checked first and then reopened for the actual load.
        try:
            with open(img_p, "rb") as fp:
                with Image.open(fp) as img:
                    img.verify()
        except Exception as e:
            print("invalid image continuing")
            print("Exception: %s" % str(e))
            continue
        with open(img_p, "rb") as fp:
            with Image.open(fp) as img:
                img.load()
                yield img

def gather_images(img_dir):
    image_paths = []
    print("gathering images")
    for root, __, files in os.walk(img_dir, topdown=False):
        for file in files:
            source_path = os.path.join(root, file)
            image_paths.append(source_path)
    return image_paths

def run_main(imgs_dir):
    image_paths = gather_images(imgs_dir)
    if len(image_paths) < 3:
        print("not enough images")
    else:
        print("starting")
        first_img = None
        for img in image_generator(image_paths):
            first_img = img
            break
        image_paths.pop(0)
        print("Images to process: %i" % len(image_paths))
        # Pick a random candidate file name that is not already taken.
        free_names = [n for n in names if not os.path.isfile(n)]
        name = choice(free_names) if free_names else names[0]
        if os.path.isfile(name):
            print("Unable to create name")
            raise FileExistsError(name)
        else:
            first_img.save(name, "PDF", resolution=90, save_all=True,
                           append_images=image_generator(image_paths))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("need path argument (location of images)")
    elif not os.path.isdir(os.path.abspath(sys.argv[1])):
        print("Provided path does not exist")
    else:
        run_main(sys.argv[1])

What did you expect to happen?

I expected the memory usage to remain roughly constant.

What actually happened?

The memory usage kept rising until I had to force-close the program to avoid a system freeze.
I also tried closing each image after yielding it. That kept the memory usage constant, but it raised another error: ValueError: Operation on closed image

Full traceback:

Traceback (most recent call last):
  File "minimal_memory_example.py", line 70, in <module>
    run_main(sys.argv[1])
  File "minimal_memory_example.py", line 62, in run_main
    append_images=image_generator(image_paths))
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/Image.py", line 2088, in save
    save_handler(self, fp, filename)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/PdfImagePlugin.py", line 45, in _save_all
    _save(im, fp, filename, save_all=True)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/PdfImagePlugin.py", line 174, in _save
    Image.SAVE["JPEG"](im, op, filename)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/JpegImagePlugin.py", line 779, in _save
    ImageFile._save(im, fp, [("jpeg", (0, 0) + im.size, 0, rawmode)], bufsize)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/ImageFile.py", line 485, in _save
    im.load()
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/ImageFile.py", line 144, in load
    pixel = Image.Image.load(self)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/Image.py", line 879, in load
    return self.im.pixel_access(self.readonly)
  File "/home/hallowf/Documents/Github/STPDF/venv/lib/python3.7/site-packages/PIL/_util.py", line 43, in __getattr__
    raise self.ex
ValueError: Operation on closed image
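
One possible explanation can be sketched without Pillow. The sketch assumes that the PDF writer collects every item from append_images before it writes any page; that is an assumption about Pillow's internals, not something confirmed in this report. Under that assumption a generator cannot keep memory flat, and closing an image right after yielding it would hand the writer an already closed object. `FakeImage`, `lazy_images`, `buffering_consumer` and `streaming_consumer` are hypothetical names used only for illustration.

```python
import weakref


class FakeImage:
    """Stand-in for a decoded image (hypothetical, not a Pillow class)."""

    def __init__(self, index):
        self.index = index


def lazy_images(count, live):
    """Yield FakeImage objects one by one and register each in `live`."""
    for i in range(count):
        img = FakeImage(i)
        live.add(img)  # the WeakSet entry vanishes once the object is freed
        yield img


def buffering_consumer(images, live):
    """Collect every item before doing any work (the assumed writer behavior)."""
    pages = list(images)
    # `pages` still references every object, so none has been freed.
    return len(live)


def streaming_consumer(images, live):
    """Handle one item at a time and keep no reference to earlier ones."""
    peak = 0
    for img in images:
        peak = max(peak, len(live))
    return peak


if __name__ == "__main__":
    live = weakref.WeakSet()
    print(buffering_consumer(lazy_images(5, live), live))
    live = weakref.WeakSet()
    print(streaming_consumer(lazy_images(5, live), live))
```

With CPython's reference counting, the buffering consumer reports all five objects alive at once, while the streaming consumer never observes more than one inside its loop body.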

What are your OS, Python and Pillow versions?

  • OS: Ubuntu 18.04 LTS
  • Python: 3.7
  • Pillow: 6.1.0
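
A possible workaround for the versions listed above is a minimal sketch that writes the PDF one page at a time, so only a single image is referenced at any moment. It relies on the `append` option of Pillow's PDF writer, which the Pillow documentation lists as available since 5.1.0. The helper name `save_pdf_one_page_at_a_time` is hypothetical, and the sketch has not been measured against the memory growth described in this report.

```python
import os

from PIL import Image


def save_pdf_one_page_at_a_time(image_paths, pdf_path, resolution=90):
    """Write one PDF page per image, opening only one image at a time.

    Hypothetical helper, not part of Pillow. Returns the number of pages
    written. An existing file at `pdf_path` is overwritten by the first page.
    """
    pages = 0
    for path in image_paths:
        # Only the current image is referenced here, so the previous one
        # can be garbage-collected before the next file is decoded.
        with Image.open(path) as img:
            # The first page creates the file; later pages are appended.
            img.save(pdf_path, "PDF", resolution=resolution,
                     append=pages > 0)
        pages += 1
    return pages


if __name__ == "__main__":
    import sys

    # Usage: python script.py out.pdf img1.png img2.png ...
    if len(sys.argv) >= 3:
        written = save_pdf_one_page_at_a_time(sys.argv[2:], sys.argv[1])
        print("pages written: %i in %s" % (written, os.path.abspath(sys.argv[1])))
```

Each append reopens the existing PDF, so this may be slower than a single save call for very long documents; the trade-off is that no list of decoded images needs to exist at once.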
