Image object not recognized #3336
I'm extracting images from scientific papers. For this PDF I'm having trouble extracting Fig. 3 on page 10: this image object is not included in the extracted images. I have the same issue in PyMuPDF, see PyMuPDF#4577.

Environment

$ python -m platform
Linux-6.12.32-amd64-x86_64-with-glibc2.41
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.6.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=11.2.1

Python version: 3.13.3

Code + PDF

Extracts all images from the given document:

from pypdf import PdfReader

with PdfReader("s44372-024-00085-0.pdf") as reader:
    for page_number, page in enumerate(reader.pages, start=1):
        for image in page.images:
            with open(f"pypdf-{page_number}-{image.name}", "wb") as fp:
                fp.write(image.data)

The PDF in question can be found here. I am not the author of this document. It is published under CC-BY 4.0 and the license terms are included in the document. This license is not viral, so I think it's legal to include it in your test dataset.

Traceback

No traceback
Replies: 1 comment 1 reply
This specific page only references one actual image. Figure 3 is included with plain drawing commands. To extract it, you would have to render the page as an image, but this is out of scope for pypdf.
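To check this for a given page, one option is to count the operators in its content stream. The following is only a rough sketch, not part of pypdf: `summarize_operators` and `summarize_page` are hypothetical helper names, and the sketch assumes that `page.get_contents()` returns a content stream whose `operations` attribute is a list of `(operands, operator)` pairs with the operator given as bytes. A `Do` operator can also invoke a form XObject, so the `xobject` count is an upper bound on referenced images.

```python
from collections import Counter

# Path-painting operators of the PDF content stream syntax (stroke and fill variants).
PATH_PAINTING_OPERATORS = {b"S", b"s", b"f", b"F", b"f*", b"B", b"B*", b"b", b"b*"}
# Operator that paints an external object (image or form XObject).
XOBJECT_OPERATOR = b"Do"


def summarize_operators(operators):
    """Count path-painting operators and XObject invocations in an operator sequence."""
    counts = Counter()
    for operator in operators:
        if operator in PATH_PAINTING_OPERATORS:
            counts["path_painting"] += 1
        elif operator == XOBJECT_OPERATOR:
            counts["xobject"] += 1
    return {"path_painting": counts["path_painting"], "xobject": counts["xobject"]}


def summarize_page(page):
    """Summarize the content stream of a page object (hypothetical helper).

    Assumes `page.get_contents()` returns None or an object whose
    `operations` attribute holds (operands, operator) pairs.
    """
    contents = page.get_contents()
    if contents is None:  # a page without /Contents has nothing to count
        return summarize_operators([])
    return summarize_operators(operator for _operands, operator in contents.operations)
```

For the document above, `summarize_page(reader.pages[9])` would inspect page 10. Many path-painting operators next to a single XObject invocation would be consistent with a figure built from drawing commands.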