Description
What did you do?
I'm loading a .jpg image using Pillow in an attempt to extract the IPTC metadata:
from PIL import Image, IptcImagePlugin
im = Image.open("lib/filename_example_1.jpg")
iptc = IptcImagePlugin.getiptcinfo(im)
What did you expect to happen?
I expect the getiptcinfo() function to return a dictionary containing the image's IPTC data.
What actually happened?
Pillow can't extract the IPTC data from the image:
Traceback (most recent call last):
  File "/path/to/main.py", line 5, in <module>
    iptc_1 = IptcImagePlugin.getiptcinfo(im_1)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 226, in getiptcinfo
    im._open()
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 89, in _open
    tag, size = self.field()
                ^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 69, in field
    raise SyntaxError(msg)
SyntaxError: invalid IPTC/NAA file
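When processing several files, the failing ones can be separated from the working ones by catching the exception shown in the traceback. The following is only a minimal sketch: `try_read_iptc` is a hypothetical helper name, and the reader is passed in as an argument so the helper does not depend on any particular Pillow version. It does not fix the underlying problem, it only records which images fail.

```python
def try_read_iptc(im, reader):
    """Call reader(im) and return (info, error_message).

    `reader` is expected to be a callable such as
    IptcImagePlugin.getiptcinfo; `try_read_iptc` itself is a
    hypothetical helper, not part of Pillow.
    """
    try:
        return reader(im), None
    except SyntaxError as exc:
        # The traceback above shows Pillow raising SyntaxError for these files.
        return None, str(exc)
```

With Pillow installed, this would be called as `info, err = try_read_iptc(im, IptcImagePlugin.getiptcinfo)`.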
What are your OS, Python and Pillow versions?
- OS: WSL2 - Ubuntu 22.04.2 LTS
- Python: 3.11.4
- Pillow: 10.0.0 (also getting the error with 9.x)
Additional information.
For confidentiality reasons, I cannot provide the images that I'm trying to process. However, I will provide the anonymised IPTC metadata of two images which cause the error (filename_example_1.jpg, filename_example_2.jpg) and one which doesn't (filename_example_3.jpg). This IPTC metadata was extracted using https://www.imgonline.com.ua/.
filename_example_1.jpg (erroneous)
- Coded Character Set: UTF8
- Application Record Version: 52791
- By-line: Anonymous Name
- Copyright Notice: Anonymous Name, Anonymous Address, Anonymous Country
- Time created: 12:19:38+00:00
filename_example_2.jpg (erroneous)
- Destination: channel509
- Envelope Record Version: 4
- Object Name: 07252122
- Keywords: Anonymous
- By-line: Anonymous
- Country-Primary Location Code: Anonymous
- Country-Primary Location Name: Anonymous
- Original Transmission Reference: Anonymous
- Copyright Notice: Anonymous
- Caption-Abstract: Anonymous
- Local Caption: 07252122
- Writer-Editor: hs
- Image Orientation: Landscape
- Application Record Version: 4
- IPTC Image Width: 3749
- IPTC Image Height: 2806
- News Photo Version: 4
filename_example_3.jpg (successfully loaded by Pillow)
- Unique Document ID: Anonymous
- Object Name: ELN
- Category: Sport
- Keywords: Anonymous
- Date Created: 2004:90:12
- By-line: Anonymous
- Credit: Anonymous
- Source: Anonymous
- Copyright Notice: Anonymous
- Caption-Abstract: Anonymous
- Writer-Editor: Anonymous
filename_example_3.jpg Pillow getiptcinfo() output:
{(2, 187): b'Anonymous', (2, 5): b'ELN', (2, 15): b'spo', (2, 25): b'Anonymous', (2, 55): b'Anonymous', (2, 80): b'Anonymous', (2, 110): b'Anonymous', (2, 115): b'Anonymous', (2, 116): b'Anonymous', (2, 120): b'Foto: Anonymous', (2, 122): b'Anonymous', (2, 203): b'Ja', (2, 205): b'Ja'}
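To make the output easier to compare with the field list above, the (record, dataset) keys can be mapped to names. This is a sketch under assumptions: `IPTC_NAMES` and `label_iptc` are hypothetical names, the mapping is inferred by lining up the keys with the fields listed for filename_example_3.jpg, and UTF-8 is only a guess at the text encoding. Keys that have no entry, such as (2, 203) and (2, 205), are left as numbers.

```python
# Hypothetical lookup table, inferred from the field list for
# filename_example_3.jpg; it is not taken from Pillow.
IPTC_NAMES = {
    (2, 5): "Object Name",
    (2, 15): "Category",
    (2, 25): "Keywords",
    (2, 55): "Date Created",
    (2, 80): "By-line",
    (2, 110): "Credit",
    (2, 115): "Source",
    (2, 116): "Copyright Notice",
    (2, 120): "Caption-Abstract",
    (2, 122): "Writer-Editor",
    (2, 187): "Unique Document ID",
}


def label_iptc(info, encoding="utf-8"):
    """Return a copy of `info` with named keys and decoded text values."""
    labelled = {}
    for key, value in info.items():
        # Fall back to "record:dataset" for keys missing from the table.
        name = IPTC_NAMES.get(key, f"{key[0]}:{key[1]}")
        if isinstance(value, list):
            # Repeated datasets (for example several keywords) arrive as a list.
            labelled[name] = [v.decode(encoding, errors="replace") for v in value]
        else:
            labelled[name] = value.decode(encoding, errors="replace")
    return labelled
```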
I'm providing this metadata to show that, as far as I can tell from these examples, there is no obvious pattern in which images Pillow can read IPTC data from and which it can't.
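One way to look for a pattern is to walk the raw IPTC bytes and see where the structure stops being valid. The sketch below assumes the standard IIM layout (a 0x1C marker, a record number, a dataset number, then a 2-byte big-endian length) and assumes that the error means a dataset header failed a check of this kind. `walk_iptc` is a hypothetical helper, and it does not handle extended-length datasets.

```python
import struct


def walk_iptc(data):
    """Walk raw IPTC IIM bytes and return (datasets, bad_offset).

    bad_offset is None when all bytes were consumed cleanly; otherwise it
    is the offset of the first byte that does not start a dataset this
    sketch can parse.
    """
    datasets = []
    pos = 0
    while pos < len(data):
        header = data[pos:pos + 5]
        # A standard dataset header is 5 bytes and starts with 0x1C.
        if len(header) < 5 or header[0] != 0x1C:
            return datasets, pos
        record, dataset = header[1], header[2]
        (size,) = struct.unpack(">H", header[3:5])
        if size & 0x8000:
            # Extended-length dataset: not handled in this sketch.
            return datasets, pos
        value = data[pos + 5:pos + 5 + size]
        if len(value) < size:
            # The declared length runs past the end of the data.
            return datasets, pos
        datasets.append(((record, dataset), value))
        pos += 5 + size
    return datasets, None
```

If the raw IPTC block of a failing image can be obtained (for JPEGs, possibly from the Photoshop resource data that Pillow stores in `im.info`, which is an assumption here), the returned offset and the bytes around it would show what the parser is tripping over.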
If anyone is able to provide some insights into this bug, why it is happening, and how it can be resolved, I would greatly appreciate it. Cheers!