Inconsistent IPTC data retrieval: Failure with specific JPEG files  #7318

@ckanaar

What did you do?

I'm loading a .jpg image using Pillow in an attempt to extract the IPTC metadata:

from PIL import Image, IptcImagePlugin

im = Image.open("lib/filename_example_1.jpg")
iptc = IptcImagePlugin.getiptcinfo(im)

What did you expect to happen?

I expect the getiptcinfo() function to return a dictionary containing the image's IPTC data.

What actually happened?

Pillow can't extract the IPTC data from the image:

Traceback (most recent call last):
  File "/path/to/main.py", line 5, in <module>
    iptc_1 = IptcImagePlugin.getiptcinfo(im_1)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 226, in getiptcinfo
    im._open()
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 89, in _open
    tag, size = self.field()
                ^^^^^^^^^^^^
  File "/path/to/venv/lib/python3.11/site-packages/PIL/IptcImagePlugin.py", line 69, in field
    raise SyntaxError(msg)
SyntaxError: invalid IPTC/NAA file
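
The traceback shows the error is raised in field() while _open() walks the IPTC records. One way to narrow this down without sharing the image is to walk the raw IPTC bytes by hand and report the offset at which a dataset header stops looking valid. The sketch below is not Pillow's parser. The name walk_iptc is hypothetical, and it assumes the raw block has already been obtained, for example from im.info["photoshop"][0x0404] for a JPEG (an assumption about where Pillow exposes the Photoshop IPTC resource; check it on your version). It relies only on the IIM layout: a 0x1C marker, a record number, a dataset number, and a two-byte big-endian length, with an extended length when the high bit is set.

```python
def walk_iptc(data: bytes):
    """Yield (offset, record, dataset, payload) for each IIM dataset.

    Raises ValueError naming the byte offset of the first malformed header.
    Hypothetical diagnostic helper, not part of Pillow.
    """
    pos = 0
    while pos < len(data):
        start = pos
        header = data[pos:pos + 5]
        if len(header) < 5:
            raise ValueError(f"truncated header at offset {start}: {header!r}")
        marker, record, dataset = header[0], header[1], header[2]
        if marker != 0x1C:
            raise ValueError(
                f"expected 0x1C at offset {start}, found {marker:#04x}"
            )
        if not 1 <= record <= 9:
            # IIM defines records 1 through 9 only.
            raise ValueError(f"record number {record} at offset {start}")
        size = int.from_bytes(header[3:5], "big")
        pos += 5
        if size & 0x8000:
            # Extended dataset: the low 15 bits give the width of the
            # length field that follows the header.
            width = size & 0x7FFF
            size = int.from_bytes(data[pos:pos + width], "big")
            pos += width
        payload = data[pos:pos + size]
        if len(payload) < size:
            raise ValueError(f"truncated payload at offset {start}")
        yield start, record, dataset, payload
        pos += size


def describe_iptc(data: bytes):
    """Return the parsed datasets plus the error message, if any."""
    parsed, error = [], None
    try:
        for item in walk_iptc(data):
            parsed.append(item)
    except ValueError as exc:
        error = str(exc)
    return parsed, error
```

Running describe_iptc on the block from a failing file and posting only the error message and the (record, dataset) pairs would show where parsing stops without exposing the confidential payloads.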

What are your OS, Python and Pillow versions?

  • OS: WSL2 - Ubuntu 22.04.2 LTS
  • Python: 3.11.4
  • Pillow: 10.0.0 (also getting the error with 9.x)

Additional information.

For confidentiality reasons, I cannot provide the images that I'm trying to process. However, I will provide the anonymised IPTC metadata of two images which cause the error (filename_example_1.jpg, filename_example_2.jpg) and one which doesn't (filename_example_3.jpg). This IPTC metadata was extracted using https://www.imgonline.com.ua/.

filename_example_1.jpg (erroneous)

  • Coded Character Set: UTF8
  • Application Record Version: 52791
  • By-line: Anonymous Name
  • Copyright Notice: Anonymous Name, Anonymous Address, Anonymous Country
  • Time created: 12:19:38+00:00

filename_example_2.jpg (erroneous)

  • Destination: channel509
  • Envelope Record Version: 4
  • Object Name: 07252122
  • Keywords: Anonymous
  • By-line: Anonymous
  • Country-Primary Location Code: Anonymous
  • Country-Primary Location Name: Anonymous
  • Original Transmission Reference: Anonymous
  • Copyright Notice: Anonymous
  • Caption-Abstract: Anonymous
  • Local Caption: 07252122
  • Writer-Editor: hs
  • Image Orientation: Landscape
  • Application Record Version: 4
  • IPTC Image Width: 3749
  • IPTC Image Height: 2806
  • News Photo Version: 4

filename_example_3.jpg (successfully loaded by Pillow)

  • Unique Document ID: Anonymous
  • Object Name: ELN
  • Category: Sport
  • Keywords: Anonymous
  • Date Created: 2004:90:12
  • By-line: Anonymous
  • Credit: Anonymous
  • Source: Anonymous
  • Copyright Notice: Anonymous
  • Caption-Abstract: Anonymous
  • Writer-Editor: Anonymous

filename_example_3.jpg Pillow getiptcinfo() output:
{(2, 187): b'Anonymous', (2, 5): b'ELN', (2, 15): b'spo', (2, 25): b'Anonymous', (2, 55): b'Anonymous', (2, 80): b'Anonymous', (2, 110): b'Anonymous', (2, 115): b'Anonymous', (2, 116): b'Anonymous', (2, 120): b'Foto: Anonymous', (2, 122): b'Anonymous', (2, 203): b'Ja', (2, 205): b'Ja'}
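
To compare this output with the field list above, the (record, dataset) keys can be mapped to readable names. The sketch below uses a small, partial lookup of Application Record (record 2) dataset numbers from the IPTC IIM standard. The names IPTC_NAMES and label_iptc are hypothetical, and keys that are not in the table (such as (2, 203) and (2, 205) here) are passed through labelled as unknown rather than guessed.

```python
# Partial lookup of IIM Application Record (record 2) datasets.
# Only entries that also appear in the listings in this report are included.
IPTC_NAMES = {
    (2, 5): "Object Name",
    (2, 15): "Category",
    (2, 25): "Keywords",
    (2, 55): "Date Created",
    (2, 80): "By-line",
    (2, 110): "Credit",
    (2, 115): "Source",
    (2, 116): "Copyright Notice",
    (2, 120): "Caption-Abstract",
    (2, 122): "Writer-Editor",
}


def label_iptc(info):
    """Return a copy of a getiptcinfo()-style dict keyed by dataset name.

    Values are passed through untouched (bytes, or whatever the reader
    returned); unrecognised keys are kept as "unknown (record, dataset)".
    """
    labelled = {}
    for key, value in info.items():
        name = IPTC_NAMES.get(key, f"unknown {key}")
        labelled[name] = value
    return labelled
```

Applied to the dictionary above, this would make it easier to line up Pillow's result with the output of the online extractor field by field.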

I'm providing this metadata because, as far as I can tell from these examples, there is no obvious pattern in which files Pillow can read IPTC data from and which it cannot.

If anyone is able to provide some insights into this bug, why it is happening, and how it can be resolved, I would greatly appreciate it. Cheers!
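
As a stopgap while the cause is unknown, a batch job could guard the call so that one unreadable file does not abort the run. This is only a sketch of a workaround, not a fix: the name read_iptc_or_none and its reader parameter are hypothetical, and it assumes that treating this SyntaxError as "no IPTC data" is acceptable for the caller.

```python
def read_iptc_or_none(im, reader=None):
    """Return the IPTC dict for im, or None if the IPTC block is rejected.

    reader defaults to PIL.IptcImagePlugin.getiptcinfo; it is a parameter
    only so the wrapper can be exercised without a real image.
    """
    if reader is None:
        from PIL import IptcImagePlugin  # imported lazily on purpose

        reader = IptcImagePlugin.getiptcinfo
    try:
        return reader(im)
    except SyntaxError:
        # Raised as "invalid IPTC/NAA file" in the traceback in this report.
        return None
```

With an image opened as in the snippet at the top of this report, read_iptc_or_none(im) would return None for the two failing files instead of raising.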
