Converting JPEG2000 from bytestream to numpy array not functioning as expected #3152

@stuartspotlight

What did you do?

I extracted a JPEG 2000 image from a PDF as bytes. I then loaded the result into Pillow using

im = Image.open(BytesIO(raw))

Next I attempted to convert the image to a numpy array in order to manipulate the data, using

A = np.array(im)

This resulted in the array

array(<PIL.Jpeg2KImagePlugin.Jpeg2KImageFile image mode=RGBA size=1598x1598 at 0x7FDEAF719A20>,
      dtype=object)
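An object-dtype result like this can mean that NumPy could not obtain pixel data from the image (for example because decoding failed) and fell back to wrapping the Python object. The sketch below forces the decode first, so that any decoder error surfaces directly, and rejects an object array explicitly. `image_bytes_to_array` is a hypothetical helper name, and the in-memory PNG in the demo is only a stand-in for the extracted JPEG 2000 bytes.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def image_bytes_to_array(raw):
    """Decode image bytes with Pillow and return the pixels as a numpy array.

    Hypothetical helper for illustration; not part of Pillow or numpy.
    """
    im = Image.open(BytesIO(raw))
    # Image.open is lazy: force the decode now so a broken stream raises
    # its own exception here rather than being hidden during conversion.
    im.load()
    arr = np.asarray(im)
    if arr.dtype == object:
        # NumPy wrapped the image object instead of reading its pixels.
        raise ValueError("image was not converted to a numeric array")
    return arr


if __name__ == "__main__":
    # Stand-in data: a tiny RGBA PNG built in memory (assumption: the real
    # input would be the JPEG 2000 bytes extracted from the PDF).
    buf = BytesIO()
    Image.new("RGBA", (4, 3), (27, 45, 62, 0)).save(buf, format="PNG")
    pixels = image_bytes_to_array(buf.getvalue())
    print(pixels.shape, pixels.dtype)
```

If the call raises at `im.load()`, the problem lies in the bytes or the decoder rather than in the numpy conversion.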

When I attempted to force numpy to convert the image to a sequence of numbers, using np.array(im.getdata()), the result was this error:

Traceback (most recent call last):

  File "<ipython-input-39-042ef64a6b36>", line 1, in <module>
    runfile('/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images/test_embeded_image_extractor_locally.py', wdir='/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images')

  File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images/test_embeded_image_extractor_locally.py", line 402, in <module>
    A = np.array(im.getdata())

  File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 1220, in getdata
    self.load()

  File "/usr/local/lib/python3.5/dist-packages/PIL/Jpeg2KImagePlugin.py", line 210, in load
    return ImageFile.ImageFile.load(self)

  File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 250, in load
    raise_ioerror(err_code)

  File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 59, in raise_ioerror
    raise IOError(message + " when reading image file")

OSError: broken data stream when reading image file
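A "broken data stream" error can be a sign that the extracted bytes are incomplete or are not what the decoder expects. As a rough diagnostic, the standard-library sketch below checks whether the bytes start with a JP2 signature box or a raw codestream header, and whether they end with the JPEG 2000 end-of-codestream marker. `describe_jpeg2000_bytes` is a hypothetical helper name; a missing end marker is only a hint of truncation, not proof.

```python
# JP2 container files begin with this 12-byte signature box.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# Raw codestreams begin with the SOC marker followed by the SIZ marker.
J2K_CODESTREAM_START = b"\xff\x4f\xff\x51"
# A complete codestream ends with the EOC marker.
J2K_END_OF_CODESTREAM = b"\xff\xd9"


def describe_jpeg2000_bytes(raw):
    """Report what kind of JPEG 2000 data `raw` looks like.

    Hypothetical diagnostic helper; it inspects only the first and last
    bytes and does not validate the stream in between.
    """
    if raw.startswith(JP2_SIGNATURE):
        kind = "jp2"
    elif raw.startswith(J2K_CODESTREAM_START):
        kind = "j2k"
    else:
        kind = "unknown"
    return {
        "kind": kind,
        "ends_with_eoc": raw.endswith(J2K_END_OF_CODESTREAM),
    }


if __name__ == "__main__":
    # Synthetic bytes for illustration only, not a decodable image.
    sample = J2K_CODESTREAM_START + b"\x00" * 8
    print(describe_jpeg2000_bytes(sample))
```

Running this on the bytes in both the script and the console session would show whether the two environments are handing Pillow the same data.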

Issue #1510 describes a similar problem with loading JPEG 2000 files, but I don't understand its resolution.

What did you expect to happen?

I expected the image to be converted into a numpy array.

What actually happened?

The code would either raise an error or create an array containing the image object.

What versions of Pillow and Python are you using?

Python 3.5/3.6 (3.6 when running inside a Docker container)
Pillow==5.1.0
numpy==1.14.1

I am using pdfminer.six to extract the image on the first page of this document as a test:

https://hartley-botanic.co.uk/wp-content/uploads/2017/07/Hartley-guide-greenhouse-gardening.pdf

The really odd thing is that when I try the conversion from the console, using the exact same command,

a = np.array(im)

I get a numpy array as expected:

array([[[140, 118, 219,  82],
        [145, 114, 210,  84],
        [147, 111, 195,  86],
        ...,
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0]],

       [[149, 112, 213,  84],
        [143, 115, 206,  84],
        [133, 119, 193,  83],
        ...,
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0]],

       [[155, 106, 203,  86],
        [138, 116, 198,  83],
        [119, 129, 188,  77],
        ...,
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0],
        [ 27,  45,  62,   0]],

       ...,

       [[136,  94,  86,   7],
        [135,  94,  85,   7],
        [135,  94,  85,   7],
        ...,
        [158, 124, 100,  28],
        [158, 124, 100,  28],
        [158, 124, 100,  28]],

       [[136,  94,  86,   7],
        [135,  94,  85,   7],
        [135,  94,  85,   7],
        ...,
        [158, 124, 100,  28],
        [158, 124, 100,  28],
        [158, 124, 100,  28]],

       [[136,  94,  86,   7],
        [135,  94,  85,   7],
        [135,  94,  85,   7],
        ...,
        [158, 124, 100,  28],
        [158, 124, 100,  28],
        [158, 124, 100,  28]]], dtype=uint8)

Can anyone think of a reason for this discrepancy, and a way for me to perform the conversion when running my code as a script?

Thanks
