Description
What did you do?
I extracted a JPEG 2000 image from a PDF as bytes. I then loaded the result into Pillow using
im = Image.open(BytesIO(raw))
Next I attempted to convert it to a numpy array in order to manipulate the data, using
A = np.array(im)
This resulted in the array
array(<PIL.Jpeg2KImagePlugin.Jpeg2KImageFile image mode=RGBA size=1598x1598 at 0x7FDEAF719A20>,
dtype=object)
When I attempted to force numpy to convert the image to a sequence of numbers with np.array(im.getdata()), the result was the error:-
Traceback (most recent call last):
  File "<ipython-input-39-042ef64a6b36>", line 1, in <module>
    runfile('/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images/test_embeded_image_extractor_locally.py', wdir='/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images')
  File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
  File "/home/stuart/Documents/Python_programs/embedded_image_processing/classify_embeded_images/test_embeded_image_extractor_locally.py", line 402, in <module>
    A = np.array(im.getdata())
  File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 1220, in getdata
    self.load()
  File "/usr/local/lib/python3.5/dist-packages/PIL/Jpeg2KImagePlugin.py", line 210, in load
    return ImageFile.ImageFile.load(self)
  File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 250, in load
    raise_ioerror(err_code)
  File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 59, in raise_ioerror
    raise IOError(message + " when reading image file")
OSError: broken data stream when reading image file
Issue #1510 describes a similar problem with loading JPEG 2000 files, but I don't understand its resolution.
What did you expect to happen?
I expected the image to be converted into a numpy array.
What actually happened?
Depending on the call, the script either raised the error above or created an object array containing the image object itself.
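A minimal sketch of how the underlying decode failure could be surfaced instead of silently ending up with an object array. It assumes that `np.array(im)` falls back to wrapping the image object when the lazy decode fails, which is what the output above suggests but is not confirmed here. The helper name `image_to_array` is hypothetical, and the self-check uses a small in-memory PNG as a stand-in for the JPEG 2000 bytes.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def image_to_array(raw):
    """Decode image bytes and return a numeric array, or raise on failure.

    `image_to_array` is a hypothetical helper, not part of Pillow or numpy.
    """
    im = Image.open(BytesIO(raw))
    # Image.open is lazy and only reads the header. Calling load() forces
    # the full decode, so a damaged stream raises here, at a known point.
    im.load()
    arr = np.asarray(im)
    # Guard against numpy wrapping the image object instead of its pixels.
    if arr.dtype == object:
        raise ValueError("numpy wrapped the image object instead of its pixels")
    return arr


# Self-check with a small in-memory PNG (a stand-in for the real bytes).
buf = BytesIO()
Image.new("RGBA", (4, 3), (140, 118, 219, 82)).save(buf, format="PNG")
demo = image_to_array(buf.getvalue())
print(demo.shape, demo.dtype)
```

With the real data, a failure inside `im.load()` would point at the bytes or the decoder rather than at numpy.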
What versions of Pillow and Python are you using?
Python 3.5/3.6 (3.6 when running inside a Docker container)
Pillow==5.1.0
numpy==1.14.1
I am using pdfminer.six to extract the image on the first page of this document as a test:-
https://hartley-botanic.co.uk/wp-content/uploads/2017/07/Hartley-guide-greenhouse-gardening.pdf
The really odd thing is that when I try to convert in the console, using the exact same command,
a = np.array(im)
I get a numpy array as expected
array([[[140, 118, 219, 82],
[145, 114, 210, 84],
[147, 111, 195, 86],
...,
[ 27, 45, 62, 0],
[ 27, 45, 62, 0],
[ 27, 45, 62, 0]],
[[149, 112, 213, 84],
[143, 115, 206, 84],
[133, 119, 193, 83],
...,
[ 27, 45, 62, 0],
[ 27, 45, 62, 0],
[ 27, 45, 62, 0]],
[[155, 106, 203, 86],
[138, 116, 198, 83],
[119, 129, 188, 77],
...,
[ 27, 45, 62, 0],
[ 27, 45, 62, 0],
[ 27, 45, 62, 0]],
...,
[[136, 94, 86, 7],
[135, 94, 85, 7],
[135, 94, 85, 7],
...,
[158, 124, 100, 28],
[158, 124, 100, 28],
[158, 124, 100, 28]],
[[136, 94, 86, 7],
[135, 94, 85, 7],
[135, 94, 85, 7],
...,
[158, 124, 100, 28],
[158, 124, 100, 28],
[158, 124, 100, 28]],
[[136, 94, 86, 7],
[135, 94, 85, 7],
[135, 94, 85, 7],
...,
[158, 124, 100, 28],
[158, 124, 100, 28],
[158, 124, 100, 28]]], dtype=uint8)
Can anyone think of a reason for this discrepancy that would allow me to perform the conversion while running my code?
Thanks
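One way to narrow down the script/console discrepancy might be to check whether both environments are actually handing Pillow the same bytes. The sketch below is only an illustration under that assumption: `describe_bytes` and `looks_like_jpeg2000` are hypothetical helper names, and the demo input is just the 12-byte JP2 signature box rather than a real image.

```python
import hashlib

# Leading bytes of a JP2 container file and of a raw JPEG 2000 codestream.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
J2K_CODESTREAM_START = bytes.fromhex("ff4fff51")


def describe_bytes(raw):
    """Summarise a byte string so two runs can be compared by eye."""
    return {
        "length": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "head": raw[:12].hex(),
    }


def looks_like_jpeg2000(raw):
    """Return True if the data starts with a JP2 or codestream marker."""
    return raw.startswith(JP2_SIGNATURE) or raw.startswith(J2K_CODESTREAM_START)


# Demo on the signature alone; in practice `raw` would be the extracted bytes.
summary = describe_bytes(JP2_SIGNATURE)
print(summary["length"], summary["head"], looks_like_jpeg2000(JP2_SIGNATURE))
```

Printing this summary in both the script and the console would show whether the lengths or hashes differ, for example if the extraction is truncated in one of them.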