Performance difference between the conda and pip version in io.read_image #6782

Open
@Leon5x

🐛 Describe the bug

There is a large performance difference when reading JPEG images with torchvision.io.read_image, depending on whether torchvision was installed from conda or from pip.
When benchmarking reading 1000 images from a folder, the pip version is more than 2x faster than the conda version!
For the test I created two new conda environments, each with a command like
conda create --name tvpip python=3.10
In one environment I installed torchvision using conda:
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
and in the other using pip:
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

Then I used the following code to benchmark torchvision.io.read_image, Pillow and accimage:

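import os
from time import time as t

import torchvision

f = "test"  # folder containing the test images
files = [file for file in os.listdir(f)]
test_images = len(files)

def test(files, fct):
    # Return the wall-clock time in seconds that fct needs to load every file once.
    s = t()
    for file in files:
        image = fct(os.path.join(f, file))
    return t() - s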

torchvision.set_image_backend("PIL")
time_needed = test(files, torchvision.io.read_image)
print(f"Torchvision {torchvision.get_image_backend():13s} Loading {test_images} files took {time_needed:.1f}s")

torchvision.set_image_backend("accimage")
time_needed = test(files, torchvision.io.read_image)
print(f"Torchvision {torchvision.get_image_backend():13s} Loading {test_images} files took {time_needed:.1f}s")

from PIL import Image
s = t()
for file in files:
    image = Image.open(os.path.join(f,file)).convert("RGB")
time_needed = t() - s
print(f"{'Pillow':25s} Loading {test_images} files took {time_needed:.1f}s")

import accimage
time_needed = test(files, accimage.Image)
print(f"{'AccImage':25s} Loading {test_images} files took {time_needed:.1f}s")
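The script above times a single pass per loader with time.time, so the first loader measured also pays for any cold disk reads. As a minimal sketch of a more repeatable measurement, the helper below runs an untimed warm-up pass and then several timed passes with time.perf_counter. The names time_loader and summarize, and the default repeat counts, are hypothetical and not part of the original benchmark; the loader is any callable that takes a file path.

```python
import os
import statistics
from time import perf_counter


def time_loader(folder, loader, repeats=3, warmup=1):
    """Return one wall-clock time in seconds per timed pass over all files in folder."""
    # Sort so every loader sees the files in the same order.
    paths = sorted(os.path.join(folder, name) for name in os.listdir(folder))

    # Untimed warm-up passes, so later passes do not pay for cold disk reads.
    for _ in range(warmup):
        for path in paths:
            loader(path)

    timings = []
    for _ in range(repeats):
        start = perf_counter()
        for path in paths:
            loader(path)
        timings.append(perf_counter() - start)
    return timings


def summarize(timings):
    """Return (best, median) of a list of pass timings."""
    return min(timings), statistics.median(timings)
```

For example, summarize(time_loader("test", torchvision.io.read_image)) could be run in both environments and the best and median pass times compared, which makes it easier to tell a decoder difference from a caching effect.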

Findings:

  • In the conda environment torchvision.io.read_image takes 4.6s; in the pip environment it takes 1.9s. These should be the same. I couldn't figure out where the speed difference comes from; from the timings it looks like the pip version is somehow using pillow-simd or libjpeg-turbo.
  • When using the accimage backend with torchvision (torchvision.set_image_backend), the time to load the images doesn't change at all, which suggests the same backend is used either way. This behavior is the same in the pip and conda environments.
  • Installing pillow-simd and accimage in the environment before installing torchvision doesn't change anything apart from the Pillow time.
  • When installing accimage in the conda environment, the time for torchvision.io.read_image with the accimage backend doesn't change, which in my understanding it should.
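One way to check the guess about libjpeg-turbo is to look at which JPEG shared libraries are actually mapped into the Python process in each environment. The sketch below is Linux-only and assumes /proc/self/maps is readable. The function names shared_libs_in_maps and loaded_libs are hypothetical helpers, not torchvision APIs.

```python
import os


def shared_libs_in_maps(maps_text, needle):
    """Return sorted, de-duplicated paths from /proc/<pid>/maps text whose file name contains needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (pathname may be absent).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if needle in os.path.basename(path):
            paths.add(path)
    return sorted(paths)


def loaded_libs(needle="jpeg"):
    """List files mapped into the current process whose name contains needle (Linux only)."""
    with open("/proc/self/maps") as fh:
        return shared_libs_in_maps(fh.read(), needle)
```

A possible usage is to call torchvision.io.read_image on one file first, so that the image extension and its dependencies are loaded, and then print loaded_libs("jpeg") in both the pip and the conda environment. If the two environments resolve to different JPEG libraries, the paths should differ.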

I hope you can reproduce the behavior or give some insight into why this might be the case. Thanks in advance.

Versions

Environment pip

Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.6 (main, Oct 7 2022, 20:19:58) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-50-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.12.1+cu113
[pip3] torchvision==0.13.1+cu113
[conda] numpy 1.23.4 pypi_0 pypi
[conda] torch 1.12.1+cu113 pypi_0 pypi
[conda] torchvision 0.13.1+cu113 pypi_0 pypi

Environment conda

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.6 (main, Oct 7 2022, 20:19:58) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-50-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.12.1
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310h7f8727e_0
[conda] mkl_fft 1.3.1 py310hd6ae3a3_0
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.23.1 py310h1794996_0
[conda] numpy-base 1.23.1 py310hcba007f_0
[conda] pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchvision 0.13.1 py310_cu113 pytorch
