Use torchvision's native LANCZOS interpolation instead of PIL fallback #46496

Merged
zucchini-nlp merged 3 commits into huggingface:main from NicolasHug:use_torchvision_lanczos on Jun 8, 2026

NicolasHug commented Jun 8, 2026

Contributor

What does this PR do?

This is a follow-up of sorts to #45195, and the proposed changes were discussed with @yonigozlan on the PyTorch Slack channel.

Torchvision 0.27 added native support for LANCZOS interpolation. Previously, transformers needed two workarounds:

  • the torchvision backend had to fall back to BICUBIC when LANCZOS was requested
  • some model pipelines had to fall back entirely to PIL.

This PR removes both workarounds when torchvision >= 0.27 is installed. TorchVision is faster than PIL because it has native AVX2 (x86) and NEON (aarch64) code paths, and its outputs are bitwise equivalent to PIL-SIMD.
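A minimal sketch of the version gate described above, in plain Python. All names here (`Interp`, `has_native_lanczos`, `resolve_interpolation`, `default_backend`) are hypothetical and this is not transformers' actual code; it only illustrates the two old workarounds and how both go away once the installed torchvision is 0.27 or newer. Parsing the leading `major.minor` with a regex is a simplifying assumption; real code would more likely use a proper version parser.

```python
import re
from enum import Enum


class Interp(Enum):
    # Hypothetical stand-in for torchvision's InterpolationMode (values illustrative).
    BICUBIC = "bicubic"
    LANCZOS = "lanczos"


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '0.27.0+cu126' or '0.26.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_native_lanczos(tv_version: str) -> bool:
    # Per the PR description, tensor LANCZOS support landed in torchvision 0.27.
    return major_minor(tv_version) >= (0, 27)


def resolve_interpolation(requested: Interp, tv_version: str) -> Interp:
    # Old workaround 1: silently downgrade LANCZOS to BICUBIC on older torchvision.
    if requested is Interp.LANCZOS and not has_native_lanczos(tv_version):
        return Interp.BICUBIC
    return requested


def default_backend(needs_lanczos: bool, tv_version: str) -> str:
    # Old workaround 2: route LANCZOS-dependent pipelines to PIL on older torchvision.
    if needs_lanczos and not has_native_lanczos(tv_version):
        return "pil"
    return "torchvision"


for version in ("0.26.0", "0.27.0+cu126"):
    print(version, resolve_interpolation(Interp.LANCZOS, version).name,
          default_backend(needs_lanczos=True, tv_version=version))
# 0.26.0 BICUBIC pil
# 0.27.0+cu126 LANCZOS torchvision
```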

Benchmarks

~/dev/transformers (use_torchvision_lanczos*) » python benchmark_lanczos.py
Scenario                                        PIL (ms)    TV (ms)    Speedup
------------------------------------------------------------------------------
Chameleon  (800x600 -> 682x512)                     6.63       1.88       3.5x
Idefics3   (1920x1080 -> 1456x819)                 26.40      11.02       2.4x
Flava CB   (224x224 -> 112x112)                     0.54       0.18       3.1x

Benchmark code:

"""Benchmark: torchvision LANCZOS resize vs PIL LANCZOS resize.

Requires torchvision >= 0.27 (which added LANCZOS support for tensors).
"""

import time

import PIL.Image
import numpy as np
import torch
from torchvision.transforms.v2.functional import resize as tv_resize
from torchvision.transforms import InterpolationMode


def bench_pil(pil_image, size, n=100):
    # Warmup
    for _ in range(10):
        pil_image.resize(size, PIL.Image.LANCZOS)

    t0 = time.perf_counter()
    for _ in range(n):
        pil_image.resize(size, PIL.Image.LANCZOS)
    return (time.perf_counter() - t0) / n * 1000  # ms


def bench_tv(tensor, size, n=100):
    # Warmup
    for _ in range(10):
        tv_resize(tensor, size, interpolation=InterpolationMode.LANCZOS, antialias=True)

    t0 = time.perf_counter()
    for _ in range(n):
        tv_resize(tensor, size, interpolation=InterpolationMode.LANCZOS, antialias=True)
    return (time.perf_counter() - t0) / n * 1000  # ms


# Typical sizes from the 4 affected models:
#   Chameleon:     shortest_edge=512 (we use 800x600 -> 682x512)
#   Idefics3/SmolVLM: longest_edge=1456 (we use 1920x1080 -> 1456x819)
#   Flava codebook:   112x112 (we use 224x224 -> 112x112)
scenarios = [
    ("Chameleon  (800x600 -> 682x512)", (800, 600), (512, 682)),
    ("Idefics3   (1920x1080 -> 1456x819)", (1920, 1080), (819, 1456)),
    ("Flava CB   (224x224 -> 112x112)", (224, 224), (112, 112)),
]

print(f"{'Scenario':<45} {'PIL (ms)':>10} {'TV (ms)':>10} {'Speedup':>10}")
print("-" * 78)

for label, (in_w, in_h), (out_h, out_w) in scenarios:
    rng = np.random.RandomState(0)
    arr = rng.randint(0, 256, (in_h, in_w, 3), dtype=np.uint8)

    pil_img = PIL.Image.fromarray(arr)
    tensor = torch.from_numpy(arr).permute(2, 0, 1).contiguous()  # CHW uint8

    pil_ms = bench_pil(pil_img, (out_w, out_h))
    tv_ms = bench_tv(tensor, (out_h, out_w))

    print(f"{label:<45} {pil_ms:>10.2f} {tv_ms:>10.2f} {pil_ms / tv_ms:>9.1f}x")

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

Who can review?

@vasqu @zucchini-nlp since you reviewed #45195. Thanks!

NicolasHug changed the title from "Perf: use torchvision's native LANCZOS interpolation" to "Use torchvision's native LANCZOS interpolation instead of PIL fallback" on Jun 8, 2026
NicolasHug marked this pull request as ready for review on June 8, 2026 at 14:09

vasqu left a comment

Contributor

Just a few quick questions, but I would really like @molbap or @zucchini-nlp to take a look here.

Comment on lines +315 to +318
if is_torchvision_greater_or_equal("0.27"):
    self.assertEqual(type(image_processor).__name__, "FlavaImageProcessor")
else:
    self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil")

Contributor

This makes the test dependent on your installed env, no? We should use patching or something similar to force the version we want and ensure we check both at once

Member

yeah, let's mock the version so we can test both options
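A minimal sketch of testing both branches in one run by mocking the version check, assuming the check is evaluated at call time (a value computed once at import time would need a different approach). `env`, `is_torchvision_at_least` and `processor_class_name` are hypothetical names; only the two processor class names come from the test above.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the module exposing the version helper; the lambda
# pretends the CI runner has the newest torchvision installed.
env = types.SimpleNamespace(is_torchvision_at_least=lambda minimum: True)


def processor_class_name() -> str:
    # The version check runs at call time, so patching `env` changes the result.
    if env.is_torchvision_at_least("0.27"):
        return "FlavaImageProcessor"
    return "FlavaImageProcessorPil"


class BackendSelectionTest(unittest.TestCase):
    def test_both_torchvision_versions(self):
        cases = [(True, "FlavaImageProcessor"), (False, "FlavaImageProcessorPil")]
        for new_enough, expected in cases:
            with self.subTest(new_enough=new_enough):
                # Force the helper's answer regardless of what is installed.
                with mock.patch.object(env, "is_torchvision_at_least", return_value=new_enough):
                    self.assertEqual(processor_class_name(), expected)


if __name__ == "__main__":
    unittest.main(argv=["backend_selection"], exit=False)
```

`subTest` reports each version case separately, so a failure on the old-torchvision path is not hidden by the new one passing.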

- self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.BICUBIC
+ # LANCZOS resample is natively supported with torchvision >= 0.27.
+ # On older versions, the base class falls back to BICUBIC automatically.
+ self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.LANCZOS

Contributor

Do we even need this ternary then? I.e. we should correctly fall back in any case, no?

Contributor Author

My understanding is that the ternary is unrelated to LANCZOS support: it exists because codebook_resample may be None, which is the default. If we removed the ternary, we'd have to change the codebook_resample=None default to codebook_resample=PILImageResampling.LANCZOS. Let me know your preference.
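To illustrate the distinction drawn here, a hypothetical sketch of the two options. `Resampling` is a made-up stand-in for `PILImageResampling`, and both classes are illustrative, not the actual Flava processor.

```python
from enum import Enum


class Resampling(Enum):
    # Hypothetical stand-in for transformers' PILImageResampling.
    BICUBIC = "bicubic"
    LANCZOS = "lanczos"


class WithNoneDefault:
    # Pattern under discussion: None in the signature, resolved by the ternary.
    def __init__(self, codebook_resample=None):
        self.codebook_resample = (
            codebook_resample if codebook_resample is not None else Resampling.LANCZOS
        )


class WithConcreteDefault:
    # Alternative: the default lives in the signature, so no ternary is needed.
    def __init__(self, codebook_resample=Resampling.LANCZOS):
        self.codebook_resample = codebook_resample


# Omitting the argument gives the same result either way...
assert WithNoneDefault().codebook_resample is WithConcreteDefault().codebook_resample
# ...but only the ternary maps an explicitly passed None to LANCZOS.
print(WithNoneDefault(None).codebook_resample, WithConcreteDefault(None).codebook_resample)
```

The two forms differ only when a caller passes `None` explicitly, which is one reason to keep the ternary if that call pattern might exist.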

Contributor

And then we'd go through the LANCZOS -> BICUBIC pipeline again? I can see keeping it for BC behavior then. I don't have a strong opinion; we can keep it this way.

Comment thread src/transformers/models/flava/image_processing_flava.py
img_rgb = (1 - alpha[:, :, np.newaxis]) * 255 + alpha[:, :, np.newaxis] * img_rgba[:, :, :3]
return PIL.Image.fromarray(img_rgb.astype("uint8"), "RGB")

def resize(

Contributor

Hmm, this wasn't needed in the first place?

Contributor Author

Correct, this method wasn't actually needed before. It just resolved the resample parameter the same way the base class does, and then called super().resize().

I can put it back to minimize the diff?
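A hypothetical sketch of why such an override is redundant: when the base class already resolves a missing `resample` from the instance default, a subclass `resize()` that repeats that resolution and then calls `super().resize()` behaves identically to no override at all. `BaseProcessor` and both subclasses are illustrative names, not transformers classes, and the "resize" just returns the arguments it would use.

```python
class BaseProcessor:
    # Hypothetical base class standing in for the real one.
    def __init__(self, resample="bicubic"):
        self.resample = resample

    def resize(self, image, size, resample=None):
        # The base class already falls back to the instance-level default.
        resample = resample if resample is not None else self.resample
        # Placeholder for the real resize: report what would be used.
        return {"image": image, "size": size, "resample": resample}


class WithOverride(BaseProcessor):
    def resize(self, image, size, resample=None):
        # Redundant: repeats the base-class resolution, then delegates.
        resample = resample if resample is not None else self.resample
        return super().resize(image, size, resample=resample)


class WithoutOverride(BaseProcessor):
    pass


for default in ("bicubic", "lanczos"):
    for explicit in (None, "bilinear"):
        a = WithOverride(resample=default).resize("img", (112, 112), resample=explicit)
        b = WithoutOverride(resample=default).resize("img", (112, 112), resample=explicit)
        assert a == b  # identical for every combination
```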

vasqu Jun 8, 2026

Contributor

No worries, just wanted to confirm if it was indeed that way. No need to rechange 🫡

Member

I'm down with deleting unnecessary code

zucchini-nlp left a comment

Member

Super happy to have it upstream in torchvision. LGTM, let's just make sure the test actually runs for both versions; our CI runners usually have the latest version installed.

vasqu commented Jun 8, 2026

Contributor

Agree, fixing that one test is the important bit, the rest are nits. Feel free to merge afterwards @zucchini-nlp

github-actions Bot commented Jun 8, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, chameleon, flava

    _LANCZOS_IMAGE_PROCESSORS,
):
    image_processor = AutoImageProcessor.from_pretrained(tmpdirname)
    self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil")

Contributor Author

@vasqu @zucchini-nlp thank you so much for the quick review! I added mocking above as requested. Unfortunately, I wasn't able to mock torchvision.__version__ itself, because it is read at import time when DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS is defined. By the time the mocking happens (i.e. after import time), DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS has already been populated.
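The pitfall described here can be reproduced in a few lines. In this hypothetical sketch, `torchvision_stub` and `registry.DEFAULT_TO_PIL` stand in for the installed torchvision and for a registry like DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS; neither is the real module. The registry is built once from the version string, so patching the version afterwards has no effect, while patching the already-built registry does simulate an older torchvision.

```python
import types
from unittest import mock


def _major_minor(version):
    # "0.27.0+cu126" -> (0, 27); assumes a plain "major.minor..." prefix.
    return tuple(int(part) for part in version.split(".")[:2])


# Hypothetical stand-in for the installed torchvision module.
torchvision_stub = types.SimpleNamespace(__version__="0.27.0")

# Hypothetical stand-in for a module that builds its registry once, on import.
registry = types.SimpleNamespace()
registry.DEFAULT_TO_PIL = (
    set() if _major_minor(torchvision_stub.__version__) >= (0, 27) else {"flava"}
)


def pick_backend(model_type):
    # Reads the registry built at "import time"; never re-checks the version.
    return "pil" if model_type in registry.DEFAULT_TO_PIL else "torchvision"


# Patching the version string after the registry exists changes nothing.
with mock.patch.object(torchvision_stub, "__version__", "0.26.0"):
    after_version_patch = pick_backend("flava")

# Patching the already-built registry does simulate an older torchvision.
with mock.patch.object(registry, "DEFAULT_TO_PIL", {"flava"}):
    after_registry_patch = pick_backend("flava")

print(after_version_patch, after_registry_patch)  # torchvision pil
```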

Contributor

Seems good to me 👍 The goal is just to make sure that we mimic the behavior.

github-actions Bot commented Jun 8, 2026

Contributor

CI Dashboard: View test results in Grafana

@zucchini-nlp zucchini-nlp added this pull request to the merge queue Jun 8, 2026
Merged via the queue into huggingface:main with commit ff33170 Jun 8, 2026
30 checks passed
khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026
huggingface#46496)

* Use torchvision lanczos

* Trigger CI again?

* Remove comment, add mock test
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026
huggingface#46496)

* Use torchvision lanczos

* Trigger CI again?

* Remove comment, add mock test
MHRDYN7 commented Jun 10, 2026

Contributor

@NicolasHug is the torch implementation of LANCZOS interpolation an exact replica of PIL's? I see some numerical differences, so I wanted to confirm. I'd be happy if you could point me to the tests comparing this implementation against the PIL version.

edit: never mind, I see the issue. As you mentioned above, the bitwise equivalence is with PIL-SIMD, not the regular PIL library. One side effect is that there will be discrepancies with libraries like mediapy that use PIL (my case: a JAX-based model + mediapy processor).

Contributor Author

@MHRDYN7 yes you'll have small discrepancies, but those aren't significant enough for models to notice. Models would notice the difference between Lanczos and Bicubic, or between antialias=True vs antialias=False, but not these.

Generally, we can't expect bitwise equal results for image preproc: even two compliant jpeg decoders may output slightly different pixel values (there are for example differences between libjpeg and libjpeg-turbo).

In this case, for Lanczos, the difference comes from the fact that intermediate values are stored as uint8 rather than float. This leads to small differences for Lanczos and bicubic, and less so for bilinear mode.
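A toy, pure-Python illustration of the effect described: a separable Lanczos-3 resize done in two passes (horizontal, then vertical), once keeping the intermediate in float and once rounding it to uint8 between the passes. This is not PIL's, PIL-SIMD's or torchvision's actual code (those use fixed-point arithmetic among other details); the coefficient layout and the test image are assumptions chosen for the demonstration.

```python
import math
import random


def lanczos3(x):
    # Lanczos kernel with a = 3: sinc(x) * sinc(x / 3) for |x| < 3, else 0.
    if x == 0.0:
        return 1.0
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def resample_1d(values, out_len):
    # Antialiased 1D resampling: the kernel is stretched by the downscale factor.
    in_len = len(values)
    scale = in_len / out_len
    filterscale = max(scale, 1.0)
    support = 3.0 * filterscale
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale
        lo = max(int(center - support + 0.5), 0)
        hi = min(int(center + support + 0.5), in_len)
        weights = [lanczos3((j - center + 0.5) / filterscale) for j in range(lo, hi)]
        total = sum(weights)
        out.append(sum(w * values[j] for w, j in zip(weights, range(lo, hi))) / total)
    return out


def to_uint8(v):
    # Round half up and clamp to the uint8 range.
    return min(255, max(0, math.floor(v + 0.5)))


def resize_2d(img, out_h, out_w, uint8_intermediate):
    # Horizontal pass on every row.
    tmp = [resample_1d(row, out_w) for row in img]
    if uint8_intermediate:
        tmp = [[to_uint8(v) for v in row] for row in tmp]
    # Vertical pass on every column of the intermediate result.
    cols = [resample_1d([row[c] for row in tmp], out_h) for c in range(out_w)]
    return [[to_uint8(cols[c][r]) for c in range(out_w)] for r in range(out_h)]


rng = random.Random(0)
# Mid-range pixel values keep Lanczos overshoot inside [0, 255], so only rounding differs.
image = [[rng.randint(64, 191) for _ in range(24)] for _ in range(24)]

float_path = resize_2d(image, 10, 10, uint8_intermediate=False)
uint8_path = resize_2d(image, 10, 10, uint8_intermediate=True)

diffs = [abs(a - b) for ra, rb in zip(float_path, uint8_path) for a, b in zip(ra, rb)]
print(f"max abs diff: {max(diffs)}, differing pixels: {sum(d > 0 for d in diffs)}/{len(diffs)}")
```

With mid-range inputs (no clamping), each intermediate rounding error is at most 0.5 and the second pass only mildly amplifies it, so the two outputs should differ by at most one gray level per pixel.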

4 participants