Use torchvision's native LANCZOS interpolation instead of PIL fallback by NicolasHug · Pull Request #46496 · huggingface/transformers

NicolasHug · 2026-06-08T11:13:46Z

What does this PR do?

This is a follow-up to #45195. The proposed changes were discussed on the PyTorch Slack channel with @yonigozlan.

Torchvision 0.27 added native support for LANCZOS interpolation. Previously, transformers needed two workarounds:

the torchvision backend had to fall back to BICUBIC when LANCZOS was requested
some entire model pipelines had to fall back to PIL.

This PR removes both workarounds when torchvision >= 0.27 is installed. torchvision is faster than PIL because it has native AVX2 (x86) and NEON (aarch64) code paths, and its outputs are bitwise equivalent to those of PIL-SIMD.
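The version gate described above can be sketched as follows. This is a minimal illustration and not the code from this PR: `supports_native_lanczos` and `pick_resample` are hypothetical helpers, and the version string is passed in explicitly so the snippet runs without torchvision installed.

```python
def supports_native_lanczos(torchvision_version: str) -> bool:
    """Return True for torchvision >= 0.27 (hypothetical helper)."""
    # Drop a local build suffix such as "+cu121", then compare (major, minor).
    release = torchvision_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (0, 27)


def pick_resample(requested: str, torchvision_version: str) -> str:
    """Resolve the interpolation mode actually used for a request."""
    if requested == "lanczos" and not supports_native_lanczos(torchvision_version):
        # Older torchvision: keep the previous BICUBIC fallback.
        return "bicubic"
    return requested


print(pick_resample("lanczos", "0.27.0"))
print(pick_resample("lanczos", "0.26.1+cu121"))
```

With a recent version the request passes through unchanged; with an older one it degrades to bicubic, which mirrors the fallback that the PR keeps for torchvision < 0.27.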

Benchmarks

~/dev/transformers (use_torchvision_lanczos*) » python benchmark_lanczos.py
Scenario                                        PIL (ms)    TV (ms)    Speedup
------------------------------------------------------------------------------
Chameleon  (800x600 -> 682x512)                     6.63       1.88       3.5x
Idefics3   (1920x1080 -> 1456x819)                 26.40      11.02       2.4x
Flava CB   (224x224 -> 112x112)                     0.54       0.18       3.1x

Benchmark code:

"""Benchmark: torchvision LANCZOS resize vs PIL LANCZOS resize.

Requires torchvision >= 0.27 (which added LANCZOS support for tensors).
"""

import time

import PIL.Image
import numpy as np
import torch
from torchvision.transforms.v2.functional import resize as tv_resize
from torchvision.transforms import InterpolationMode


def bench_pil(pil_image, size, n=100):
    # Warmup
    for _ in range(10):
        pil_image.resize(size, PIL.Image.LANCZOS)

    t0 = time.perf_counter()
    for _ in range(n):
        pil_image.resize(size, PIL.Image.LANCZOS)
    return (time.perf_counter() - t0) / n * 1000  # ms


def bench_tv(tensor, size, n=100):
    # Warmup
    for _ in range(10):
        tv_resize(tensor, size, interpolation=InterpolationMode.LANCZOS, antialias=True)

    t0 = time.perf_counter()
    for _ in range(n):
        tv_resize(tensor, size, interpolation=InterpolationMode.LANCZOS, antialias=True)
    return (time.perf_counter() - t0) / n * 1000  # ms


# Typical sizes from the 4 affected models:
#   Chameleon:     shortest_edge=512 (we use 800x600 -> 682x512)
#   Idefics3/SmolVLM: longest_edge=1456 (we use 1920x1080 -> 1456x819)
#   Flava codebook:   112x112 (we use 224x224 -> 112x112)
scenarios = [
    ("Chameleon  (800x600 -> 682x512)", (800, 600), (512, 682)),
    ("Idefics3   (1920x1080 -> 1456x819)", (1920, 1080), (819, 1456)),
    ("Flava CB   (224x224 -> 112x112)", (224, 224), (112, 112)),
]

print(f"{'Scenario':<45} {'PIL (ms)':>10} {'TV (ms)':>10} {'Speedup':>10}")
print("-" * 78)

for label, (in_w, in_h), (out_h, out_w) in scenarios:
    rng = np.random.RandomState(0)
    arr = rng.randint(0, 256, (in_h, in_w, 3), dtype=np.uint8)

    pil_img = PIL.Image.fromarray(arr)
    tensor = torch.from_numpy(arr).permute(2, 0, 1).contiguous()  # CHW uint8

    pil_ms = bench_pil(pil_img, (out_w, out_h))
    tv_ms = bench_tv(tensor, (out_h, out_w))

    print(f"{label:<45} {pil_ms:>10.2f} {tv_ms:>10.2f} {pil_ms / tv_ms:>9.1f}x")
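The target sizes in the scenarios above, and the argument order passed to each library, can be reproduced with a small sketch. The helper names are hypothetical, and the use of floor division is an assumption chosen to match the 682 in the Chameleon scenario; the actual image processors may round differently.

```python
def shortest_edge_size(width, height, shortest_edge):
    """Scale so the shorter side equals `shortest_edge`; returns (width, height)."""
    if width <= height:
        return shortest_edge, height * shortest_edge // width
    return width * shortest_edge // height, shortest_edge


def longest_edge_size(width, height, longest_edge):
    """Scale so the longer side equals `longest_edge`; returns (width, height)."""
    if width >= height:
        return longest_edge, height * longest_edge // width
    return width * longest_edge // height, longest_edge


def to_torchvision_size(pil_size):
    """PIL's resize takes (width, height); torchvision's resize takes (height, width)."""
    width, height = pil_size
    return height, width


chameleon = shortest_edge_size(800, 600, 512)   # (width, height) for PIL
idefics3 = longest_edge_size(1920, 1080, 1456)  # (width, height) for PIL
print(chameleon, to_torchvision_size(chameleon))
print(idefics3, to_torchvision_size(idefics3))
```

This is why the benchmark passes `(out_w, out_h)` to `bench_pil` but `(out_h, out_w)` to `bench_tv`.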

Code Agent Policy

I confirm that this is not a pure code agent PR.

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests? I think existing tests properly cover the codepaths already.

Who can review?

@vasqu @zucchini-nlp since you reviewed #45195. Thanks!

vasqu

Just a few quick questions, but I would really like @molbap or @zucchini-nlp to take a look here.

vasqu · 2026-06-08T15:31:21Z

+            if is_torchvision_greater_or_equal("0.27"):
+                self.assertEqual(type(image_processor).__name__, "FlavaImageProcessor")
+            else:
+                self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil")


This makes the test dependent on your installed env, no? We should use patching or something similar to force the version we want and ensure we check both at once

yeah, let's mock the version so we can test both options
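Forcing the version check so that both branches run regardless of the installed environment could look roughly like this. It is a self-contained sketch: `versions` is a hypothetical stand-in for the module that exposes `is_torchvision_greater_or_equal`, and `resolve_processor_name` only imitates the class selection being tested.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module holding the real version check.
versions = SimpleNamespace(is_torchvision_greater_or_equal=lambda minimum: False)


def resolve_processor_name():
    """Imitates choosing the torchvision or PIL processor class."""
    if versions.is_torchvision_greater_or_equal("0.27"):
        return "FlavaImageProcessor"
    return "FlavaImageProcessorPil"


def check_both_paths():
    """Run the selection once per mocked outcome of the version check."""
    results = {}
    for supported in (True, False):
        with mock.patch.object(
            versions, "is_torchvision_greater_or_equal", return_value=supported
        ):
            results[supported] = resolve_processor_name()
    return results


print(check_both_paths())
```

Because the check is patched rather than evaluated against the real install, both expectations are asserted in a single test run.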

vasqu · 2026-06-08T15:48:46Z

-        self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.BICUBIC
+        # LANCZOS resample is natively supported with torchvision >= 0.27.
+        # On older versions, the base class falls back to BICUBIC automatically.
+        self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.LANCZOS


Do we even need this ternary then? I.e. we should correctly fall back in any case, no?

My understanding is that the ternary is unrelated to lanczos support: it exists to account for the fact that the codebook_resample may be None, which is the default. If we were to remove the ternary, we'd have to change the codebook_resample=None default to codebook_resample=PILImageResampling.LANCZOS. Let me know your preference?

And then we go through the LANCZOS -> BICUBIC pipeline again? I can see keeping it for BC behavior. I don't have a strong opinion, we can keep it this way.
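The two separate concerns in this thread (resolving a `None` default, and falling back when LANCZOS is unavailable) can be illustrated with a toy model. All names here are hypothetical stand-ins: `Resample` plays the role of `PILImageResampling`, and `lanczos_supported` replaces the real torchvision version check.

```python
from enum import Enum


class Resample(Enum):  # hypothetical stand-in for PILImageResampling
    BICUBIC = "bicubic"
    LANCZOS = "lanczos"


class BaseProcessor:
    lanczos_supported = False  # stand-in for "torchvision >= 0.27"

    def resolve(self, resample):
        # Base-class fallback: LANCZOS degrades to BICUBIC when unsupported.
        if resample is Resample.LANCZOS and not self.lanczos_supported:
            return Resample.BICUBIC
        return resample


class CodebookProcessor(BaseProcessor):
    def __init__(self, codebook_resample=None):
        # The ternary only handles the None default; it is unrelated to the fallback.
        self.codebook_resample = (
            codebook_resample if codebook_resample is not None else Resample.LANCZOS
        )


processor = CodebookProcessor()
print(processor.codebook_resample, processor.resolve(processor.codebook_resample))
```

Under these assumptions the default is always LANCZOS, and the base class alone decides whether it is honored or replaced by BICUBIC.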

vasqu · 2026-06-08T15:52:41Z

        img_rgb = (1 - alpha[:, :, np.newaxis]) * 255 + alpha[:, :, np.newaxis] * img_rgba[:, :, :3]
        return PIL.Image.fromarray(img_rgb.astype("uint8"), "RGB")

-    def resize(


Hmm, this wasn't needed in the first place?

Correct, this method wasn't actually needed before. It just resolved the resample parameter the same way the base class does, and then called super().resize().

I can put it back to minimize the diff?

No worries, just wanted to confirm that it was indeed that way. No need to change it back 🫡

I'm down with deleting unnecessary code

zucchini-nlp

Super happy to have it upstream in torchvision. LGTM, let's just make sure the test actually runs for both versions, since our CI runners usually have the latest version installed.

vasqu · 2026-06-08T16:28:11Z

Agree, fixing that one test is the important bit, the rest are nits. Feel free to merge afterwards @zucchini-nlp

github-actions · 2026-06-08T16:36:16Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, chameleon, flava

NicolasHug · 2026-06-08T16:39:04Z

+                _LANCZOS_IMAGE_PROCESSORS,
+            ):
+                image_processor = AutoImageProcessor.from_pretrained(tmpdirname)
+                self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil")


@vasqu @zucchini-nlp thank you so much for the quick review! I added mocking above as requested. Unfortunately, I wasn't able to mock torchvision.__version__ itself, because it is read at import time, when DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS is defined. By the time the mocking happens (i.e. after import time), DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS has already been populated.
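The import-time issue described here can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`fake_torchvision`, `registry`, `backend_for`) and assumes the constant is a set of model types; only the name `DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS` comes from the discussion.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the torchvision module, so the sketch runs anywhere.
fake_torchvision = SimpleNamespace(__version__="0.27.0")


def _needs_pil(version):
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (0, 27)


# Evaluated once, like a module-level constant built at import time.
registry = SimpleNamespace(
    DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS=(
        {"flava"} if _needs_pil(fake_torchvision.__version__) else set()
    )
)


def backend_for(model_type):
    defaults = registry.DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS
    return "pil" if model_type in defaults else "torchvision"


# Patching the version afterwards has no effect: the set is already built.
with mock.patch.object(fake_torchvision, "__version__", "0.26.0"):
    after_version_patch = backend_for("flava")

# Patching the already-populated set does change the outcome.
with mock.patch.object(registry, "DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS", {"flava"}):
    after_set_patch = backend_for("flava")

print(after_version_patch, after_set_patch)
```

This is the reason the test patches the derived collection instead of the version string.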

Seems good to me 👍 the goal is just to make sure that we mimic the behavior

github-actions · 2026-06-08T16:50:25Z

CI Dashboard: View test results in Grafana

MHRDYN7 · 2026-06-10T16:42:12Z

@NicolasHug is the torch implementation of lanczos interpolation an exact replica of PIL? I see some numerical differences, so wanted to confirm. Would be happy if you could guide me to the tests asserting this version with the PIL version

edit: never mind, I see the issue. The bitwise equivalence is with PIL-SIMD, not the regular PIL library, as you mentioned above. One side effect is that there will be discrepancies with libraries like mediapy that use PIL (my case: a JAX-based model + mediapy processor).

NicolasHug · 2026-06-11T09:42:17Z

@MHRDYN7 yes you'll have small discrepancies, but those aren't significant enough for models to notice. Models would notice the difference between Lanczos and Bicubic, or between antialias=True vs antialias=False, but not these.

Generally, we can't expect bitwise equal results for image preproc: even two compliant jpeg decoders may output slightly different pixel values (there are for example differences between libjpeg and libjpeg-turbo).

In this case, for Lanczos, the difference comes from the fact that intermediate values are stored as uint8 rather than float. This leads to small differences for Lanczos and bicubic, less so for bilinear mode.
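The effect of quantizing intermediates can be seen in a toy two-pass resize. This sketch deliberately uses a 2x2 box average rather than a Lanczos kernel, and makes no claim about which library takes which path; it only shows how rounding between the horizontal and vertical passes can shift the result by one level.

```python
def round_u8(value):
    """Round half up and clamp to the uint8 range."""
    return min(255, max(0, int(value + 0.5)))


def two_pass_box(image, quantize_intermediate):
    """Downscale a small image to one pixel: horizontal pass, then vertical pass."""
    rows = [sum(row) / len(row) for row in image]   # horizontal pass
    if quantize_intermediate:
        rows = [round_u8(value) for value in rows]  # intermediate stored as uint8
    return round_u8(sum(rows) / len(rows))          # vertical pass


image = [[10, 11], [10, 10]]
float_result = two_pass_box(image, quantize_intermediate=False)
uint8_result = two_pass_box(image, quantize_intermediate=True)
print(float_result, uint8_result)
```

Here the float path yields 10 and the quantized path yields 11: a one-level difference of the kind discussed above.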

NicolasHug added 2 commits June 8, 2026 11:39

Use torchvision lanczos

3bb87ea

Trigger CI again?

7a77a38

NicolasHug changed the title ~~Perf: use torchvision's native LANCZOS interpolation~~ Use torchvision's native LANCZOS interpolation instead of PIL fallback Jun 8, 2026

NicolasHug marked this pull request as ready for review June 8, 2026 14:09

vasqu reviewed Jun 8, 2026

zucchini-nlp approved these changes Jun 8, 2026

Remove comment, add mock test

682b157

NicolasHug commented Jun 8, 2026

zucchini-nlp added this pull request to the merge queue Jun 8, 2026

Merged via the queue into huggingface:main with commit ff33170 Jun 8, 2026
30 checks passed

khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026

Use torchvision's native LANCZOS interpolation instead of PIL fallback (huggingface#46496)

* Use torchvision lanczos
* Trigger CI again?
* Remove comment, add mock test

a56fde7

stevhliu mentioned this pull request Jun 9, 2026

[docs] torchvision lanczos #46528

Merged

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026

Use torchvision's native LANCZOS interpolation instead of PIL fallback (huggingface#46496)

* Use torchvision lanczos
* Trigger CI again?
* Remove comment, add mock test

4151114

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026

Use torchvision's native LANCZOS interpolation instead of PIL fallback (huggingface#46496)

* Use torchvision lanczos
* Trigger CI again?
* Remove comment, add mock test

599cea6
