Use torchvision's native LANCZOS interpolation instead of PIL fallback#46496
Conversation
vasqu
left a comment
There was a problem hiding this comment.
Just a few quick questions but would really like @molbap or @zucchini-nlp to take a look here
| if is_torchvision_greater_or_equal("0.27"): | ||
| self.assertEqual(type(image_processor).__name__, "FlavaImageProcessor") | ||
| else: | ||
| self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil") |
There was a problem hiding this comment.
This makes the test dependent on your installed env, no? We should use patching or something similar to force the version we want and ensure we check both at once
There was a problem hiding this comment.
yeah, let's mock the version so we can test both options
| self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.BICUBIC | ||
| # LANCZOS resample is natively supported with torchvision >= 0.27. | ||
| # On older versions, the base class falls back to BICUBIC automatically. | ||
| self.codebook_resample = codebook_resample if codebook_resample is not None else PILImageResampling.LANCZOS |
There was a problem hiding this comment.
Do we even need this ternary then? I.e. we should correctly fallback in any case, no?
There was a problem hiding this comment.
My understanding is that the ternary is unrelated to lanczos support: it exists to account for the fact that the codebook_resample may be None, which is the default. If we were to remove the ternary, we'd have to change the codebook_resample=None default to codebook_resample=PILImageResampling.LANCZOS. Let me know your preference?
There was a problem hiding this comment.
And then we rego through the LANCZOS -> BICUBIC pipeline? I can see this to keep BC behavior then, don't have a strong opinion, we can keep it this way
| img_rgb = (1 - alpha[:, :, np.newaxis]) * 255 + alpha[:, :, np.newaxis] * img_rgba[:, :, :3] | ||
| return PIL.Image.fromarray(img_rgb.astype("uint8"), "RGB") | ||
|
|
||
| def resize( |
There was a problem hiding this comment.
Hmm, this wasn't needed in the first place?
There was a problem hiding this comment.
correct, this method wasn't actually needed before. It was just resolving the resample parameter in the same way as it's done in the base class, and then calls super().resize().
I can put it back to minimize the diff?
There was a problem hiding this comment.
No worries, just wanted to confirm if it was indeed that way. No need to rechange 🫡
There was a problem hiding this comment.
I'm down with deleting unnecessary code
zucchini-nlp
left a comment
There was a problem hiding this comment.
Super happy to have it upstream in torchvision. LGTM, let's just make sure the test actually runs for both versions, our CI runners have usually the latest version installed
|
Agree, fixing that one test is the important bit, the rest are nits. Feel free to merge afterwards @zucchini-nlp |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, chameleon, flava |
| _LANCZOS_IMAGE_PROCESSORS, | ||
| ): | ||
| image_processor = AutoImageProcessor.from_pretrained(tmpdirname) | ||
| self.assertEqual(type(image_processor).__name__, "FlavaImageProcessorPil") |
There was a problem hiding this comment.
@vasqu @zucchini-nlp thank you so much for the quick review! I added mocking above as requested, unfortunately I wasn't able to mock torchvision.__version__ itself because torchvision.__version__ is read at import time when the DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS is being defined. So by the time the mocking happens (i.e. after import time), DEFAULT_TO_PIL_BACKEND_IMAGE_PROCESSORS was already populated.
There was a problem hiding this comment.
Seems good to me 👍 the goal is just to make sure that we mimick the behavior
|
CI Dashboard: View test results in Grafana |
huggingface#46496) * Use torchvision lanczos * Trigger CI again? * Remove comment, add mock test
huggingface#46496) * Use torchvision lanczos * Trigger CI again? * Remove comment, add mock test
huggingface#46496) * Use torchvision lanczos * Trigger CI again? * Remove comment, add mock test
|
@NicolasHug is the torch implementation of lanczos interpolation an exact replica of PIL? I see some numerical differences, so wanted to confirm. Would be happy if you could guide me to the tests asserting this version with the PIL version edit: never mind, I see the issue. The bitwise equivalence is for PIL-SIMD, not regular PIL library as you mentioned above. One side effect of this will be that there will be discrepancies with libraries like mediapy that use PIL (my case - a jax based based model + mediapy processor) |
|
@MHRDYN7 yes you'll have small discrepancies, but those aren't significant enough for models to notice. Models would notice the difference between Lanczos and Bicubic, or between antialias=True vs antialias=False, but not these. Generally, we can't expect bitwise equal results for image preproc: even two compliant jpeg decoders may output slightly different pixel values (there are for example differences between libjpeg and libjpeg-turbo). In this case for lanczos, the difference comes from the fact that intermediate value are stored as uint8 rather than float. This leads to small differences for lanczos and bicubic, less so for bilinear mode. |
What does this PR do?
This is sort of a follow-up to #45195, and the proposed changes were discussed on the PyTorch slack channel with @yonigozlan .
Torchvision 0.27 added native support for LANCZOS interpolation. Previously,
transformersneeded two workarounds:This PR removes both workarounds when torchvision >= 0.27 is installed. TorchVision is faster than PIL because it natively supports AVX2 and NEON paths for x86 and aarch64, and outputs are bitwise equivalent to PIL-SIMD.
Benchmarks
Benchmark code:
Details
Code Agent Policy
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@vasqu @zucchini-nlp since your reviewed #45195. Thanks!