Bound image-processor resize/crop output to prevent unbounded allocation #46486

Closed
LinZiyuu wants to merge 1 commit into huggingface:main from LinZiyuu:fix-image-processor-resize-unbounded-alloc

Conversation

@LinZiyuu LinZiyuu commented Jun 8, 2026

Contributor

Image processors resize and center-crop every input image to the size/crop_size read from preprocessor_config.json, with no upper bound. image_transforms.resize does image.resize((width, height)) and center_crop pads/crops to (crop_height, crop_width), where those targets come straight from config. A model repo whose preprocessor_config.json sets an oversized value (e.g. size=15000 or crop_size=50000) therefore makes any processor(image) call materialize a huge (height × width × channels) array — for any input image, a 32×32 one suffices — exhausting memory and OOM-killing the worker.

I verified this against the real library: CLIPImageProcessor(size={"shortest_edge":15000}, crop_size=15000)(Image.new("RGB",(32,32))) produces a (1,3,15000,15000) = 2.70 GB array (≈12 s), and under a hard 128 MB limit size=15000 is OOM-killed. This is the same allocation class as the mel/chroma filter-bank guard and the positional-embedding guards.
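The 2.70 GB figure follows from the output shape alone. A minimal arithmetic sketch, assuming the processor emits float32 values (4 bytes per element); that dtype is an assumption here, not something stated in the description, and `array_nbytes` is a hypothetical helper:

```python
import math


def array_nbytes(shape, bytes_per_element=4):
    """Bytes needed for a dense array of the given shape.

    bytes_per_element=4 assumes float32 output (an assumption, see above).
    """
    return math.prod(shape) * bytes_per_element


# Shape reported in the PR description for size=15000 / crop_size=15000.
oversized = array_nbytes((1, 3, 15000, 15000))
# A typical 224x224 model input, for comparison.
benign = array_nbytes((1, 3, 224, 224))

print(f"{oversized / 1e9:.2f} GB")  # 2.70 GB
print(f"{benign / 1e6:.2f} MB")     # 0.60 MB
```

The input image size does not appear anywhere in the calculation, which is why a 32×32 input is enough to trigger the allocation.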

Add a shared output-size guard validate_image_output_size in image_transforms and call it before the allocation in resize, center_crop, and the Torchvision backend resize (the PIL backend resizes through image_transforms.resize, so it is covered too). The limit (1 << 27 pixels) is far above any real model input. Adds a regression test.
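A rough sketch of what such a guard could look like. Only the function name and the 1 << 27 pixel limit come from the description above; the signature, the constant name, and the error messages are hypothetical and may differ from the actual patch:

```python
# Hypothetical constant name; the value (1 << 27 pixels) is the limit
# quoted in the PR description.
MAX_IMAGE_OUTPUT_PIXELS = 1 << 27


def validate_image_output_size(height, width, max_pixels=MAX_IMAGE_OUTPUT_PIXELS):
    """Raise ValueError if a requested output size is invalid or too large.

    Intended to be called before the resized/cropped array is allocated.
    The signature is an assumption, not the PR's actual one.
    """
    if height <= 0 or width <= 0:
        raise ValueError(f"Output size must be positive, got {height}x{width}.")
    if height * width > max_pixels:
        raise ValueError(
            f"Requested output size {height}x{width} exceeds the limit of "
            f"{max_pixels} pixels."
        )


# A normal model input passes silently.
validate_image_output_size(224, 224)

# The oversized value from the description is rejected before any allocation.
try:
    validate_image_output_size(15000, 15000)
except ValueError as err:
    print(f"rejected: {err}")
```

Checking the pixel count rather than each side separately keeps very wide or very tall but small-area outputs valid, while still capping the total allocation.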

AI assistance was used while drafting this change; I reviewed every line and ran the tests below.

What does this PR do?

Bounds the image-processor resize/center-crop output so an oversized size/crop_size in an untrusted config cannot OOM the process at preprocess time.

Tests run (torch==2.12.0):

pytest tests/test_image_transforms.py -k "resize or center_crop"   # 4 passed
python utils/check_copies.py    # clean
ruff check / ruff format --check # clean
# benign still works: CLIPImageProcessor(size=224)(Image.new("RGB",(32,32)))

Fixes # (security hardening; reported privately rather than via a public issue to avoid disclosing the vector)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Image processors resize and center-crop every input image to the size read from
preprocessor_config.json (size / crop_size), with no upper bound. An oversized config value
(e.g. size=15000 or crop_size=50000) makes the processor materialize a huge
(height x width x channels) array at preprocess time for any input image, exhausting
memory and OOM-killing the worker.

Add a shared output-size guard in image_transforms (validate_image_output_size) and
call it in resize, center_crop and the Torchvision backend resize before allocating.
Add a regression test.

@zucchini-nlp zucchini-nlp left a comment

Member

Not sure it makes sense. We expect users/authors to know what they are doing, and putting an arbitrary upper bound doesn't look reasonable to me.
