BLIPs clean-up #35560

Merged: 18 commits merged into huggingface:main on Jul 29, 2025
Conversation

@zucchini-nlp (Member) commented Jan 8, 2025

What does this PR do?

Continuation of #34502, which I separated out because BLIP wasn't ready to be merged in the v4.48 release. The main reason is the Blip2Retrieval model, whose hub configs weren't updated.

Main changes:

  • Clean up the if/else checks and leave only the new logic with expanded inputs.
  • Change the Blip2Retrieval model class to accept expanded inputs (open to feedback). Currently it expects non-expanded inputs for ITM, so we just crop the inputs, removing the extra image placeholders.
  • Opened PRs on the Hub for the two Blip2Retrieval checkpoints; will merge them right after this PR.
  • The tokenizer's "max_length" now includes the image tokens, which is breaking but in line with other VLMs (see the sketch after this list). I would like to keep the change even if breaking, to be consistent, or should we wait for v5.0?
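
For illustration, a minimal sketch of what the "max_length" change means in practice; the checkpoint name, image URL, and token count are illustrative, and the processor kwargs are assumed to be forwarded to the tokenizer as in other VLM processors:

import requests
from PIL import Image
from transformers import Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Before this PR, `max_length` bounded only the text tokens, so the returned
# sequence could exceed it once the image placeholders were expanded.
# After this PR, `max_length` bounds the full sequence, image tokens included.
inputs = processor(
    images=image,
    text="Question: what is shown? Answer:",
    max_length=64,
    truncation=True,
    return_tensors="pt",
)
assert inputs["input_ids"].shape[-1] <= 64  # holds with image tokens counted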

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Down to have this still, but up to you; it's been a while, sorry for my late review.

@zucchini-nlp (Member, Author)

+1, we should have this so we can remove the deprecated code. I will resolve conflicts and see if anything else needs to be done.

@zucchini-nlp (Member, Author)

run-slow: blip_2, instructblip, instructblipvideo

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/blip_2', 'models/instructblip', 'models/instructblipvideo']
quantizations: [] ...

Comment on lines +324 to +329
# Add imports via `define_import_structure` after the #35167 as we remove explicit import in `__init__.py`
from transformers.utils.import_utils import define_import_structure

reversed_structure = {}
new_imported_modules_from_import_structure = define_import_structure("src/transformers/__init__.py")
for mapping in new_imported_modules_from_import_structure.values():
@zucchini-nlp (Member, Author)

I don't know why this fails only a few months after the __init__ refactor; probably no one updated tiny_model_summary.json since then.

Without it, the script fails to fetch tests that are modified in tiny_model_summary.json.
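
For context, a hedged guess at how the truncated loop above might continue; the nested-mapping shape ({backend requirements: {module path: set of exported names}}) is an assumption about what define_import_structure returns, not a verbatim copy of the script:

for mapping in new_imported_modules_from_import_structure.values():
    for module_path, exported_names in mapping.items():
        for name in exported_names:
            # Reverse lookup (exported name -> defining module) so the tests
            # fetcher can map model classes listed in tiny_model_summary.json
            # back to the files whose tests should run.
            reversed_structure[name] = module_path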

Collaborator

cc @ydshieh !

@ydshieh (Collaborator)

Yeah, the tiny model creation script is failing (on GitHub), and it has been some time since anyone (i.e. me) updated the relevant stuff in the codebase.

Will try to work on that.

@zucchini-nlp (Member, Author)

run-slow: blip_2, instructblip, instructblipvideo

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/blip_2', 'models/instructblip', 'models/instructblipvideo']
quantizations: [] ...

Comment on lines +1564 to +1576
if input_ids is None:
special_image_mask = inputs_embeds == self.get_input_embeddings()(
torch.tensor(self.config.image_token_id, dtype=torch.long, device=inputs_embeds.device)
)
special_image_mask = special_image_mask.all(-1)
else:
special_image_mask = input_ids == self.config.image_token_id

special_image_mask = special_image_mask.unsqueeze(-1).expand_as(inputs_embeds).to(language_model_inputs.device)
language_model_inputs = language_model_inputs.to(inputs_embeds.device, inputs_embeds.dtype)
inputs_embeds = inputs_embeds.to(language_model_inputs.device).masked_scatter(
special_image_mask, language_model_inputs
)
Collaborator

should we have a small function for this?

@zucchini-nlp (Member, Author)

Yeah, we can isolate the special_image_mask computation step in a tiny function; it is a recurring pattern in all VLMs after the last big PR I merged for supporting inputs embeds.

Taking it as a note for a subsequent PR.
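
One possible shape for that tiny function, distilled from the snippet quoted above; the name get_image_mask and its placement as a model method are assumptions, not the merged implementation:

import torch

def get_image_mask(self, input_ids, inputs_embeds):
    """Return a mask over inputs_embeds marking image placeholder positions."""
    if input_ids is None:
        # No token ids available: locate placeholders by comparing each
        # position's embedding against the embedding of the image token id.
        image_embed = self.get_input_embeddings()(
            torch.tensor(self.config.image_token_id, dtype=torch.long, device=inputs_embeds.device)
        )
        special_image_mask = (inputs_embeds == image_embed).all(-1)
    else:
        special_image_mask = input_ids == self.config.image_token_id
    return special_image_mask.unsqueeze(-1).expand_as(inputs_embeds)

Callers would then keep the masked_scatter step from the snippet, e.g. inputs_embeds.masked_scatter(mask, language_model_inputs).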

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: blip_2, instructblip, instructblipvideo

@ArthurZucker (Collaborator) left a comment

LGTM otherwise, but happy if we isolate this in a function along with this PR!

@zucchini-nlp (Member, Author)

Let's merge it then. I will make a PR to isolate get_image_mask into a tiny separate function.

@zucchini-nlp zucchini-nlp merged commit 7579479 into huggingface:main Jul 29, 2025
20 checks passed
@@ -626,7 +626,7 @@
"model_classes": [
"Blip2ForConditionalGeneration"
],
"sha": "35e1ef43da3554af62eb29a7b3dbbef3f3bef48e"
"sha": "d0de11fd1f8ca481231c07ee0934924be96cb281"
Collaborator

oops ....
