
Delete deprecated stuff #38838


Merged
merged 27 commits into huggingface:main on Jul 10, 2025

Conversation

zucchini-nlp
Member

What does this PR do?

As per title. Removes:

  • the deprecated legacy (tuple) cache format (a rough migration sketch follows below)
  • the deprecations we had for the new processor API
  • **rope_kwargs from the RoPE API
  • _seen_tokens in cache classes

First review @gante as most modifications are around cache/generation
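
For anyone migrating downstream code, here is a minimal sketch of what moving off the removed pieces can look like. It assumes a transformers release containing this PR; the checkpoint name and prompt are placeholders, not part of the PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello", return_tensors="pt")

# Before: past_key_values could be the legacy tuple(tuple(torch.FloatTensor)) format.
# Now: pass a Cache instance explicitly (or let the model create one when use_cache=True).
cache = DynamicCache()
with torch.no_grad():
    out = model(**inputs, past_key_values=cache, use_cache=True)

# Before: the private `_seen_tokens` attribute tracked how many tokens were processed.
# Now: use the public Cache API instead.
print(out.past_key_values.get_seq_length())
```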

@zucchini-nlp requested a review from gante on June 16, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante (Member) left a comment

Yay cleanups! 🧹 🧹

I left a few minor comments to address. I also have very low confidence in the processing-side changes, so another reviewer for those parts would be great 🤗

@@ -93,7 +93,6 @@ def _compute_default_rope_parameters(
     config: Optional[PretrainedConfig] = None,
     device: Optional["torch.device"] = None,
     seq_len: Optional[int] = None,
-    **rope_kwargs,
Member

(I'm not 100% sure we won't get custom-model-on-the-hub-related BC issues.

On one hand, _compute_default_rope_parameters is a post-refactor function, so if a user started their model from one of our models, they wouldn't have used rope_kwargs. On the other hand, we didn't add an explicit deprecation cycle, my bad 😢 It's a very low likelihood, so I'm happy with the deletion, I don't think adding a deprecation cycle would be worth the extra work)
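
For custom models on the Hub that copied this helper, a post-removal RoPE init function would look roughly like the sketch below. It mirrors the signature shown in the diff; the body is an approximation of the default rope computation, not the exact library source.

```python
from typing import Optional

import torch
from transformers import PretrainedConfig


def _compute_default_rope_parameters(
    config: Optional[PretrainedConfig] = None,
    device: Optional["torch.device"] = None,
    seq_len: Optional[int] = None,
) -> tuple[torch.Tensor, float]:
    # Everything is read from the config now; **rope_kwargs is gone.
    base = config.rope_theta
    dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64, device=device).float() / dim))
    attention_factor = 1.0  # the default (unscaled) rope applies no attention scaling
    return inv_freq, attention_factor
```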

@@ -315,7 +315,8 @@ def forward(
    query_states = query_states.view(*q_input_shape)
    query_states = query_states.transpose(1, 2).contiguous()

    if past_key_value is not None:
        # Check if an encoder-decoder model is being used. Otherwise we'll get `DynamicCache`
Member

I like the diff here, it simplified things, and I see parts of it are already present in bart.

Two notes:

  • The type hints/docstrings need updates. e.g. on L838 we say past_key_values can only be an EncoderDecoderCache (we can accept any Cache) or a tuple(tuple(torch.FloatTensor)) (no longer supported). This comment likely applies to all models in this diff, and even beyond it 👀 Given the wide extent of the changes, happy to leave it to a follow-up PR.
  • @vasqu has been making changes to whisper recently, so we should review this file as well :D
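
As a rough illustration of the kind of hint/docstring update meant here (a hypothetical snippet, not the actual whisper source):

```python
from typing import Optional

from transformers.cache_utils import Cache


def forward(
    self,
    # Before (sketch): Optional[Union[EncoderDecoderCache, tuple[tuple[torch.FloatTensor]]]]
    # After (sketch): any Cache subclass is accepted; the legacy tuple format no longer is.
    past_key_values: Optional[Cache] = None,
    use_cache: Optional[bool] = None,
):
    ...
```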

    past_key_values = EncoderDecoderCache.from_legacy_cache(past_key_values)
elif past_key_values is None:
    past_key_values = EncoderDecoderCache(DynamicCache(), DynamicCache())
if self.is_decoder and use_cache and past_key_values is None:
Member

I think all encoder-decoder models will have to copy the pattern from whisper (if it's a decoder-only model, instantiate a DynamicCache) to preserve BC -- we often allow configuring encoder-decoder models as decoder-only, when config.is_encoder_decoder=False
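
A sketch of that pattern, paraphrased from the description above (the helper name is made up for illustration; it is not the actual whisper code):

```python
from typing import Optional

from transformers.cache_utils import Cache, DynamicCache, EncoderDecoderCache


def _maybe_init_cache(config, use_cache: bool, past_key_values: Optional[Cache]) -> Optional[Cache]:
    # Lazily create the right cache type so that models configured as decoder-only
    # (config.is_encoder_decoder=False) keep working without an EncoderDecoderCache.
    if use_cache and past_key_values is None:
        if config.is_encoder_decoder:
            return EncoderDecoderCache(DynamicCache(), DynamicCache())
        return DynamicCache()
    return past_key_values
```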

@@ -583,7 +583,7 @@ def generate(
     logger.warning_once(
         "Expanding inputs for video tokens in InstructBLIPVideo should be done in processing. "
         "Please follow instruction here (https://gist.github.com/zucchini-nlp/65f22892b054dc0d68228af56fbeaac2) to update your InstructBLIPVideo model. "
-        "Using processors without these attributes in the config is deprecated and will throw an error in v4.47."
+        "Using processors without these attributes in the config is deprecated and will throw an error in v4.54."
@zucchini-nlp (Member, Author) Jun 23, 2025

Depends on #35560, which became stale. Flagging for the core maintainers' attention; it awaits review so we can remove the warnings :)

@gante (Member) left a comment

LGTM 🧹 🧹 🧹

@julien-c
Member

julien-c commented Jul 7, 2025

nice, love those PRs! 🎉

@zucchini-nlp
Member Author

run-slow: align, aria, aya_vision, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip_2, bloom, chameleon, codegen, colpali, colqwen2, csm, dbrx, falcon

Contributor

github-actions bot commented Jul 7, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/align', 'models/aria', 'models/aya_vision', 'models/bart', 'models/bigbird_pegasus', 'models/blenderbot', 'models/blenderbot_small', 'models/blip_2', 'models/bloom', 'models/chameleon', 'models/codegen', 'models/colpali', 'models/colqwen2', 'models/csm', 'models/dbrx', 'models/falcon']
quantizations: [] ...

@zucchini-nlp
Member Author

The failing slow tests look to be the same as the failing tests on the main branch. @Cyrilvallez can you take one last look and then I'll merge?

@zucchini-nlp requested a review from Cyrilvallez on July 7, 2025
@Cyrilvallez (Member) left a comment

All right, this is god's work 🤗❤️
My main comment is about old models that use patterns like next_cache = output[-1] on each iteration over the layers: can we remove that everywhere? We force cache classes now anyway, so it no longer makes sense. We should even stop returning the cache from the Attention and Layer modules, to simplify much further (it's a bit breaking, but aligned with what we already did!)
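
A small sketch of the suggested simplification (the helper and layer list are hypothetical; only the pattern matters):

```python
import torch

from transformers.cache_utils import Cache


def run_decoder_layers(layers, hidden_states: torch.Tensor, past_key_values: Cache):
    # Cache objects are updated in place by each layer, so there is no need for the
    # old per-layer `next_cache = output[-1]` re-collection.
    for layer in layers:
        hidden_states = layer(hidden_states, past_key_value=past_key_values)
    # Return the same cache object that was passed in; nothing was re-collected.
    return hidden_states, past_key_values
```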

self.static_cache = StaticCache(
    config=self.config,
    max_batch_size=batch_size,
    max_cache_len=max_static_cache_length,
    device="cpu",
    dtype=torch.float32,
)
self.cache = EncoderDecoderCache(self.static_cache, DynamicCache())
Member

This export recipe only uses the decoder part, so it should not need this change, no?

@zucchini-nlp (Member, Author) Jul 8, 2025

The model is encoder-decoder and needs a cache for both parts. Previously we would wrap the static cache in an EncoderDecoderCache in the model's forward, but as of this PR we don't do any legacy handling or hacks. Instead we expect users to pass the correct new cache class.

Export indeed doesn't really care about the encoder cache; it's needed for the generation code to work.
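
A minimal usage sketch of what "pass the correct new cache class" can look like for an encoder-decoder model (the checkpoint, sizes, and the commented generate call are placeholders):

```python
import torch

from transformers import AutoModelForSpeechSeq2Seq
from transformers.cache_utils import DynamicCache, EncoderDecoderCache, StaticCache

model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")  # placeholder checkpoint

# Static cache for decoder self-attention, dynamic cache for cross-attention.
self_attention_cache = StaticCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=64,
    device="cpu",
    dtype=torch.float32,
)
cache = EncoderDecoderCache(self_attention_cache, DynamicCache())

# The cache is passed explicitly; the model's forward no longer wraps a bare
# StaticCache into an EncoderDecoderCache for you.
# outputs = model.generate(**inputs, past_key_values=cache)
```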

Member

So the decoder-only part of the model also expects an EncoderDecoderCache?

Member Author

Yes, because it does cross-attention and would still look for a cache to store encoder_hidden_states. Most of these models actually have no option to be run as a "CausalLM" type with no cross-attention, as per the code (e.g. T5 had failing tests).
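
For context, a tiny sketch of the two halves of an EncoderDecoderCache (attribute names as in transformers; the rest is illustrative):

```python
from transformers.cache_utils import DynamicCache, EncoderDecoderCache

cache = EncoderDecoderCache(DynamicCache(), DynamicCache())

# Self-attention keys/values grow with every generated token ...
print(cache.self_attention_cache.get_seq_length())  # 0 before any forward pass
# ... while the cross-attention cache stores the key/value projections computed from
# encoder_hidden_states once and reuses them at every decoding step.
print(cache.cross_attention_cache.get_seq_length())  # 0 before any forward pass
```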

@zucchini-nlp
Member Author

run-slow: align, aria, aya_vision, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip_2, bloom, chameleon, codegen, colpali, colqwen2, csm, dbrx, falcon

Contributor

github-actions bot commented Jul 8, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/align', 'models/aria', 'models/aya_vision', 'models/bart', 'models/bigbird_pegasus', 'models/blenderbot', 'models/blenderbot_small', 'models/blip_2', 'models/bloom', 'models/chameleon', 'models/codegen', 'models/colpali', 'models/colqwen2', 'models/csm', 'models/dbrx', 'models/falcon']
quantizations: [] ...

@zucchini-nlp enabled auto-merge (squash) on July 10, 2025
@zucchini-nlp
Member Author

Merging, looks like there are no more questions left. It will unblock me for another clean-up PR.

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: align, aria, aya_vision, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip_2, bloom, chameleon, codegen, colpali, colqwen2, csm, dbrx, falcon

@zucchini-nlp merged commit bc161d5 into huggingface:main on Jul 10, 2025
25 checks passed
rjgleaton pushed a commit to rjgleaton/transformers that referenced this pull request Jul 17, 2025
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <[email protected]>

* bye bye `seen_tokens`

* address comments

* update typings

* encoder decoder models follow same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docstring

---------

Co-authored-by: Joao Gante <[email protected]>