Conversation

@gante (Member) commented Sep 4, 2025

What does this PR do?

(Carved from #40553, which is becoming messy)

This PR:

  • Updates test_past_key_values_format to support things like GQA or skipped kv cache layers. As a result, we can remove some overwrites/skips 💛
  • Fixes a bug in get_text_config: for legacy models, we were not respecting the attribute_map (i.e. the remapping of config attribute names, which we need to handle carefully). With this PR we do, and this bugfix allows us to remove some of the test skips (see the sketch after this list).
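For context, a minimal sketch of the remapping being fixed. LegacyConfig and its values are hypothetical, but BART-style configs follow this pattern:

# Hypothetical legacy config: the class-level attribute_map says the
# canonical name "num_attention_heads" is stored as "encoder_attention_heads".
class LegacyConfig:
    attribute_map = {"num_attention_heads": "encoder_attention_heads"}

    def __init__(self):
        self.encoder_attention_heads = 16

config = LegacyConfig()

# get_text_config copies attributes onto the returned config. Before this PR
# it wrote the canonical key directly; after this PR it first resolves the
# key through attribute_map so the write lands on the stored attribute:
new_key, value = "num_attention_heads", 32
if new_key in config.attribute_map:
    new_key = config.attribute_map[new_key]  # -> "encoder_attention_heads"
setattr(config, new_key, value)

assert config.encoder_attention_heads == 32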

@@ -728,10 +727,6 @@ def test_training_gradient_checkpointing_use_reentrant(self):
def test_training_gradient_checkpointing_use_reentrant_false(self):
pass

@is_flaky(max_attempts=5, description="Flaky for some input configurations.")
@gante (Member, Author):
(double-checked with flake-finder -- it is no longer flaky)

github-actions bot commented Sep 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia, gemma3n, got_ocr2, speecht5, t5gemma

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment:

Great clean-up, just had a question to clarify about decoder text configs. TBH I didn't know we were using it for flat-structured configs.

Comment on lines +1258 to +1262
# Does the class map the new key into a different attribute name at read time? if so, let's write
# into that attribute instead
if new_key in config_to_return.attribute_map:
new_key = config_to_return.attribute_map[new_key]

@zucchini-nlp (Member):
Not sure I got this. So if we map the new key back through the attribute map, in models like BART we will do num_attention_heads -> encoder_attention_heads. That doesn't look quite right if we asked for a decoder config.
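For reference, the relevant map in BartConfig looks roughly like this:

# Attribute map from transformers' BartConfig: reads and writes to the
# left-hand names are redirected to the right-hand attributes, so even a
# decoder config resolves num_attention_heads to the *encoder* attribute.
attribute_map = {
    "num_attention_heads": "encoder_attention_heads",
    "hidden_size": "d_model",
}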

@gante (Member, Author) commented Sep 5, 2025:
It does not look right indeed 😢 But attribute_map is a class-level attribute, so we can't update it for the new configuration either (i.e. for the config instance returned by get_text_config).

Note that these encoder/decoder attributes in attribute_map come from old models, and that these inconsistencies only show up if someone decides to print the internal variables 👀

This means we are limited to two options, to maintain BC:

  1. [This PR] We use the same mapping all over the code (e.g. config.get_text_config(decoder=True).num_attention_heads to get the number of attention heads in the decoder), but accept that some old configs will have an odd representation because of their attribute map;
  2. [main] Have several if/else scattered across our codebase, like
num_decoder_layers = (
    getattr(config, "decoder_layers", None)  # flat configs case 1
    or getattr(config, "num_decoder_layers", None)  # flat configs case 2
    or decoder_config.num_hidden_layers  # modern default for decoders
)

(If we double down on direction 2, we need to add more if/else cases; our current logic is not robust across all tests.)

Option 1 seems much more reliable in the long run, and it also nudges everyone into using the same names everywhere (as opposed to relying on attribute maps) 🤗
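Concretely, the single access pattern option 1 standardizes on (an illustrative fragment; config stands for any PretrainedConfig instance):

# One canonical pattern everywhere; attribute_map (when a legacy config
# has one) resolves legacy names under the hood.
decoder_config = config.get_text_config(decoder=True)
num_decoder_layers = decoder_config.num_hidden_layers
num_decoder_heads = decoder_config.num_attention_heads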

@gante (Member, Author) commented Sep 5, 2025:
Alternatively, we may be able to update the attribute_map logic to read/write into the target variable, as opposed to mapping the read/writes 👀

Example:
If we have the {"a": "b"} mapping, atm all reads of config.a actually read config.b, without checking whether a exists in the config. Same for writes.

We could instead make reads of config.a read config.a first and, if it doesn't exist, fall back to config.b. All writes would write into config.a.
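A toy sketch of the two behaviours (hypothetical classes, not the actual PretrainedConfig implementation):

MAPPING = {"a": "b"}

class CurrentSemantics:
    # Today: "a" is a pure alias; reads and writes are unconditionally
    # redirected to "b".
    def __init__(self):
        self.b = 1

    def __getattr__(self, key):  # only called when normal lookup fails
        if key in MAPPING:
            return getattr(self, MAPPING[key])
        raise AttributeError(key)

    def __setattr__(self, key, value):
        super().__setattr__(MAPPING.get(key, key), value)

class ProposedSemantics:
    # Proposal: prefer a real "a" when it exists, fall back to "b" on
    # reads; writes land on "a" itself (default __setattr__).
    def __init__(self):
        self.b = 1

    def __getattr__(self, key):  # reached only while "a" was never written
        if key in MAPPING:
            return getattr(self, MAPPING[key])
        raise AttributeError(key)

cur, new = CurrentSemantics(), ProposedSemantics()
cur.a = 2  # silently rewrites b -> two names, one storage slot
new.a = 2  # creates a real a; b keeps its old value -> two storage slots
assert cur.b == 2 and new.a == 2 and new.b == 1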

WDYT?

@zucchini-nlp (Member):
> We could instead make reads of config.a read config.a first and, if it doesn't exist, fall back to config.b. All writes would write into config.a.

This sounds interesting, and slightly breaking, because we will end up with two keys for the same concept. It might raise questions such as which value is correct when inspecting a config visually or serializing it. For example, we might end up with both image_token_id and image_token_index in some VLMs.

Coming back to "Option 1": I see we always check the attribute mapping now. TBH I was expecting get_text_config() to return a different config only when the config structure is nested; otherwise the whole config is a text config and has no other modalities.

In this case I think the current approach is the best we can do, because it helps reduce LOC and is not very breaking. We can ignore the weird naming, as no one would serialize/print the text config, I hope. Let's keep it as is; I also have another option below, feel free to ignore it if it doesn't work.

I looked through the attribute maps in the repo, and they always map to the encoder when an encoder-decoder architecture is used. We could gradually deprecate this pattern and nudge users to fetch the value explicitly with config.encoder_attention_heads. We would need to use consistent naming in encoder-decoder models and promote it for future models. Though this option will take a long time to deprecate, maybe even until v5 🙃

@gante (Member, Author):
@zucchini-nlp

What I'm reading is "let's go with this PR, and try to nudge users away from attribute_map". Is this correct? :)

@zucchini-nlp (Member):
yeap, the second one is the longer-term option to make our lives better

@gante (Member, Author):
Cool!

(Approval please then 💛 )

Comment on lines -1053 to +1054
-text_config = config.get_text_config()
-num_decoder_layers = (
-    getattr(text_config, "decoder_layers", None)
-    or getattr(text_config, "num_decoder_layers", None)
-    or text_config.num_hidden_layers
-)
+num_decoder_layers = decoder_config.num_hidden_layers
@zucchini-nlp (Member):
love this pattern

@gante (Member, Author):
See my comment above 😅

@zucchini-nlp (Member) left a comment:
oops, yep, sorry
