Fix Qwen2Audio flash attention mask format for generation #41843
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@remi-or ready for review 🙂
We should move away from this way of creating masks. It will make our lives harder to maintain.
run-slow: qwen2_audio

This comment contains run-slow, running the specified jobs: models: ['models/qwen2_audio']
The nccl error is unrelated, known to fail atm
LGTM
Can you push an empty commit? Can't merge with red CI

[For maintainers] Suggested jobs to run (before merge): run-slow: qwen2_audio
```python
dummy_embeds = torch.zeros(
    (batch_size, max_seq_len, 1),
    dtype=self.audio_tower.conv1.weight.dtype,
    device=self.audio_tower.conv1.weight.device,
)
```
Ah sorry maybe one last nit: can we change the device/dtype here? Inputs_embeds should suffice?
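For illustration, here is a minimal, self-contained sketch of what that suggestion could look like; the concrete values are made up and only the pattern of taking `dtype`/`device` from `inputs_embeds` follows the comment above, so this is not the PR's actual diff:

```python
import torch

# Hypothetical illustration of the nit: derive dtype/device from inputs_embeds
# instead of reaching into self.audio_tower.conv1.weight (values are made up).
batch_size, max_seq_len, hidden_size = 2, 10, 8
inputs_embeds = torch.randn(batch_size, max_seq_len, hidden_size, dtype=torch.float16)

dummy_embeds = torch.zeros(
    (batch_size, max_seq_len, 1),
    dtype=inputs_embeds.dtype,
    device=inputs_embeds.device,
)
print(dummy_embeds.dtype, dummy_embeds.device)  # torch.float16, same device as inputs_embeds
```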
Thx a lot 🤗
What does this PR fix?
This PR fixes the `test_eager_matches_fa2_generate` test failure for Qwen2Audio by using the `create_bidirectional_mask` utility function to properly handle attention masks across different attention implementations.

The Qwen2Audio model was manually creating a 4D attention mask with `-inf` values for the audio encoder, regardless of the attention implementation being used. This caused issues with Flash Attention 2/3, which requires a 2D boolean mask (shape `(batch_size, seq_len)`) with `1` for valid tokens and `0` for padding.
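To make the two formats concrete, here is a small illustrative sketch contrasting the 2D mask Flash Attention expects with a hand-built 4D additive `-inf` mask; the names and shapes are assumptions for demonstration only, not code from this PR:

```python
import torch

batch_size, seq_len = 2, 6

# 2D mask expected by Flash Attention 2/3: 1 for valid tokens, 0 for padding.
attention_mask_2d = torch.tensor([[1, 1, 1, 1, 0, 0],
                                  [1, 1, 1, 1, 1, 1]])

# Hand-built 4D additive mask of the kind eager/SDPA attention can consume:
# 0.0 where attention is allowed, -inf at padded positions, shaped
# (batch_size, 1, seq_len, seq_len) so it can be added to attention scores.
additive_4d = torch.zeros(batch_size, 1, seq_len, seq_len)
additive_4d = additive_4d.masked_fill(
    attention_mask_2d[:, None, None, :] == 0, float("-inf")
)

print(attention_mask_2d.shape)  # torch.Size([2, 6])
print(additive_4d.shape)        # torch.Size([2, 1, 6, 6])
```

Passing the 4D additive variant to a Flash Attention backend is what broke generation here, which is why the fix switches to a mask-creation utility that produces the right format per attention implementation.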