Skip to content

Conversation

@Abdennacer-Badaoui
Copy link
Member

@Abdennacer-Badaoui Abdennacer-Badaoui commented Oct 24, 2025

What does this PR fix?

This PR fixes the test_eager_matches_fa2_generate test failure for Qwen2Audio by using the create_bidirectional_mask utility function to properly handle attention masks across different attention implementations.

The Qwen2Audio model was manually creating a 4D attention mask with -inf values for the audio encoder, regardless of the attention implementation being used. This caused issues with Flash Attention 2/3, which requires a 2D boolean mask (shape (batch_size, seq_len)) with 1 for valid tokens and 0 for padding.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Abdennacer-Badaoui
Copy link
Member Author

@remi-or ready for review 🙂

Copy link
Contributor

@vasqu vasqu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move away from this way of creating masks. It will make our lives are harder to maintain

@vasqu
Copy link
Contributor

vasqu commented Oct 24, 2025

run-slow: qwen2_audio

@github-actions
Copy link
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/qwen2_audio']
quantizations: [] ...

Copy link
Contributor

@vasqu vasqu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nccl error is unrelated, known to fail atm

LGTM

@vasqu
Copy link
Contributor

vasqu commented Oct 24, 2025

Can you push an empty commit? Cant merge with red CI

@github-actions
Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_audio

dummy_embeds = torch.zeros(
(batch_size, max_seq_len, 1),
dtype=self.audio_tower.conv1.weight.dtype,
device=self.audio_tower.conv1.weight.device,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry maybe one last nit: can we change the device/dtype here? Inputs_embeds should suffice?

@vasqu vasqu merged commit 4faf675 into huggingface:main Oct 24, 2025
17 checks passed
@vasqu
Copy link
Contributor

vasqu commented Oct 24, 2025

Thx a lot 🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants