
Conversation

kareemshaik80

  - Support for a single sink logit in flash attention decode
  - Add the sink to softmax
  - Command-line flag added to enable the attention sink

Signed-off-by: kareem <[email protected]>
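For context, here is a minimal sketch of how a single sink logit can be folded into softmax, in the spirit of the attention-sink approach this PR targets. The names (`softmax_with_sink`, `attn_scores`, `sink_logit`) are illustrative, not the PR's actual identifiers:

```python
# Hedged sketch: folding one learned sink logit per head into softmax.
# Illustrative names only; this is not the PR's kernel code.
import torch

def softmax_with_sink(attn_scores: torch.Tensor, sink_logit: torch.Tensor) -> torch.Tensor:
    """Softmax over attention scores with one extra sink logit per head.

    attn_scores: [batch, heads, q_len, kv_len]
    sink_logit:  [heads], a learned scalar per head
    Returns probabilities over the kv positions only; the sink's share of
    probability mass is discarded, which damps the remaining weights.
    """
    b, h, q, k = attn_scores.shape
    sinks = sink_logit.view(1, h, 1, 1).expand(b, h, q, 1)
    combined = torch.cat([attn_scores, sinks], dim=-1)  # [b, h, q, k+1]
    probs = torch.softmax(combined, dim=-1)
    return probs[..., :-1]                              # drop the sink column
```

Because the sink column is dropped after normalization, the remaining weights no longer sum to one; the sink simply absorbs probability mass, which is the stabilizing effect described in the attention-sink literature.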
@kareemshaik80 kareemshaik80 marked this pull request as draft September 25, 2025 07:38
@kareemshaik80 kareemshaik80 marked this pull request as ready for review September 25, 2025 07:39

@yuankuns yuankuns left a comment


This also needs a paper/code reference to confirm that the PR does what it is intended to do.


@yuankuns yuankuns left a comment


Not changed.

@kareemshaik80
Author

> This also needs a paper/code reference to confirm that the PR does what it is intended to do.

You can refer to this paper: https://arxiv.org/pdf/2309.17453 (Efficient Streaming Language Models with Attention Sinks)

Eager-mode reference code: https://github.com/huggingface/transformers/blob/caa14e7dabb086f167c14b7eecadc2ba9db25eb6/src/transformers/models/gpt_oss/modeling_gpt_oss.py#L258
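For the flash-decode side, here is a hedged sketch of how a single sink logit could enter an online (streaming) softmax: the sink seeds the running max and denominator but contributes no value vector. This is an illustration under those assumptions, not the PR's kernel code; scores and values are scalars for brevity:

```python
# Hedged sketch: a sink logit in an online softmax, flash-decode style.
# The sink participates in normalization only; it carries no value.
import math

def online_softmax_with_sink(scores, values, sink_logit):
    m = sink_logit   # running max, seeded with the sink logit
    d = 1.0          # running denominator: exp(sink_logit - m) == 1
    acc = 0.0        # running weighted sum of (scalar) values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)          # rescale old stats to new max
        d = d * scale + math.exp(s - m_new)
        acc = acc * scale + math.exp(s - m_new) * v
        m = m_new
    return acc / d
```

The result equals the concatenate-then-drop formulation above, since the sink's exp term appears in the denominator but multiplies no value.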

@kareemshaik80 kareemshaik80 requested a review from yuankuns October 7, 2025 04:01
Author

@kareemshaik80 kareemshaik80 left a comment


Move test under unit tests.

@Antonyvance

@kareemshaik80 I believe this implementation needs to change based on PR 547.

@Antonyvance Antonyvance added the redesign required (Implementation requires a redesign) label Oct 17, 2025