GQA #623

sunjiweiswift · 2025-11-11T05:36:51Z

Description

Type

- [ ] Bug
- [ ] Feature
- [ ] Performance
- [ ] Refactor

Testing

- [ ] Tests pass
- [ ] Xe12
- [ ] Xe20

Performance

| Metric | Before | After |
|---|---|---|

References

Fixes #

Checklist

- [ ] Copyright
- [ ] Co-pilot Review
- [ ] Deprecated APIs not used

airMeng

Based on my XeTLA experience, Q-head folding improves decode performance but hurts prefill. Should this optimization be made conditional, similar to reduce_A?
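A minimal sketch of the trade-off described above, assuming "Q-head folding" means stacking the group_size query heads that share one KV head along the M (query-row) dimension of a tile. Every name here (`FoldedRow`, `unfold_row`, `m_tile_utilization`, `should_fold_q_heads`, `tile_m`) is hypothetical and not taken from this PR or from XeTLA, and the fold/no-fold threshold is a placeholder rather than a tuned heuristic.

```cpp
#include <cassert>

// Hypothetical illustration of GQA Q-head folding (names are not from this PR).
// In GQA, num_q_heads = num_kv_heads * group_size, and the group_size query
// heads mapped to one KV head read the same K/V. Folding stacks those heads
// along the M (query-row) dimension so one tile problem covers all of them.

struct FoldedRow {
  int q_head;   // global query-head index
  int seq_pos;  // query position within the sequence
};

// Map a row of the folded problem for `kv_head` back to (q_head, seq_pos).
// One possible ordering: row = seq_pos * group_size + g, with g in [0, group_size).
FoldedRow unfold_row(int kv_head, int row, int group_size) {
  const int g = row % group_size;
  const int seq_pos = row / group_size;
  return FoldedRow{kv_head * group_size + g, seq_pos};
}

// Fraction of the allocated M-tile rows that hold valid query rows.
double m_tile_utilization(int rows, int tile_m) {
  const int tiles = (rows + tile_m - 1) / tile_m;  // ceil(rows / tile_m)
  return static_cast<double>(rows) / (tiles * tile_m);
}

// Placeholder heuristic: fold only when seq_len_q alone would leave most of an
// M tile idle (e.g. decode, seq_len_q == 1). The threshold is an assumption.
bool should_fold_q_heads(int seq_len_q, int group_size, int tile_m) {
  return group_size > 1 && seq_len_q < tile_m;
}
```

For example, with tile_m = 16 and group_size = 4, a decode step (seq_len_q = 1) occupies 0.0625 of its M tile unfolded versus 0.25 folded, while a 512-token prefill already occupies 1.0 either way, so folding gives prefill no occupancy gain to offset whatever extra cost it adds.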

sunjiweiswift marked this pull request as draft November 11, 2025 05:36

sunjiweiswift closed this Nov 11, 2025

sunjiweiswift force-pushed the fmha_GQA branch from ae53435 to b62b28d November 11, 2025 05:38

sunjiweiswift reopened this Nov 11, 2025

sunjiweiswift force-pushed the fmha_GQA branch 9 times, most recently from 5cb846f to 90a414a November 17, 2025 08:51

airMeng reviewed Nov 17, 2025

sunjiweiswift force-pushed the fmha_GQA branch 2 times, most recently from e8576ac to c114d9f November 18, 2025 05:31

have bug with r2s

b5af73f

sunjiweiswift force-pushed the fmha_GQA branch from c114d9f to b5af73f November 18, 2025 05:34

sunjiweiswift added 4 commits November 18, 2025 15:44

modify r2s s2r

5b9a25e

use copy_block_s2r

a1d4169

modify r2s s2r

343b4df

modify r2s s2r

4cf57e4

2 participants