@petercad commented Nov 10, 2025

This PR builds on #573, adding a CollectiveEpilogue with support for the new block 2D copy atoms.

The existing epilogue implementation was mostly rewritten, as it had many hardcoded assumptions and limitations:

  • Subgroups own a contiguous tile within the workgroup tile
  • Subgroup tiles are laid out n-major within the workgroup tile
  • C/D atoms have the same block size
  • One copy atom of data is processed at a time
  • C/D atoms must copy data in exactly the same layout as the accumulator

The new implementation removes all these restrictions.

Its API is also somewhat different, mostly in ways that more closely match the SM90 epilogues:

  • A configurable EpilogueTile template parameter controls the block size for the epilogue computation.
  • Fusion callbacks receive workgroup-scope tiling information rather than subgroup-scope tiling information, because CuTe's TiledMMA is very flexible and a subgroup's "tile" may not be contiguous.
  • Vectorization of the epilogue compute operations is configurable via the ComputeVectorLen constexpr variable. It is currently set so that each compute step operates on one MMA atom's worth of accumulator data. If we want to make it user-configurable as in the NV epilogues (where it is a template parameter of the dispatch policy), that is possible.
  • It receives the TiledMMA as a template parameter rather than an argument to operator().
  • The S2R/R2S copy operation parameters are omitted (unlike SM90), as they are irrelevant to both the old and the new epilogue implementations.

The new implementation uses reorders to glue the C/D loads and the compute together, so it can support efficient data type and layout conversions outside of the epilogue computation.

@tdeng5 added the release label Nov 11, 2025
@petercad force-pushed the petercad/new_epilogue branch from f6f793e to f43ee5f November 12, 2025 17:08
@petercad force-pushed the petercad/new_epilogue branch from f43ee5f to 9cf2998 November 12, 2025 18:38
@tdeng5 removed the release label Nov 13, 2025
@tdeng5 merged commit 3f2a337 into main Nov 20, 2025
6 checks passed