Commit 3f2a337
Rearchitecture: Xe epilogue (#621)
This PR builds on #573, adding a `CollectiveEpilogue` with support for
the new block 2D copy atoms.
The existing epilogue implementation was mostly rewritten, as it had
many hardcoded assumptions and limitations:
* Subgroups own a contiguous tile within the workgroup tile
* Subgroup tiles are laid out n-major within the workgroup tile
* C/D atoms have the same block size
* One copy atom of data is processed at a time
* C/D atoms must bring data in the exact same layout as the accumulator
The new implementation removes all these restrictions.
Its API is also somewhat different, mostly in ways that more closely
match the SM90 epilogues:
* Configurable EpilogueTile template parameter controls the block size
for epilogue computation.
* Fusion callbacks receive workgroup-scope tiling information, not
subgroup-scope tiling information (because CuTe's TiledMMA is very
flexible -- the subgroup "tile" may not be contiguous).
* Vectorization for the epilogue compute operations is configurable via
the `ComputeVectorLen` constexpr variable. Currently this is set to
operate on one MMA atom's worth of accumulator data at a time, but if we
want to make it user-configurable like the NV epilogues (where it's a
template parameter for the dispatch policy), that's possible.
* It receives the TiledMMA as a template parameter rather than an
argument to `operator()`.
* The S2R/R2S copy operation parameters are omitted (a difference vs.
SM90) as they are irrelevant to both the old and new epilogue
implementations.
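
To make the API points above concrete, here is a minimal, self-contained sketch in plain C++ (not CUTLASS/CuTe code). `ToyTiledMma`, `ToyTile`, `ToyEpilogue`, and `AtomAccumulatorElems` are hypothetical names invented for illustration; only `ComputeVectorLen`, `EpilogueTile`, and the idea of passing the TiledMMA as a template parameter come from the description. It assumes a simple linear-combination epilogue, D = alpha * acc + beta * C.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a TiledMMA type. It only exposes how many
// accumulator elements one MMA atom yields per work-item (an assumption).
struct ToyTiledMma {
  static constexpr std::size_t AtomAccumulatorElems = 8;
};

// Hypothetical stand-in for the EpilogueTile parameter (an M x N block).
template <std::size_t M, std::size_t N>
struct ToyTile {
  static constexpr std::size_t kM = M;
  static constexpr std::size_t kN = N;
};

// Toy epilogue: the MMA and the epilogue tile are template parameters,
// so operator() only receives data, not the MMA object.
template <class TiledMma, class EpilogueTile>
struct ToyEpilogue {
  using Tile = EpilogueTile;  // exposed only; unused by this toy computation

  // One MMA atom's worth of accumulator data is processed per compute step.
  static constexpr std::size_t ComputeVectorLen = TiledMma::AtomAccumulatorElems;

  float alpha;
  float beta;

  // Computes d = alpha * acc + beta * c in chunks of ComputeVectorLen.
  template <std::size_t N>
  std::array<float, N> operator()(const std::array<float, N>& acc,
                                  const std::array<float, N>& c) const {
    static_assert(N % ComputeVectorLen == 0,
                  "fragment size must be a multiple of ComputeVectorLen");
    std::array<float, N> d{};
    for (std::size_t base = 0; base < N; base += ComputeVectorLen) {
      // Inner loop models one vectorized compute operation.
      for (std::size_t i = 0; i < ComputeVectorLen; ++i) {
        d[base + i] = alpha * acc[base + i] + beta * c[base + i];
      }
    }
    return d;
  }
};
```

Making the vector length a template parameter of a dispatch policy, as the description suggests is possible, would amount to replacing the `static constexpr` member with a policy-provided value; this sketch does not attempt that.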
The new implementation glues together C/D loads and compute with
reorders, so it can support efficient data type and layout conversions
outside of the epilogue computation.

1 parent 92785e4 commit 3f2a337
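
The "loads and compute glued together with reorders" idea from the description can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not the CuTe implementation: the fragment shape, the `int` and `float` element types, the column-major and row-major register orders, and the function names `reorder_col_to_row`, `reorder_row_to_col`, and `toy_epilogue` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-work-item fragment shape.
constexpr std::size_t kRows = 2;
constexpr std::size_t kCols = 4;
constexpr std::size_t kElems = kRows * kCols;

// Reorder a column-major fragment (element (r, c) at c * kRows + r) into a
// row-major fragment (element (r, c) at r * kCols + c), converting the type.
template <class Dst, class Src>
std::array<Dst, kElems> reorder_col_to_row(const std::array<Src, kElems>& src) {
  std::array<Dst, kElems> dst{};
  for (std::size_t r = 0; r < kRows; ++r)
    for (std::size_t c = 0; c < kCols; ++c)
      dst[r * kCols + c] = static_cast<Dst>(src[c * kRows + r]);
  return dst;
}

// Inverse reorder: row-major to column-major, converting the type.
template <class Dst, class Src>
std::array<Dst, kElems> reorder_row_to_col(const std::array<Src, kElems>& src) {
  std::array<Dst, kElems> dst{};
  for (std::size_t r = 0; r < kRows; ++r)
    for (std::size_t c = 0; c < kCols; ++c)
      dst[c * kRows + r] = static_cast<Dst>(src[r * kCols + c]);
  return dst;
}

// Toy pipeline: C arrives as int in column-major order, the accumulator is
// float in row-major order, and D leaves as int in column-major order.
std::array<int, kElems> toy_epilogue(const std::array<float, kElems>& acc,
                                     const std::array<int, kElems>& c_loaded,
                                     float alpha, float beta) {
  // Reorder C into the accumulator's layout and type before computing.
  const auto c_as_acc = reorder_col_to_row<float>(c_loaded);

  // The compute step sees matching layouts and types only.
  std::array<float, kElems> d_as_acc{};
  for (std::size_t i = 0; i < kElems; ++i)
    d_as_acc[i] = alpha * acc[i] + beta * c_as_acc[i];

  // Reorder the result into the layout and type expected by the D store.
  return reorder_row_to_col<int>(d_as_acc);
}
```

The point of the structure is that the compute loop never needs to know how the C and D copy atoms arrange their data; the conversions live entirely in the reorder steps.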
File tree
7 files changed, +221 -282 lines changed
- examples/00_bmg_gemm
- include
- cute
- util
- cutlass
- epilogue
- collective
- gemm/kernel