Skip to content

Conversation

@ecamartins
Copy link
Collaborator

@ecamartins ecamartins commented Nov 12, 2025

Proposed changes

The original CK Stream-K implementation used a Tile Partitioner that was based on old CK's Stream-K block to C tile map. However, old CK's implementation did not align with the original Stream-K paper. Thus, we implemented a new Tile Partitioner (#3018) and associated Stream-K Kernel (#3064). The new CK Tile Stream-K kernel implementation was placed in the reboot namespace.

Now that all functionality for the new implementation is in place, including examples (#3107), we can now remove the old CK tile Stream-K implementation. Thus, this PR makes the following changes:

  • Removes all uses of the old CK Tile Stream-K implementation.
  • Removes the reboot namespace such that the new implementation is in the ck_tile namespace only.
  • Adds tests for fp8 and bf8 for the new implementation as these were only in place for the old implementation.
  • Updated comment style in the Stream-K kernel file to follow /** style.
  • Removes the old CK Tile Stream-K Tile partitioner.
  • Remove the v2 suffix from the new CK Tile Tile Partitioner derived classes.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

@ecamartins ecamartins self-assigned this Nov 12, 2025
@ecamartins ecamartins force-pushed the emimarti/ck_tile/remove_old_streamk branch 5 times, most recently from 2c8bf72 to 0d93e6c Compare November 18, 2025 18:57
@ecamartins ecamartins marked this pull request as ready for review November 18, 2025 18:59
amd-anclark
amd-anclark previously approved these changes Nov 19, 2025
cgmillette
cgmillette previously approved these changes Nov 19, 2025
Copy link
Collaborator

@cgmillette cgmillette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, lgtm!

Future tests will go into tile engine, so these are good for spot-checking.

The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.

Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
  in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Remove the v2 suffix from the new CK Tile Tile Partitioner
derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.
@ecamartins ecamartins force-pushed the emimarti/ck_tile/remove_old_streamk branch from 87b5186 to 3187ee6 Compare November 20, 2025 05:53
@ecamartins ecamartins merged commit 2e4b8a8 into develop Nov 20, 2025
21 checks passed
@ecamartins ecamartins deleted the emimarti/ck_tile/remove_old_streamk branch November 20, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants