@adityachatter commented Oct 1, 2025

  • Adds functional FP8 support to the Chunk Prefill kernel

These changes are stacked on top of the Chunk Prefill pull request, #498.
FP8-only changes: sunjiweiswift/cutlass-sycl@8048471...adityachatter:achatter/fp8_chunk_prefill

Build and run the test:

ninja 06_bmg_chunk_prefill_fp8_hdim128
./examples/06_bmg_flash_attention/06_bmg_chunk_prefill_fp8_hdim128

TODO:

  • Fix the FP8 performance bottleneck (will be covered in a separate pull request, PYTORCHDGQ-7276)

Valentine233 and others added 9 commits September 30, 2025 23:40
This change imports `SYCLCompat` into the cutlass-sycl repo as `compat`.
Previous dependencies on `syclcompat` are changed to `compat`.
This PR also fixes some `SYCLCompat` failures with oneAPI 2025.2.

---------

Co-authored-by: Roland Schulz <[email protected]>
@adityachatter force-pushed the achatter/fp8_chunk_prefill branch from ba7aff5 to c174a4c on October 13, 2025 03:15
Signed-off-by: Aditya Chatterjee <[email protected]>
@adityachatter force-pushed the achatter/fp8_chunk_prefill branch from 97e0902 to 177ec96 on October 13, 2025 08:09
@adityachatter marked this pull request as ready for review on October 13, 2025 08:12
Moved scale factor within Options

Signed-off-by: Aditya Chatterjee <[email protected]>
@Antonyvance
@adityachatter Can you check PR #547 and redesign accordingly?

@Antonyvance added the "redesign required" label ("Implementation require a redesign") on Oct 17, 2025