ROCm mx-fp8 Gemm #2066

petrex · 2025-04-16T23:16:03Z

TLDR: This pull request introduces support for AMD MI355x GPUs with HIPBLASLT kernels in the MX formats prototype, alongside several updates to improve compatibility and functionality for these GPUs. Note that this feature requires ROCm 7.0+ and gfx950. Key changes include updates to configuration options, validation logic, and GEMM kernel handling to integrate HIPBLASLT support.

AMD MI355x GPU Support:

torchao/prototype/mx_formats/config.py:
- Added HIPBLASLT as a new MXGemmKernelChoice and included it in the MXLinearRecipeName for configuration presets. [1] [2]
- Updated _validate_gemm_kernel_choice to include validation logic for HIPBLASLT, ensuring proper block size, data type, and ROCm availability.
torchao/prototype/mx_formats/mx_ops.py:
- Extended mx_mm to support HIPBLASLT for scaled matrix multiplication and real GEMM operations. [1] [2]
- Adjusted error messaging for unsupported kernel choices in FP4 operations.
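
The validation described above can be pictured with a minimal sketch. It is not the PR's code: the function name `validate_gemm_kernel_choice`, the string dtype names, the block size of 32, and the `hip_version` parameter (standing in for `torch.version.hip`) are assumptions made so the example runs without PyTorch. Only the two enum members and their values come from the diff quoted in this thread.

```python
from enum import Enum
from typing import Optional


class MXGemmKernelChoice(Enum):
    # Only the two members visible in this PR's diff are reproduced here.
    CUBLAS = "cublas"
    HIPBLASLT = "hipblaslt"


# Assumption: MX blocks hold 32 elements and the HIPBLASLT path targets FP8.
ASSUMED_BLOCK_SIZE = 32
ASSUMED_FP8_DTYPES = ("float8_e4m3fn", "float8_e5m2")


def validate_gemm_kernel_choice(
    choice: MXGemmKernelChoice,
    block_size: int,
    elem_dtype: str,
    hip_version: Optional[str],
) -> None:
    """Hypothetical stand-in for `_validate_gemm_kernel_choice`.

    Raises ValueError when the HIPBLASLT choice is combined with an
    unsupported block size, element dtype, or a non-ROCm build.
    """
    if choice != MXGemmKernelChoice.HIPBLASLT:
        return  # other choices are out of scope for this sketch
    if block_size != ASSUMED_BLOCK_SIZE:
        raise ValueError(f"block_size must be {ASSUMED_BLOCK_SIZE}, got {block_size}")
    if elem_dtype not in ASSUMED_FP8_DTYPES:
        raise ValueError(f"elem_dtype must be one of {ASSUMED_FP8_DTYPES}, got {elem_dtype}")
    if hip_version is None:
        # In a real build this value would come from torch.version.hip.
        raise ValueError("HIPBLASLT requires a ROCm build of PyTorch")
```

Passing the environment in as arguments, rather than reading it inside the function, keeps the checks testable on machines without ROCm.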

Documentation Updates:

torchao/prototype/mx_formats/README.md:
- Updated the README to reflect AMD MI355x GPU support, including instructions for using HIPBLASLT kernels and ongoing optimization efforts for AMD hardware. [1] [2] [3]

Minor Code Refinements:

torchao/prototype/mx_formats/mx_ops.py:
- Improved readability in mx_view_op by reformatting conditions for FP6 element packing.


pytorch-bot · 2025-04-16T23:16:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2066

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit 012f938 with merge base 801af03:

NEW FAILURE - The following job has failed:

Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
test/integration/test_integration.py::TestAutoQuant::test_autoquant_compile_16_cuda

CANCELLED JOB - The following job was cancelled. Please retry:

Run Regression Tests on ROCm / test-nightly (ROCM Nightly, linux.rocm.gpu.mi300.2, --pre torch --index-url https://download.pyto... / linux-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.


petrex · 2025-06-09T17:32:15Z

related to pytorch/pytorch#151360

Copilot

Pull Request Overview

Adds support for AMD MI355x GPUs by introducing HIPBLASLT kernels into the MX formats prototype, along with necessary config updates, validation logic, and documentation enhancements.

- Extend MXGemmKernelChoice and MXLinearRecipeName to include HIPBLASLT
- Add validation and dispatch logic for HIPBLASLT in config and mx_ops
- Update README to show how to use HIPBLASLT on AMD MI355x hardware

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| config.py | Added `HIPBLASLT` enum, recipe, and validation in `_validate_gemm_kernel_choice` and `from_recipe_name` |
| mx_ops.py | Extended GEMM dispatch and FP8 assertions to support `HIPBLASLT` |
| README.md | Documented AMD MI355x support and HIPBLASLT usage |

Comments suppressed due to low confidence (3)

torchao/prototype/mx_formats/config.py:35

Comment indicates ROCm 7.0 requirement, but PR description specifies ROCm 6.5+. Consider aligning version requirement in code comments and documentation.

# available only on ROCm with HIPBLASLT support, reuqire gfx950 and ROCm 7.0

torchao/prototype/mx_formats/mx_ops.py:91

[nitpick] New HIPBLASLT code path added to GEMM dispatch; consider adding or updating tests to cover this scenario in _addmm_mx_dispatch.

if gemm_choice in (

torchao/prototype/mx_formats/config.py:71

[nitpick] New HIPBLASLT validation logic in _validate_gemm_kernel_choice should be covered by tests to verify block size, dtype, and HIP availability checks.

elif gemm_kernel_choice == MXGemmKernelChoice.HIPBLASLT:


drisspg

This is cool / looks good, and scaled_mm is ultimately what's backing this, right? Can you add a test even if it's not in our CI/CD?


- Introduced `is_ROCm_mx_supported` function to verify ROCm environment compatibility for MX operations.
- Added `test_hipblaslt_fp8` to validate FP8 operations using the HIPBLASLT backend, including SQNR verification for output accuracy.
- Updated imports in `test_mx_mm.py` to include necessary utilities for the new test.
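
The body of `is_ROCm_mx_supported` is not shown in this thread, so the following is only a sketch of the kind of check it describes, based on the ROCm 7.0+ and gfx950 requirement stated in the PR description. The name `is_rocm_mx_supported_sketch` and both parameters are hypothetical; in a real environment the inputs would presumably come from `torch.version.hip` and the device's reported architecture name.

```python
from typing import Optional


def is_rocm_mx_supported_sketch(hip_version: Optional[str], gcn_arch_name: str) -> bool:
    """Return True for ROCm >= 7.0 on a gfx950 device (assumed criteria).

    hip_version: a version string such as "7.0.51831-abc", or None on
        builds without ROCm (mirrors torch.version.hip being None).
    gcn_arch_name: an architecture string such as "gfx950:sramecc+:xnack-".
    """
    if hip_version is None:
        return False  # not a ROCm build; avoids calling .split on None
    parts = hip_version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False  # unrecognized version format: treat as unsupported
    return (major, minor) >= (7, 0) and "gfx950" in gcn_arch_name
```

Checking for `None` before parsing means the helper is safe to call on CUDA or CPU-only builds, where it simply returns `False`.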


petrex · 2025-06-09T20:52:00Z

> This is cool / looks good, and scaled_mm is ultimately what's backing this, right? Can you add a test even if it's not in our CI/CD?

Thanks.
Right. scaled_mm() --> hipblaslt --> gfx950. I'd deploy gfx950s in CI once they are GA.
Added a test that is not currently run in CI.
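
For readers unfamiliar with the SQNR check used by such a test, here is a minimal pure-Python sketch of a signal-to-quantization-noise ratio in decibels. The function name `sqnr_db` is hypothetical and this is not the test from the PR. It uses the common definition 20·log10(‖ref‖ / ‖ref − actual‖), which is assumed here to match the metric the test relies on.

```python
import math
from typing import Sequence


def sqnr_db(reference: Sequence[float], actual: Sequence[float]) -> float:
    """Signal-to-quantization-noise ratio in dB between two equal-length vectors."""
    if len(reference) != len(actual):
        raise ValueError("reference and actual must have the same length")
    # L2 norm of the reference signal.
    signal = math.sqrt(sum(r * r for r in reference))
    # L2 norm of the error introduced by the low-precision path.
    noise = math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, actual)))
    if noise == 0.0:
        return math.inf  # identical outputs: no quantization noise
    return 20.0 * math.log10(signal / noise)
```

A test would typically compare a high-precision reference matmul against the FP8 result and assert that the SQNR stays above a chosen threshold. The threshold used in this PR is not stated in the thread.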


petrex · 2025-06-30T18:19:49Z

> This is cool / looks good, and scaled_mm is ultimately what's backing this, right? Can you add a test even if it's not in our CI/CD?

Thanks @drisspg, test added.

drisspg · 2025-06-30T18:36:15Z

torchao/prototype/mx_formats/config.py

    CUBLAS = "cublas"

+    # available only on ROCm with HIPBLASLT support, require gfx950 and ROCm 7.0
+    HIPBLASLT = "hipblaslt"


We should change this to SCALED_MM.

cc @vkuzo

petrex · 2025-06-30

Could you clarify your approach here? I believe scaled_mm also has a CUDA path, unless you have a different plan in mind. Happy to align with your direction; just let me know.

drisspg · 2025-06-30

Ohh, I just mean that if I want to quantize a model to mxfp8, I need to know whether I am running on ROCm or CUDA, and the only place where one needs to know this is here. But in reality the "CUBLAS" enum really means "call into scaled_mm", and that would handle all the dispatch logic.

It feels weird, and like an anti-pattern relative to core PyTorch, to have device-specific (CUDA/ROCm) APIs when we don't need them.

vkuzo · 2025-06-30

Yes, IMO for what this PR is trying to do we should rename MXGemmKernelChoice.CUBLAS to MXGemmKernelChoice.SCALED_MM, and in a future PR we should probably delete MXGemmKernelChoice; it was for debugging in the early days.
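
The rename proposed in this thread could look like the sketch below. It is hypothetical and not taken from the PR: the `"scaled_mm"` value and the decision to keep `CUBLAS` as an alias are assumptions for illustration. It relies on standard `enum` behavior, where a second member with the same value becomes an alias of the first.

```python
from enum import Enum


class MXGemmKernelChoice(Enum):
    # Hypothetical post-rename enum: a single device-agnostic member meaning
    # "call into scaled_mm and let it dispatch to the right backend".
    SCALED_MM = "scaled_mm"
    # Same value as SCALED_MM, so Enum treats CUBLAS as an alias of it.
    # Existing code that references MXGemmKernelChoice.CUBLAS keeps working.
    CUBLAS = "scaled_mm"


def needs_device_specific_choice(choice: MXGemmKernelChoice) -> bool:
    """With one device-agnostic member, callers never branch on CUDA vs ROCm."""
    return choice is not MXGemmKernelChoice.SCALED_MM
```

One caveat of this approach: lookups by the old string value, such as `MXGemmKernelChoice("cublas")`, would no longer resolve, so any serialized configs would need migrating.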

Peter Y. Yeh added 2 commits April 16, 2025 15:59

Enhance MX formats to support HIPBLASLT kernel choice and update vali…

c21d24c

…dation logic. Added MXFP8_HIPBLASLT recipe and adjusted mx_mm function to accommodate new kernel options.

Update README.md to include support for AMD MI355x hardware and HIPBL…

36dd5b7

…ASLT kernel choice for mxfp8 gemm. Enhance documentation on end-to-end performance optimization efforts for AMD GPUs.

pytorch-bot bot added the module: rocm label Apr 16, 2025

facebook-github-bot added the CLA Signed label Apr 16, 2025

petrex added the mx label Apr 17, 2025

lint

c75df8e

petrex requested a review from vkuzo April 18, 2025 16:59

petrex added the topic: new feature and ciflow/rocm labels Apr 18, 2025

petrex and others added 7 commits April 23, 2025 17:34

Merge branch 'main' into rocm_mx_gemm

9b7b602

Merge branch 'main' into rocm_mx_gemm

5ee124e

lint

df2c220

Merge branch 'main' into rocm_mx_gemm

8df1d85

Update HIPBLASLT comment in config.py and adjust assertion in mx_ops.…

8ae4021

…py to include HIPBLASLT as a valid kernel choice for MX FP8 operations.

lint

8505860

lint

129a6d6

petrex requested review from Copilot and drisspg June 9, 2025 17:32

Copilot AI reviewed Jun 9, 2025


Peter Y. Yeh added 2 commits June 9, 2025 10:56

Update assertion message in mx_ops.py to clarify that both CUBLAS and…

c807d70

… HIPBLASLT are supported kernel choices for MX FP8 operations.

Refactor assertion in mx_ops.py to improve clarity on supported kerne…

75db95e

…l choices for MX FP8 operations.

drisspg reviewed Jun 9, 2025


petrex and others added 4 commits June 9, 2025 12:04

Update torchao/prototype/mx_formats/config.py

3ecc91e

Co-authored-by: Copilot <[email protected]>

add space

f88f1cf

Refactor SQNR calculation in HIPBLASLT FP8 test

5d2b55d

- Replaced `compute_sqnr` with `compute_error` for improved accuracy in error measurement. - Updated assertion to ensure output accuracy meets the specified threshold.

Enhance ROCm MX support check in is_ROCm_mx_supported function

979893a

- Updated the function to ensure `torch.version.hip` is not None before checking the version, improving robustness against potential NoneType errors.

Refactor is_ROCm_mx_supported function for improved readability

012f938

- Reformatted the return statement to enhance clarity and maintainability of the code.

petrex self-assigned this Jun 10, 2025

drisspg reviewed Jun 30, 2025
