[common] Generalized MXFP8 fused kernels w.r.t. input tensor dimensions #1437

Oleg-Goncharov · 2025-01-29T17:38:15Z

Description

Lifted the restrictions on input tensor dimensions in the MXFP8 fused kernels. The number of rows (for higher-dimensional tensors, the product of all dimensions except the last) can now be arbitrary.
The number of columns (for higher-dimensional tensors, the size of the last dimension) must still satisfy the memory alignment requirement of the Tensor Memory Accelerator (TMA): the row size in bytes must be a multiple of 16.
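The shape rule above can be sketched as follows. This is a minimal illustration rather than the code from this PR: `Shape2D`, `flatten_to_2d`, and `satisfies_tma_alignment` are hypothetical names, and the check assumes the 16-byte requirement applies to the byte size of one row (columns times element size).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D view of an N-dimensional tensor (names are illustrative only).
struct Shape2D {
  std::size_t rows;
  std::size_t cols;
};

// Collapse all dimensions except the last into "rows"; the last one is "cols".
// An empty shape is treated as a single element here, which is an assumption.
inline Shape2D flatten_to_2d(const std::vector<std::size_t>& shape) {
  if (shape.empty()) return {1, 1};
  std::size_t rows = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) rows *= shape[i];
  return {rows, shape.back()};
}

// Assumed form of the TMA constraint: one row must span a multiple of 16 bytes.
inline bool satisfies_tma_alignment(std::size_t cols, std::size_t elem_bytes) {
  constexpr std::size_t kTmaAlignmentBytes = 16;
  return (cols * elem_bytes) % kTmaAlignmentBytes == 0;
}
```

Under these assumptions, a tensor of shape {2, 3, 64} is viewed as 6 rows by 64 columns. The row count is unconstrained, while 40 one-byte columns would fail the alignment check and 40 two-byte columns would pass.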

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Removed the isFullTile requirement in the mxfp8_quantize function
Added a 16-byte alignment check when creating a TMA descriptor
MXFP8 test suite: added the alignment/padding requirement that the dimensions of the scaling-factor tensors be multiples of [128, 4] for row-wise scaling and [4, 128] for column-wise scaling.
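The padding rule for the scaling-factor tensors can be sketched as below. This is an illustration under stated assumptions, not the test-suite code: the helper names are hypothetical, and the block size of 32 elements per scaling factor is assumed from the MX format rather than stated in this PR.

```cpp
#include <cstddef>

// Assumption: one scaling factor covers a block of 32 elements.
constexpr std::size_t kMxBlockSize = 32;

inline std::size_t div_up(std::size_t x, std::size_t y) { return (x + y - 1) / y; }
inline std::size_t round_up(std::size_t x, std::size_t m) { return div_up(x, m) * m; }

// Hypothetical container for the padded scale-tensor dimensions.
struct ScaleShape {
  std::size_t rows;
  std::size_t cols;
};

// Row-wise scaling: one scale per 1x32 block, padded to multiples of [128, 4].
inline ScaleShape rowwise_scale_shape(std::size_t rows, std::size_t cols) {
  return {round_up(rows, 128), round_up(div_up(cols, kMxBlockSize), 4)};
}

// Column-wise scaling: one scale per 32x1 block, padded to multiples of [4, 128].
inline ScaleShape colwise_scale_shape(std::size_t rows, std::size_t cols) {
  return {round_up(div_up(rows, kMxBlockSize), 4), round_up(cols, 128)};
}
```

With these assumptions, a 300 x 80 input would need a 384 x 4 scale tensor for row-wise scaling and a 12 x 128 scale tensor for column-wise scaling.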

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Oleg Goncharov <[email protected]>


timmoon10

Mostly LGTM. It looks like this is needed before we can merge #1435.


timmoon10 · 2025-01-31T00:45:56Z

Pipeline 23264050

Generalized MXFP8 fused kernels w.r.t. input tensor dimensions

b01027c

Signed-off-by: Oleg Goncharov <[email protected]>

Oleg-Goncharov added the enhancement and 2.0.0 labels Jan 29, 2025

Oleg-Goncharov requested a review from ptrendx January 29, 2025 17:38

[pre-commit.ci] auto fixes from pre-commit.com hooks

2760f0a

for more information, see https://pre-commit.ci

timmoon10 approved these changes Jan 29, 2025

tests/cpp/operator/test_cast_mxfp8.cu (outdated, resolved)

transformer_engine/common/common.cu (outdated, resolved)

Oleg-Goncharov and others added 8 commits January 29, 2025 22:31

Update transformer_engine/common/common.cu

03676a0

Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Oleg Goncharov <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e8873d

for more information, see https://pre-commit.ci

Removed unnecessary test scenarios

6fed206

Signed-off-by: Oleg Goncharov <[email protected]>

Reverted the previous commit as it generated a compilation error (caused by to string conversion)

cb68911

Signed-off-by: Oleg Goncharov <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

4f6f982

for more information, see https://pre-commit.ci

Update transformer_engine/common/common.cu

491637a

Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Oleg Goncharov <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

1868c1a

for more information, see https://pre-commit.ci

Update test_cast_mxfp8.cu

0d1d45f

Signed-off-by: Oleg Goncharov <[email protected]>

ptrendx approved these changes Jan 31, 2025
