Parallel Cross Entropy using online softmax #1456

sanandaraj5597 · 2025-02-04T22:26:24Z

Description

This PR implements a parallel cross entropy function that uses the online softmax technique. The feature has several aspects:

The vocab dimension can be sharded along the tensor-parallel (TP) axis so that the loss is computed in a distributed fashion.
Online softmax tracks a running maximum and a running sum of exponentials, which lets the softmax calculation be parallelized and makes it more efficient.
Gradients are calculated in the forward pass itself, so the backward step is a no-op.
Gradients are stored in place in the input tensor, which saves memory.
The OpenAI Triton implementation lets us combine GPU kernel-level semantics with torch-level communication APIs.

[Thanks to the Liger Kernel implementation for the idea of online softmax and in-place gradient calculation.]
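
As a rough illustration of the ideas in the description, here is a minimal pure-Python sketch, not the Triton kernel from this PR. It computes cross entropy for a single token with an online (running max and running sum) softmax, merges per-shard partial results the way a reduction across TP ranks might, and produces the gradient alongside the loss. The function names, the block size, and the list-based "shards" are assumptions made for this example only.

```python
import math


def online_logsumexp(logits, block_size=4):
    """One pass over blocks, keeping a running max `m` and running sum `d`.

    Assumes finite logits. The log-sum-exp of the input is m + log(d).
    """
    m = float("-inf")
    d = 0.0
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]
        new_m = max(m, max(block))
        # Rescale the old sum to the new max, then add this block's terms.
        d = d * math.exp(m - new_m) + sum(math.exp(x - new_m) for x in block)
        m = new_m
    return m, d


def merge_partials(partials):
    """Combine per-shard (max, sum) pairs into one global pair."""
    global_m = max(m for m, _ in partials)
    global_d = sum(d * math.exp(m - global_m) for m, d in partials)
    return global_m, global_d


def cross_entropy_with_grad(logits, target, num_shards=2):
    """Loss and d(loss)/d(logits) for one token, vocab split into shards."""
    shard_len = math.ceil(len(logits) / num_shards)
    # Each slice stands in for the vocab shard held by one (hypothetical) rank.
    shards = [logits[i:i + shard_len] for i in range(0, len(logits), shard_len)]
    m, d = merge_partials([online_logsumexp(s) for s in shards])
    loss = m + math.log(d) - logits[target]
    grad = [math.exp(x - m) / d for x in logits]  # softmax probabilities
    grad[target] -= 1.0  # softmax minus one-hot target
    return loss, grad


logits = [2.0, -1.0, 0.5, 3.0, 0.0, 1.5]
loss, grad = cross_entropy_with_grad(logits, target=3)
print(loss, grad)
```

Because the gradient (softmax minus one-hot) only needs the same global max and sum as the loss, it can be produced in the same pass. In a real kernel it could overwrite the logits buffer; this sketch returns a new list for clarity.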

timmoon10

Can you add a test in tests/pytorch?

sanandaraj5597 · 2025-02-19T07:01:40Z

@timmoon10 Added tests.

ksivaman

@sanandaraj5597 Could you fix the linting errors? You could run it locally using bash qa/L0_pytorch_lint/test.sh

sanandaraj5597 · 2025-02-19T17:17:29Z

Fixed lint errors.

timmoon10

LGTM, pending CI

timmoon10 · 2025-02-19T18:54:42Z

/te-ci pytorch

timmoon10 · 2025-02-20T05:36:45Z

/te-ci pytorch

ksivaman

LGTM!

timmoon10 · 2025-02-21T00:37:44Z

/te-ci pytorch

timmoon10 · 2025-02-21T03:42:56Z

/te-ci pytorch

timmoon10 · 2025-02-21T22:24:46Z

/te-ci pytorch

timmoon10 · 2025-02-25T22:24:53Z

It seems that the Triton dependency is messing up other tests. The latest upstream Triton (3.2.0) doesn't support Blackwell, so the NVIDIA PyTorch container is using a custom internal build. I've removed Triton as a formal dependency, but we should put it back once Blackwell support is upstreamed.
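
For context, when a package is not declared as an install-time dependency, code that needs it typically checks for it at runtime. The following is a hypothetical sketch of such a guard using only the standard library. It is not taken from this PR, and the helper names `triton_available` and `require_triton` are invented for illustration.

```python
import importlib.util


def triton_available() -> bool:
    """Return True if a `triton` package can be located, without importing it."""
    return importlib.util.find_spec("triton") is not None


def require_triton() -> None:
    """Raise a descriptive error if Triton is missing (hypothetical helper)."""
    if not triton_available():
        raise ImportError(
            "This feature needs Triton, which is not installed as a formal "
            "dependency; please install a build that supports your GPU."
        )


print("Triton found:", triton_available())
```

A guard like this would let the rest of the package import cleanly in environments that ship a custom Triton build or none at all, with the error deferred until the Triton-backed feature is actually used.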

timmoon10 · 2025-02-25T22:25:25Z

/te-ci pytorch

Selvaraj Anandaraj and others added 2 commits February 4, 2025 14:20

Added parallel cross entropy loss implementation using online softmax

c6ed8cb

Signed-off-by: Selvaraj Anandaraj <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

bd5b5ef

for more information, see https://pre-commit.ci

timmoon10 reviewed Feb 4, 2025

timmoon10 self-requested a review February 4, 2025 22:38

timmoon10 added the enhancement label Feb 4, 2025

sanandaraj5597 and others added 3 commits February 7, 2025 18:04

Merge branch 'NVIDIA:main' into parallel_cross_entropy

8d441ca

Added tests

b575bd8

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'main' into parallel_cross_entropy

145c9f6

Signed-off-by: Selvaraj Anandaraj <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

5553f90

for more information, see https://pre-commit.ci

ksivaman reviewed Feb 19, 2025

transformer_engine/pytorch/triton/cross_entropy.py

ksivaman reviewed Feb 19, 2025

tests/pytorch/test_parallel_cross_entropy.py

Selvaraj Anandaraj and others added 7 commits February 19, 2025 00:09

Added reshape of loss output

74c0762

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'parallel_cross_entropy' of https://github.com/sanandara…

bc42a52

…j5597/TransformerEngine into parallel_cross_entropy

[pre-commit.ci] auto fixes from pre-commit.com hooks

a1d8589

for more information, see https://pre-commit.ci

Added to test list

529eeb6

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'parallel_cross_entropy' of https://github.com/sanandara…

3ad7137

…j5597/TransformerEngine into parallel_cross_entropy

Added Triton dependency

55fc62b

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Added copyright

659b81e

Signed-off-by: Selvaraj Anandaraj <[email protected]>

ksivaman reviewed Feb 19, 2025

Fixed lint errors

392227c

Signed-off-by: Selvaraj Anandaraj <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

33058bc

for more information, see https://pre-commit.ci

ksivaman reviewed Feb 19, 2025

setup.py (outdated)

sanandaraj5597 and others added 2 commits February 19, 2025 09:19

Update setup.py

2fc8c53

Co-authored-by: Kirthi Shankar Sivamani <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'main' into parallel_cross_entropy

2bd33f6

timmoon10 approved these changes Feb 19, 2025

Selvaraj Anandaraj added 2 commits February 19, 2025 19:07

Fixed lint and triton failure

1d08847

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'parallel_cross_entropy' of https://github.com/sanandara…

f87a894

…j5597/TransformerEngine into parallel_cross_entropy

Merge branch 'main' into parallel_cross_entropy

d5bb347

ksivaman approved these changes Feb 20, 2025

Selvaraj Anandaraj and others added 3 commits February 20, 2025 08:55

Removed flattening for scalars

f2be295

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'parallel_cross_entropy' of https://github.com/sanandara…

d7a61aa

…j5597/TransformerEngine into parallel_cross_entropy

Merge branch 'main' into parallel_cross_entropy

e9b84cc

Selvaraj Anandaraj and others added 3 commits February 20, 2025 19:40

Skip tests on Blackwell due to TE CI caveat

3ca16b0

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'parallel_cross_entropy' of https://github.com/sanandara…

fa819ed

…j5597/TransformerEngine into parallel_cross_entropy

[pre-commit.ci] auto fixes from pre-commit.com hooks

fa0bbbd

for more information, see https://pre-commit.ci

Selvaraj Anandaraj and others added 4 commits February 20, 2025 20:37

Added reason arg

b480330

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Fixed conflicts

58c921c

Signed-off-by: Selvaraj Anandaraj <[email protected]>

Merge branch 'main' into parallel_cross_entropy

dfbf9ac

[pre-commit.ci] auto fixes from pre-commit.com hooks

1425257

for more information, see https://pre-commit.ci

timmoon10 added 2 commits February 25, 2025 22:13

Merge branch 'main' into parallel_cross_entropy

66274f4

Signed-off-by: Tim Moon <[email protected]>

Do not register Triton dependency with setuptools

937a638

Signed-off-by: Tim Moon <[email protected]>

Merge branch 'main' into parallel_cross_entropy

321b5e8

timmoon10 merged commit 8ca2caf into NVIDIA:main Feb 26, 2025
1 of 2 checks passed

sanandaraj5597 deleted the parallel_cross_entropy branch February 26, 2025 06:04
