Support store_param_remainders feature from Apex in TE Fused Adam #1408
Conversation
@@ -243,13 +256,14 @@ def _apply_scale(self, state_name, unscaled_state, scaled_state, scale):
        unscaled_state.mul_(rscale)
        scaled_state.copy_(unscaled_state)

-    def get_unscaled_state(self, param, state_name):
+    def get_unscaled_state(self, param, state_name, store_param_remainders=False):
The default value of `store_param_remainders` is `False` here, but it's `True` by default in the constructor. I think it's misleading; why not just set it to `True` here as well?
I don't want to store param remainders for any `state_name` other than `master_params`; that's why it defaults to `False`.
I'd prefer if this function didn't expose this kwarg, since it makes its behavior less obvious. `get_unscaled_state` implies that it produces an FP32 value that is ready to use, so it would be better if `step` called a different function to access the BF16 remainder. If we want to keep this overall logic, we should change the function name to something more accurate (although a vague name like `get_state_for_adam_kernel` is a code smell).
Worked around it. Resolving conversation.
It's better, although we still have the problem that state scaling and BF16 remainders are both using this function in different ways. It's troubling that `get_unscaled_state` might not get the unscaled state.
I agree with your point, but using this function both with and without the feature keeps the code efficient; writing a separate function would require new call sites across the step function, checkpointing, etc.
I've also added assert checks inside the function to tighten the understanding/correctness. Hope you are fine with it.
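For concreteness, here is a minimal sketch of the assert-guarded behavior being discussed; the class layout, the `master_param` state key, and the int16 remainder dtype are assumptions for illustration, not TE's actual implementation:

```python
import torch

# Hypothetical sketch of the discussed design, not TE's actual code.
class _FusedAdamSketch:
    def __init__(self):
        # per-parameter optimizer state, e.g. self.state[param]["master_param"]
        self.state = {}

    def get_unscaled_state(self, param, state_name, store_param_remainders=False):
        state = self.state[param][state_name]
        if store_param_remainders:
            # Remainders only exist for the FP32 master copy of a BF16 param,
            # so tighten the contract with asserts as mentioned above.
            assert state_name == "master_param"
            assert state.dtype == torch.int16
            # Raw 16-bit remainders, consumed directly by the fused Adam kernel.
            return state
        # Regular path: return an FP32 tensor that is ready to use.
        return state.float()
```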
I'm getting NaNs when using this feature. You can reproduce it by running …
Still, with all these changes, the tests fail at …
transformer_engine/pytorch/csrc/extensions/multi_tensor/multi_tensor_adam.cu (two review threads, outdated and resolved)
Another thing: this code is failing with CUDA memory access errors when we pass both …
@MaciejBalaNV I added a failure guard for capturable mode. We don't have a plan to use CUDA graphs with optimizers, so it's not worth supporting.
Making this feature opt-in makes me feel a lot less worried. Users should know what they are doing before enabling this optimization.
I'll change this PR to merge into `release_v2.0` and start a CI pipeline.
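As a usage sketch of the opt-in path, here is what enabling the feature might look like; the exact `FusedAdam` import path and kwargs such as `master_weights` are assumptions here, not confirmed by this thread:

```python
import torch
from transformer_engine.pytorch.optimizers import FusedAdam  # import path assumed

# Opt-in sketch: BF16 model params with FP32 master weights kept as 16-bit remainders.
model = torch.nn.Linear(1024, 1024, device="cuda", dtype=torch.bfloat16)

optimizer = FusedAdam(
    model.parameters(),
    lr=1e-4,
    master_weights=True,          # keep FP32 master copies (kwarg name assumed)
    store_param_remainders=True,  # opt-in: store only the trailing 16 bits
)
```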
DISPATCH_DOUBLE_FLOAT_HALF_AND_BFLOAT(
    p_in_type, 0, "adam",
We've already confirmed that `p_in_type` is BF16, so dispatching for FP16 and FP32 is unnecessary. See #1408 (comment).
Agreed, but it shouldn't break the code I guess.
/te-ci pytorch
/te-ci pytorch
#1443 is identical to this PR, but rebased on the …
Description
When the master parameters are in FP32 and the model parameters are in BF16, we can store only the trailing 16 remainder bits and reconstruct the FP32 master param from the BF16 model param plus the remainder.
This halves the master-parameter memory usage.
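A minimal PyTorch sketch of that reconstruction, assuming the BF16 model param holds the top 16 bits of the FP32 master value (i.e. truncation); this is illustrative only, not the kernel's code:

```python
import torch

# Illustrative only: split an FP32 master param into a BF16 param + 16 remainder bits,
# then rebuild the exact FP32 value. Assumes the BF16 copy is the truncated top half.
master = torch.randn(4, dtype=torch.float32)

bits = master.view(torch.int32)                                  # reinterpret FP32 bits
bf16_param = (bits >> 16).to(torch.int16).view(torch.bfloat16)   # top 16 bits -> model param
remainder = (bits & 0xFFFF).to(torch.int16)                      # bottom 16 bits -> optimizer state

# Reconstruction: BF16 model param + remainder -> FP32 master param.
hi = bf16_param.view(torch.int16).to(torch.int32) << 16
lo = remainder.to(torch.int32) & 0xFFFF                          # undo int16 sign extension
rebuilt = (hi | lo).view(torch.float32)

assert torch.equal(rebuilt, master)
```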