[Pytorch] Decoupling framework extensions from common module #1498
Conversation
Force-pushed from 99871f3 to cd20471
/te-ci pytorch
Force-pushed from 7750ea4 to c62be40
#include "common/fused_attn/thd_utils.h" | ||
#include "extensions.h" | ||
|
||
using namespace transformer_engine::fused_attn; | ||
#include "thd_util.cuh" |
This is the new header file created in the framework extensions for the structs that were being used in attention.cu from transformer_engine/common/fused_attn/thd_utils.h.
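For orientation, here is a minimal sketch of what such a framework-side header could look like (the file name comes from the PR, but the struct name and body below are illustrative assumptions, not the PR's actual contents). Because the device functor is header-only, attention.cu in the PyTorch extensions compiles its own copy instead of relying on a definition living in transformer_engine/common.

// thd_util.cuh -- illustrative sketch only; struct name and body are assumptions.
#pragma once

#include <cstddef>
#include <cuda_runtime.h>

// Header-only device functor: every extension .cu file that includes this
// header compiles its own copy, so no symbol from libtransformer_engine.so
// is needed at link time.
struct ReadHalfFunctor {
  __forceinline__ __device__ static void run(float *dst, const float *src, size_t idx) {
    dst[idx] = src[idx];
  }
};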
@@ -0,0 +1,59 @@
/*************************************************************************
Is transformer_engine/common/include/transformer_engine a better place for this?
/te-ci pytorch
Force-pushed from adf5d8a to 56a8500
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
…el kernels ONLY for invoking recompilation and not directly using the pre-compiled symbols in libtransformer.so
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
… common.h
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
…de header Code cleanup
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Force-pushed from 40c3120 to bd6182e
/te-ci pytorch
@@ -33,39 +86,78 @@ __forceinline__ __device__ int binary_search(int target, int *array, int len) {
/***************************************************************************************************
 * Support THD format for Context Parallel: Generate partitioned indices for input tokens
 **************************************************************************************************/

// Templatizing this kernel ONLY so that when it is used in framework
// extensions, it is dynamically compiled
We could probably remove these comments now.
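To illustrate the trick these comments refer to (a minimal sketch under assumptions: the kernel carries a _sketch suffix and its body is a placeholder, not the PR's real partitioning logic): making the kernel a template means it is only compiled where it is instantiated, so a framework extension that includes the header gets its own compiled copy instead of resolving the pre-compiled symbol from libtransformer_engine.so.

// Sketch: a templated __global__ kernel defined in a header.  The otherwise
// unused template parameter exists only to force instantiation (and hence
// compilation) in the including translation unit.
template <typename T = int>
__global__ void thd_partition_indices_kernel_sketch(int *output, int total_tokens) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < total_tokens;
       i += blockDim.x * gridDim.x) {
    output[i] = i;  // placeholder for the real partitioning logic
  }
}

// Usage in an extension .cu file: the explicit <> triggers local instantiation.
// thd_partition_indices_kernel_sketch<><<<grid, block>>>(d_out, total_tokens);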
#include <cuda.h>
#include <cuda_bf16.h>
/***************************************************************************************************
 * Support THD format for Context Parallel: softmax_lse related operations
 **************************************************************************************************/
Maybe clean up these comments; they seem unrelated to the functions below. Same for the Gradients ones.
…d_read_half_tensor_kernel
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Code clean up
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Force-pushed from d32096e to 08b82a0
for more information, see https://pre-commit.ci
Description
Motivation: Currently, the framework extensions have a dependency on transformer_engine/common (libtransformer_engine.so) which is met via the symbols exposed by libtransformer_engine.version. However, this could result in two problems.

This PR is the first in a series of decoupling PRs to follow.
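As a rough sketch of the direction this decoupling takes (every name below is a hypothetical stand-in, not the actual Transformer Engine API): the extension stops touching the library's internal C++ types and only goes through a C-style surface with opaque handles, so it no longer depends on whichever internal symbols the version script of libtransformer_engine.so happens to export.

// Illustrative only -- hypothetical names, not the real transformer_engine API.
#include <cstddef>

// Stable C surface exported by the shared library (sketch).
extern "C" {
typedef void *ExampleTensorHandle;  // opaque handle, internals hidden
ExampleTensorHandle example_tensor_create(void *dptr, size_t numel);
void example_fused_op(ExampleTensorHandle in, ExampleTensorHandle out);
void example_tensor_destroy(ExampleTensorHandle t);
}

// Framework-extension side after decoupling (sketch): no internal
// transformer_engine::Tensor, only the exported C entry points.
void run_fused_op(void *in_ptr, void *out_ptr, size_t numel) {
  ExampleTensorHandle in = example_tensor_create(in_ptr, numel);
  ExampleTensorHandle out = example_tensor_create(out_ptr, numel);
  example_fused_op(in, out);
  example_tensor_destroy(out);
  example_tensor_destroy(in);
}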
Type of change
Changes
Files in the framework extensions identified to have this dependency in the first pass are:
- In pytorch/csrc/extensions: attention.cu, gemm.cpp, multi_tensor_adam.cu
- In jax/csrc/extensions: attention.cpp, utils.cu
This PR only decouples pytorch/csrc/extensions/attention.cu by:
- Using nvte_* calls to replace the usage of transformer_engine::Tensor
- Moving the needed structs from t_e/common/fused_attn/thd_util.h to a new file t_e/pytorch/csrc/thd_util.cuh
- Templatizing thd_partition_indices_kernel and thd_read_half_tensor_kernel so that they are "re-compiled" on instantiation in the framework extensions

Checklist: