[TLE][feat] Add tle dsa extension #715
Open
huanghaoXcore wants to merge 16 commits into
…lagos-ai#399)

* [FEAT](tle): WIP - add tle features
* [FEAT]: WIP - refactor tle
  * move tle.ascend to tle.dsa.ascend
  * move tle_ir to third_party/tle/dsa
  * reimplement alloc/to_tensor/to_buffer with reference to buffer_ir in third_party/ascend
  * reimplement tle.dsa.ascend scope with address_space in ascend
* [FEAT](tle): support add, sub, mul, div, max, min in tle.dsaf
* [FIX](tle): fix to_tensor in test_add_vec_mix.py
* [FIX] remove memory_space_cast in dsa_to_tensor because the op removes the memory space attribute and results in compile errors
* [TESTING] add collect_single method in ascend/testing.py to preserve the original benchmark statistics
* [FEAT](tle) add hint, subview, extract_slice, extract_element in tle.dsa
* [REFACT](tle): decouple tle from TritonOps.td
  * decouple TleOps from TritonOps and move to third_party/tle/dsa/dialect
  * implement the TleOp conversion in third_party/tle/dsa rather than in flir directly; flir just calls the conversion in its pass
* [CHORE]: update doc in tle
* [FEAT]: decouple tle.dsa in backend/ascend/spec
  * backend/ascend/spec/triton/compiler/code_generator.py still uses tle.dsa in its visitor to visit the Python AST
* [FIX]: fix copyright declaration in tle
* [FIX](tle): fix extracting and applying tle.hints when visiting the AST
  * fix tle.dsa.hint for nested usage, see python/test/tle/test_tle_with_hints.py
  * implement extract_tle in experimental/tle
* [FIX](tle): fix tle module importing in ascend/backend/spec/triton/compiler/code_generator.py and add sparse_flash_attn_tle.py
* [FIX]: fix copyright declaration in tle
* [FIX](tle): remove redundant code and fix code format

(cherry picked from commit 837fe3e)
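For readers unfamiliar with the nested-hint fix above, here is a minimal sketch of how an AST visitor could keep a stack of hint scopes so that inner `with` blocks inherit and override outer hints. It is not the code from this PR: the `hint(...)` spelling, the `HintCollector` class, and the `pipeline`/`unroll` keys are hypothetical and exist only for illustration.

```python
import ast


class HintCollector(ast.NodeVisitor):
    """Record, per assigned name, the hints of every enclosing `with hint(...)` block."""

    def __init__(self):
        self.stack = []  # active hint dicts, outermost first
        self.seen = {}   # assigned name -> merged hints in effect

    def visit_With(self, node):
        hints = {}
        for item in node.items:
            call = item.context_expr
            # Hypothetical spelling: only `hint(key=<literal>)` opens a hint scope.
            if (isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Name)
                    and call.func.id == "hint"):
                for kw in call.keywords:
                    if kw.arg is not None:
                        hints[kw.arg] = ast.literal_eval(kw.value)
        self.stack.append(hints)
        try:
            for stmt in node.body:
                self.visit(stmt)
        finally:
            # Always pop, so a nested scope never leaks into its siblings.
            self.stack.pop()

    def visit_Assign(self, node):
        merged = {}
        for scope in self.stack:  # inner scopes override outer ones
            merged.update(scope)
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.seen[target.id] = merged


SOURCE = """
with hint(pipeline=True):
    a = 1
    with hint(unroll=4, pipeline=False):
        b = 2
    c = 3
d = 4
"""

collector = HintCollector()
collector.visit(ast.parse(SOURCE))
```

In this sketch `b` sees both hints with the inner `pipeline` value, while `c` sees only the outer scope again, because the inner scope is popped when its block ends.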
Refactor the TLE DSA build layout so DSA is managed as part of the TLE subtree instead of a separate plugin-style entry point. Keep DSA operations in the main `tle` dialect and avoid defining a second `TleDialect::initialize()`. DSA-specific op and attr registration is now routed through `TleDialect::dsaInitialize()`, with the implementation kept under the DSA IR directory to preserve module separation. Rename the DSA TableGen files to the `TleDSA*` naming scheme and update generated include references accordingly. Adjust CMake dependencies and ordering so DSA TableGen/IR targets are created before the main TLE IR target, `TleIR` links `TleDSAIR`, and DSA conversion is configured after the main TLE IR target is available. Also package DSA Python bindings into the main `TritonTLE` plugin path to keep a single TLE plugin and a single dialect registration path.
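The target ordering described above can be modelled as a small dependency graph. The sketch below uses Python's standard `graphlib` module, not CMake, and only illustrates the ordering constraint. `TleIR`, `TleDSAIR`, and `TritonTLE` are named in the commit message; `TleDSATableGen` and `TleDSAConversion` are placeholder names, and the edges into `TritonTLE` are an assumption.

```python
from graphlib import TopologicalSorter

# target -> set of targets that must already exist when it is configured
deps = {
    "TleDSATableGen": set(),                     # placeholder name
    "TleDSAIR": {"TleDSATableGen"},
    "TleIR": {"TleDSAIR"},                       # TleIR links TleDSAIR
    "TleDSAConversion": {"TleIR"},               # placeholder name
    "TritonTLE": {"TleIR", "TleDSAConversion"},  # assumed edges
}

# static_order() yields each target only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(" -> ".join(order))
```

A cycle in this graph, for example `TleDSAIR` depending back on `TleIR`, would make `static_order()` raise `graphlib.CycleError`.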
* [feat]: enable cvpipeline for sfa
* Apply code-format changes

---------

Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: zhzhcookie <zhengyang_pku@163.com>

(cherry picked from commit 35cc929)
(cherry picked from commit bd8f032)
…verhead (flagos-ai#593)

Co-authored-by: 谢昱 <xieyu@xcoresigma.com>

(cherry picked from commit b238fdb)
Expose PIPE, block synchronization, and sub-vector helpers through the TLE language namespace to support pipeline-enabled kernels. (cherry picked from commit db7e7ff)
(cherry picked from commit 4c30700)
(cherry picked from commit f6f2d58)
(cherry picked from commit cb91251)
* [TLE] Add tle_swiglu kernel on Ascend NPU

  Signed-off-by: wangziyi <wangziyi@xcoresigma.com>
* [TLE] Add tle_swiglu kernel on Ascend NPU

  Signed-off-by: wangziyi <wangziyi@xcoresigma.com>

---------

Signed-off-by: wangziyi <wangziyi@xcoresigma.com>

(cherry picked from commit b522050)
(cherry picked from commit 4c1f9a9)
* [TLE]: add add_rmsnorm_bias kernel in triton/tutorials/tle
* [TLE]: rename 06-add-rms-norm-bias.py to 07-add-rms-norm-bias.py
* [chore]: fix code format
* [ci][fix]: fix conflict in ascend3.2-build-and-test.yml
* [fix]: rename add_rms_norm_bias kernel
* [fix]: fix for getting vector core nums which requires torch_npu greater than 2.9.0
* [tle][tutorials]: fix for bare except forbidden

(cherry picked from commit fefa349)
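The last two fixes above (a version-gated query and no bare `except`) can be combined in one pattern, sketched below. This is an assumption-laden illustration, not the tutorial's code: `query` is a hypothetical stand-in for whatever call returns the vector core count, the fallback value is invented, and the strict "greater than 2.9.0" comparison follows the commit wording literally.

```python
MIN_EXCLUSIVE = "2.9.0"  # per the commit message: torch_npu must be greater than this


def parse_version(text):
    """Turn '2.10.0+git123' into (2, 10, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def vector_core_num(query, installed, default=1):
    """Return query() on a new enough runtime, otherwise a fallback (assumed policy)."""
    if parse_version(installed) <= parse_version(MIN_EXCLUSIVE):
        return default
    try:
        return int(query())
    except (AttributeError, RuntimeError):
        # Name the expected failures instead of using a bare `except:`.
        return default


def broken_query():
    # Simulates a runtime that cannot report its core count.
    raise RuntimeError("device property unavailable")


new_runtime = vector_core_num(lambda: 48, "2.10.0")
old_runtime = vector_core_num(lambda: 48, "2.9.0")
failed_query = vector_core_num(broken_query, "2.10.0", default=2)
```

Catching named exception types keeps `KeyboardInterrupt` and `SystemExit` propagating, which is what lint rules against bare `except` are meant to protect.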
9e7af70 to 57b9f02
Enable TLE struct support for the Ascend build and add the missing include paths needed by the Ascend and TLE Python bindings. Register the TLE dialect in the Ascend IR loader and adjust DSA codegen/semantics for hint scopes, integer constants, max operands, buffer shapes, and static slice metadata. Update the AddRmsNormBias tutorial to avoid calling the DSA parallel marker at runtime and include pre-commit formatting fixes.
57b9f02 to e8b2104
a580e30 to c0d2203
No description provided.