Pull requests: NVIDIA/TransformerEngine
Pull requests list
[PyTorch] Optimize the performance of permute fusion kernels
#1927
opened Jul 4, 2025 by
hxbai
7 of 13 tasks
[PyTorch] Fix setting align_size when FP8 is not initialized
#1926
opened Jul 4, 2025 by
yaox12
1 of 13 tasks
[JAX] Fix grouped GEMM error on CUDA 12.9.1 & later
#1925
opened Jul 3, 2025 by
huanghua1994
6 of 13 tasks
Add test for LayerNormMLP implementation using te.ops.Sequential to test_fusible_ops.py
#1924
opened Jul 3, 2025 by
janekb04
6 of 13 tasks
[PyTorch] Fuse permute+pad and unpermute+unpad ops for FP8 optimization
#1921
opened Jul 3, 2025 by
xiaoxi-wangfj
3 of 12 tasks
Call pre_(first_)forward only when global state changes
#1917
opened Jul 1, 2025 by
janekb04
7 of 13 tasks
[JAX] Resolve test conflict in JAX helper tests
#1916
opened Jul 1, 2025 by
emmanuel-ferdman
6 of 13 tasks
[Common] Optimize KV cache related kernels
#1914
opened Jun 30, 2025 by
cyanguwa
8 of 13 tasks
Fix import error when flash attention 3 is installed
#1913
opened Jun 30, 2025 by
HollowMan6
7 of 13 tasks
[PyTorch debug] Improve precision debug tools performance
#1909
opened Jun 30, 2025 by
pggPL
9 of 13 tasks
[PyTorch debug] Run test_sanity with debug tools enabled.
#1908
opened Jun 30, 2025 by
pggPL
7 of 13 tasks
[PyTorch] Support FA3 MLA CP feature
#1907
opened Jun 28, 2025 by
zhujian19891203
7 of 13 tasks
[PyTorch Debug] Support log fp8 tensor stats for blockwise recipe
#1905
opened Jun 27, 2025 by
lengerfulluse
12 tasks
[common] NVFP4 kernels
enhancement
New feature or request
#1904
opened Jun 27, 2025 by
Oleg-Goncharov
Draft
5 of 13 tasks
[JAX] Update distributed LayerNormMLP test tolerance for L40
#1901
opened Jun 26, 2025 by
jberchtold-nvidia
8 of 13 tasks
[PyTorch] Tests for loading previously-generated checkpoints
testing
Improvements to tests or testing infrastructure
#1899
opened Jun 26, 2025 by
timmoon10
8 of 14 tasks
[PyTorch Debug] More advanced stats for Quantized Tensors
#1897
opened Jun 26, 2025 by
pggPL
2 of 13 tasks
Handle dtypes more carefully in multi-tensor Adam
bug
Something isn't working
#1888
opened Jun 17, 2025 by
timmoon10
6 of 13 tasks
[PyTorch] Limit max time for distributed PyTorch tests
testing
Improvements to tests or testing infrastructure
#1877
opened Jun 13, 2025 by
timmoon10
6 of 14 tasks
[PyTorch] Add save_original_input in Linear/GroupedLinear to save memory
#1865
opened Jun 11, 2025 by
hxbai
8 of 13 tasks