
Pull requests: NVIDIA/TransformerEngine


Pull requests list

[PyTorch] Optimize the performance of permute fusion kernels
#1927 opened Jul 4, 2025 by hxbai
7 of 13 tasks
[PyTorch] Fix setting align_size when FP8 is not initialized
#1926 opened Jul 4, 2025 by yaox12
1 of 13 tasks
[JAX] Fix grouped GEMM error on CUDA 12.9.1 & later
#1925 opened Jul 3, 2025 by huanghua1994
6 of 13 tasks
[PyTorch] Fuse permute+pad and unpermute+unpad ops for FP8 optimization
#1921 opened Jul 3, 2025 by xiaoxi-wangfj
3 of 12 tasks
Call pre_(first_)forward only when global state changes
#1917 opened Jul 1, 2025 by janekb04
7 of 13 tasks
[JAX] Resolve test conflict in JAX helper tests
#1916 opened Jul 1, 2025 by emmanuel-ferdman
6 of 13 tasks
[Common] Optimize KV cache related kernels
#1914 opened Jun 30, 2025 by cyanguwa
8 of 13 tasks
Fix import error when flash attention 3 is installed
#1913 opened Jun 30, 2025 by HollowMan6
7 of 13 tasks
[PyTorch] Optimize create_tensor in quantizer
#1912 opened Jun 30, 2025 by yaox12 Draft
13 tasks
[PyTorch debug] Improve precision debug tools performance
#1909 opened Jun 30, 2025 by pggPL
9 of 13 tasks
[PyTorch debug] Run test_sanity with debug tools enabled.
#1908 opened Jun 30, 2025 by pggPL
7 of 13 tasks
[PyTorch] Support FA3 MLA CP feature
#1907 opened Jun 28, 2025 by zhujian19891203
7 of 13 tasks
[PyTorch Debug] Support log fp8 tensor stats for blockwise recipe
#1905 opened Jun 27, 2025 by lengerfulluse
12 tasks
[common] NVFP4 kernels (label: enhancement)
#1904 opened Jun 27, 2025 by Oleg-Goncharov Draft
5 of 13 tasks
Fix fp8_calibration path
#1903 opened Jun 27, 2025 by sudhakarsingh27 Draft
1 of 13 tasks
[JAX] Update distributed LayerNormMLP test tolerance for L40
#1901 opened Jun 26, 2025 by jberchtold-nvidia
8 of 13 tasks
[PyTorch] Tests for loading previously-generated checkpoints (label: testing)
#1899 opened Jun 26, 2025 by timmoon10
8 of 14 tasks
[PyTorch Debug] More advanced stats for Quantized Tensors
#1897 opened Jun 26, 2025 by pggPL
2 of 13 tasks
Handle dtypes more carefully in multi-tensor Adam (label: bug)
#1888 opened Jun 17, 2025 by timmoon10
6 of 13 tasks
[Pytorch] CP + THD + chunked attention support.
#1887 opened Jun 17, 2025 by pggPL Draft
1 of 13 tasks
pipeline aware cpu offload
#1886 opened Jun 17, 2025 by liuzhenhai93
8 tasks done
Tongliu/router fusion
#1883 opened Jun 16, 2025 by Autumn1998
13 tasks
[PyTorch] Limit max time for distributed PyTorch tests (label: testing)
#1877 opened Jun 13, 2025 by timmoon10
6 of 14 tasks
[PyTorch] Add save_original_input in Linear/GroupedLinear to save memory
#1865 opened Jun 11, 2025 by hxbai
8 of 13 tasks