Pull requests: NVIDIA/TransformerEngine
[PyTorch] fix cross entropy vanishing gradients
#2139
opened Aug 29, 2025 by
casper-hansen
1 of 13 tasks
ci: Build and attach bdist wheels to release page
#2138
opened Aug 29, 2025 by
ko3n1g
13 tasks
[PyTorch Debug] Add max_blockwise_X_dynamic_range stats
#2137
opened Aug 29, 2025 by
pggPL
8 of 13 tasks
[JAX] HighPrecisionTensor wrapper for non-quantized data
#2136
opened Aug 28, 2025 by
jberchtold-nvidia
8 of 13 tasks
[JAX] Fix failing fused attn tests for dropout=0.1 and bias for sm100
#2135
opened Aug 28, 2025 by
KshitijLakhani
5 of 13 tasks
Adds context parallelism utilities: moving cp shards to diff ranks and pad sequence to divisibility factory
2.8.0
#2129
opened Aug 27, 2025 by
jomitchellnv
5 of 12 tasks
Fix memory overhead of linear layer when all gather from sequence parallel
#2125
opened Aug 27, 2025 by
yuzhongw-nvidia
13 tasks
Adds dst.dtype information in copy_ method of quantized tensors.
#2120
opened Aug 26, 2025 by
zobeideThePlayer
3 of 13 tasks
[PyTorch][CUDA Graph] Fix FP8 Weight Quantization Cache under CUDA Graph
performance
Performance issues
#2119
opened Aug 26, 2025 by
zhongbozhu
13 tasks
[PyTorch Debug] Fix issue with microbatching + debug value caching
#2108
opened Aug 25, 2025 by
pggPL
8 of 13 tasks
[PyTorch Debug] Fix issue with negative underflow.
#2107
opened Aug 25, 2025 by
pggPL
8 of 13 tasks
Fix test of FSDP2 by correcting init logic and applying autocast
#2105
opened Aug 24, 2025 by
ntenenz
4 of 13 tasks
[PyTorch] Fix assertion error message formatting in DotProductAttention
#2103
opened Aug 22, 2025 by
janbernloehr
6 of 13 tasks
[PyTorch] Let GroupedLinear accept MXFP8 input and gradient
#2099
opened Aug 21, 2025 by
yaox12
1 of 13 tasks
Support communication/gemm overlap for [Wgrad->Dgrad] execution order in the bwd pass.
#2065
opened Aug 12, 2025 by
fanshiqing
3 of 13 tasks
Support MLA Context Parallel (CP) exchanging latent KV
#2064
opened Aug 12, 2025 by
yuzhongw-nvidia
13 tasks
Support PyPI wheel for cuda13
build
Build system
#2057
opened Aug 11, 2025 by
ksivaman
2 of 13 tasks
Add better ordering enforcement to split_overlap_rs gemms.
#2056
opened Aug 11, 2025 by
chaseblock
6 of 13 tasks
[Draft] Add primary weights fp8 support for mxfp8
#2055
opened Aug 11, 2025 by
kunlunl
13 tasks