Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] fix cross entropy vanishing gradients
#2139 opened Aug 29, 2025 by casper-hansen
1 of 13 tasks
ci: Build and attach bdist wheels to release page
#2138 opened Aug 29, 2025 by ko3n1g
13 tasks
[PyTorch Debug] Add max_blockwise_X_dynamic_range stats
#2137 opened Aug 29, 2025 by pggPL
8 of 13 tasks
[JAX] HighPrecisionTensor wrapper for non-quantized data
#2136 opened Aug 28, 2025 by jberchtold-nvidia
8 of 13 tasks
Fix CUDA version in setup.py
#2132 opened Aug 27, 2025 by vcherepanov-nv Draft
1 of 13 tasks
Create GPU reload buffers on main stream
#2131 opened Aug 27, 2025 by sanandaraj5597
FP8 Output Quantization for GEMM
#2123 opened Aug 26, 2025 by vthumbe1503 Draft
6 of 13 tasks
Fused RoPE with combined QKV input.
#2122 opened Aug 26, 2025 by vasunvidia
1 of 13 tasks
Adds dst.dtype information in copy_ method of quantized tensors.
#2120 opened Aug 26, 2025 by zobeideThePlayer
3 of 13 tasks
[PyTorch][CUDA Graph] Fix FP8 Weight Quantization Cache under CUDA Graph (label: performance)
#2119 opened Aug 26, 2025 by zhongbozhu
13 tasks
[PyTorch Debug] Fix issue with microbatching + debug value caching
#2108 opened Aug 25, 2025 by pggPL
8 of 13 tasks
[PyTorch Debug] Fix issue with negative underflow.
#2107 opened Aug 25, 2025 by pggPL
8 of 13 tasks
Fix test of FSDP2 by correcting init logic and applying autocast
#2105 opened Aug 24, 2025 by ntenenz
4 of 13 tasks
[PyTorch] Let GroupedLinear accept MXFP8 input and gradient
#2099 opened Aug 21, 2025 by yaox12
1 of 13 tasks
Support MLA Context Parallel (CP) exchanging latent KV
#2064 opened Aug 12, 2025 by yuzhongw-nvidia
13 tasks
Feature fast cast-only mxfp8
#2062 opened Aug 12, 2025 by Jianbing-D
6 of 13 tasks
Support PyPI wheel for cuda13 (label: build)
#2057 opened Aug 11, 2025 by ksivaman
2 of 13 tasks
Add better ordering enforcment to split_overlap_rs gemms.
#2056 opened Aug 11, 2025 by chaseblock
6 of 13 tasks
[Draft] Add primary weighs fp8 support for mxfp8
#2055 opened Aug 11, 2025 by kunlunl
13 tasks