From NVIDIA Megatron-LM for visibility #18

Open · wants to merge 4,790 commits into base: multi-query-attention

Conversation

RaymondLi0 (Collaborator)

No description provided.

@RaymondLi0 RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12
@RaymondLi0 RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12
ko3n1g and others added 28 commits May 6, 2025 08:47
ci: onboard T5 memory test

See merge request ADLR/megatron-lm!3225
ci: Provide easier tooling for local runs

See merge request ADLR/megatron-lm!3257
Remove unintentionally leftover lines in ModelOpt Linear layer

See merge request ADLR/megatron-lm!3228
feat: use multi-storage client in checkpointing

See merge request ADLR/megatron-lm!2652
ci: Fixes to the release

See merge request ADLR/megatron-lm!3263
ADLR/megatron-lm!3193 - substitute nemo1 tests with nemo2 tests

See merge request ADLR/megatron-lm!3235
Co-authored-by: Mcore Bot <[email protected]>
Co-authored-by: Hao Wu <[email protected]>
… into 'main'

remove from recipe

See merge request ADLR/megatron-lm!3270
Fix attention_mask shapes in Attention unit test

Closes #464

See merge request ADLR/megatron-lm!3261
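
For context on the mask fix above: Megatron-style attention typically takes a boolean mask of shape [batch, 1, seq_q, seq_k], where the singleton head dimension lets one mask broadcast across all heads and True marks positions to exclude. A minimal sketch of that convention (the shapes and polarity here are assumptions for illustration, not taken from the actual test in !3261):

```python
import torch

batch, heads, seq_q, seq_k = 2, 8, 16, 16
scores = torch.randn(batch, heads, seq_q, seq_k)

# [batch, 1, seq_q, seq_k]: the singleton head dim broadcasts over all heads.
attention_mask = torch.triu(
    torch.ones(seq_q, seq_k, dtype=torch.bool), diagonal=1
).expand(batch, 1, seq_q, seq_k)

masked = scores.masked_fill(attention_mask, float("-inf"))
probs = masked.softmax(dim=-1)
assert probs.shape == (batch, heads, seq_q, seq_k)
```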
Updated setup instructions in README.md

See merge request ADLR/megatron-lm!3210
Disable cudagraphs when pipeline parallel microbatched inference is on

See merge request ADLR/megatron-lm!3151
Inference functional test: 580M Minitron

See merge request ADLR/megatron-lm!2812
Invalidate cached SSM tensors if batch size changes during inference

See merge request ADLR/megatron-lm!3277
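
The SSM fix above follows a common inference-caching pattern: cached state is keyed by batch size and reallocated whenever an incoming batch differs. A hypothetical sketch of the idea (class and attribute names are illustrative, not Megatron's actual API):

```python
import torch

class SSMInferenceCache:
    def __init__(self, d_model: int, d_state: int):
        self.d_model, self.d_state = d_model, d_state
        self.batch_size = None
        self.state = None

    def get(self, batch_size: int, device, dtype):
        # Invalidate and reallocate if the batch size changed between calls;
        # stale state from a differently sized batch must never be reused.
        if self.state is None or self.batch_size != batch_size:
            self.batch_size = batch_size
            self.state = torch.zeros(
                batch_size, self.d_model, self.d_state, device=device, dtype=dtype
            )
        return self.state
```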
ci: Move unit test logic to file

See merge request ADLR/megatron-lm!3291
wdykas and others added 30 commits June 14, 2025 10:48
…nsor-parallelizable to ensure gradients are correctly all-reduced

Co-authored-by: root <[email protected]>
Co-authored-by: William Dykas <[email protected]>
Mark weights from vision encoder to be non-tensor-parallelizable to ensure gradients are correctly all-reduced

See merge request ADLR/megatron-lm!3190
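
Why the marking matters: parameters that are replicated rather than sharded across tensor-parallel ranks must have their gradients all-reduced over the TP group, or each rank drifts with its own gradient. A rough sketch of the pattern, assuming a per-parameter flag similar in spirit to Megatron's tensor-parallel attributes (the names here are illustrative, not the actual code path of !3190):

```python
import torch
import torch.distributed as dist

def mark_not_tensor_parallel(module: torch.nn.Module):
    # Tag every parameter as replicated across the tensor-parallel group.
    for param in module.parameters():
        param.tensor_model_parallel = False

def allreduce_replicated_grads(module: torch.nn.Module, tp_group):
    # After backward: average gradients of replicated parameters across
    # TP ranks so all ranks apply the same update.
    for param in module.parameters():
        if not getattr(param, "tensor_model_parallel", False) and param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.AVG, group=tp_group)
```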
Granular upcycling implementation

See merge request ADLR/megatron-lm!2850
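
"Upcycling" here refers to initializing a mixture-of-experts model from a trained dense checkpoint. In its simplest, non-granular form each expert starts as a copy of the dense MLP, as in the toy sketch below; the granular variant in !2850 is more involved, so this shows only the basic idea:

```python
import copy
import torch.nn as nn

def upcycle_dense_mlp(dense_mlp: nn.Module, num_experts: int) -> nn.ModuleList:
    # Each expert begins as an identical copy of the dense MLP, so the MoE
    # model initially reproduces the dense model's function.
    return nn.ModuleList(copy.deepcopy(dense_mlp) for _ in range(num_experts))
```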
Add GPU energy (and ~power) monitoring for training

See merge request ADLR/megatron-lm!3424
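
NVML exposes a cumulative GPU energy counter, which is the natural primitive for this kind of monitoring. A minimal sketch using the pynvml bindings (how !3424 actually wires this into the training loop is not shown here):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Millijoules consumed since the driver was loaded; sampling before and
# after a training interval yields energy per interval, and dividing by
# elapsed seconds approximates average power.
start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
# ... run some training steps ...
end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
print(f"energy: {(end_mj - start_mj) / 1000:.1f} J")

pynvml.nvmlShutdown()
```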
…TransformerLayer Submodule Callables

Co-authored-by: Zijie Yan <[email protected]>
…to 'main'

feat(MoE): Support ep a2a overlap - (01) Add TransformerLayer Submodule Callables

See merge request ADLR/megatron-lm!3217
Co-authored-by: Peter Dykas <[email protected]>
Co-authored-by: Hongxiao Bai <[email protected]>
Co-authored-by: Santosh Bhavani <[email protected]>
Co-authored-by: Qiyu Wan <[email protected]>
Co-authored-by: Duncan Riach <[email protected]>
Co-authored-by: Guyue Huang <[email protected]>
Co-authored-by: Kezhi Kong <[email protected]>
Co-authored-by: Li Tao <[email protected]>
Co-authored-by: Tyler Poon <[email protected]>
Co-authored-by: Yu Yao <[email protected]>
Co-authored-by: Helen Ngo <[email protected]>
Co-authored-by: Mikolaj Blaz <[email protected]>
Co-authored-by: Kunlun Li <[email protected]>
Co-authored-by: Shunkang Zhang <[email protected]>
Co-authored-by: Jakub Szulc <[email protected]>
Co-authored-by: Keshav Santhanam <[email protected]>
Co-authored-by: Matthieu Le <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Mcore Bot <[email protected]>
build: Switch to uv

See merge request ADLR/megatron-lm!3397
build: Simplify nemo image

See merge request ADLR/megatron-lm!3468
Make completions endpoint use MCore inference engine

See merge request ADLR/megatron-lm!3272
Implement dist-ckpt content versioning

See merge request ADLR/megatron-lm!3420
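
Content versioning for a distributed checkpoint usually means stamping the saved metadata with a format version and validating it on load, so incompatible layouts fail fast instead of loading garbage. A hypothetical illustration (key and file names are assumptions, not Megatron's actual schema):

```python
import json
import pathlib

CONTENT_VERSION = "1.0"  # bumped whenever the checkpoint layout changes

def save_metadata(ckpt_dir: str) -> None:
    meta = {"content_version": CONTENT_VERSION}
    (pathlib.Path(ckpt_dir) / "metadata.json").write_text(json.dumps(meta))

def check_metadata(ckpt_dir: str) -> None:
    meta = json.loads((pathlib.Path(ckpt_dir) / "metadata.json").read_text())
    found = meta.get("content_version", "<missing>")
    if found != CONTENT_VERSION:
        raise ValueError(
            f"checkpoint content version {found!r}, expected {CONTENT_VERSION!r}"
        )
```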
fix (ckpt): Fix `_extra_state` for TE 2.5

See merge request ADLR/megatron-lm!3451
Add Hybrid Shard Data-Parallel Support for Custom-FSDP

See merge request ADLR/megatron-lm!3081
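
Hybrid sharding shards parameters within a node and replicates them across nodes, trading some memory savings for much cheaper inter-node traffic. Megatron's Custom-FSDP has its own implementation; the sketch below only illustrates the concept using stock PyTorch FSDP's equivalent strategy:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def wrap_hybrid(model: torch.nn.Module) -> FSDP:
    # HYBRID_SHARD: shard parameters within the intra-node group and
    # replicate (all-reducing gradients) across the inter-node group.
    return FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD)
```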
Revert `fork` to `spawn` based on stability issues in checkpointing

See merge request ADLR/megatron-lm!3450
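
The fork-versus-spawn distinction matters for checkpoint workers because a forked child inherits the parent's CUDA context, file handles, and lock state, which can deadlock; spawn starts a clean interpreter at the cost of slower startup. A minimal illustration of selecting the start method (the worker function here is a placeholder):

```python
import multiprocessing as mp

def save_shard(path: str) -> None:
    # ... serialize a checkpoint shard to `path` ...
    pass

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # clean child process, unlike "fork"
    p = ctx.Process(target=save_shard, args=("/tmp/shard_0.pt",))
    p.start()
    p.join()
```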
…able quantization configuration

Co-authored-by: Simon Layton <[email protected]>
Add kitchen extension with per-layer configurable quantization configuration

See merge request ADLR/megatron-lm!3301
Add deprecation warning for legacy inference

See merge request ADLR/megatron-lm!3474
Change naming of original_max_position_embeddings to avoid conflicts

See merge request ADLR/megatron-lm!3181
…main'

Make cudagraph replay check more descriptive when it fails arg checks

See merge request ADLR/megatron-lm!3472