forked from NVIDIA/Megatron-LM
From NVIDIA Megatron-LM for visibility #18
Open: RaymondLi0 wants to merge 4,790 commits into bigcode-project:multi-query-attention from NVIDIA:main
+230,452 −27,996
Conversation
- ci: onboard T5 memory test (See merge request ADLR/megatron-lm!3225)
- ci: Provide easier tooling for local runs (See merge request ADLR/megatron-lm!3257)
- Remove unintentionally leftover lines in ModelOpt Linear layer (See merge request ADLR/megatron-lm!3228)
- feat: use multi-storage client in checkpointing (See merge request ADLR/megatron-lm!2652)
- ci: Fixes to the release (See merge request ADLR/megatron-lm!3263)
- ADLR/megatron-lm!3193 - substitute nemo1 tests with nemo2 tests (See merge request ADLR/megatron-lm!3235)
- remove from recipe (See merge request ADLR/megatron-lm!3270)
  Co-authored-by: Mcore Bot <[email protected]>, Hao Wu <[email protected]>
- Fix attention_mask shapes in Attention unit test. Closes #464 (See merge request ADLR/megatron-lm!3261)
- Updated setup instructions in README.md (See merge request ADLR/megatron-lm!3210)
  Co-authored-by: Santosh Bhavani <[email protected]>
- Disable cudagraphs when pipeline parallel microbatched inference is on (See merge request ADLR/megatron-lm!3151)
- Inference functional test: 580M Minitron (See merge request ADLR/megatron-lm!2812)
  Co-authored-by: oliver könig <[email protected]>, Mcore Bot <[email protected]>
- …ron" This reverts commit f8c8c9c.
- Invalidate cached SSM tensors if batch size changes during inference (See merge request ADLR/megatron-lm!3277)
  Co-authored-by: oliver könig <[email protected]>, Mcore Bot <[email protected]>
- ci: Move unit test logic to file (See merge request ADLR/megatron-lm!3291)
- Mark weights from vision encoder to be non-tensor-parallelizable to ensure gradients are correctly all-reduced (See merge request ADLR/megatron-lm!3190)
  Co-authored-by: root <[email protected]>, William Dykas <[email protected]>
- Granular upcycling implementation (See merge request ADLR/megatron-lm!2850)
  Co-authored-by: Zijie Yan <[email protected]>
- Add GPU energy (and ~power) monitoring for training (See merge request ADLR/megatron-lm!3424)
- feat(MoE): Support ep a2a overlap - (01) Add TransformerLayer Submodule Callables (See merge request ADLR/megatron-lm!3217)
  Co-authored-by: Zijie Yan <[email protected]>
- build: Switch to uv (See merge request ADLR/megatron-lm!3397)
  Co-authored-by: Peter Dykas <[email protected]>, Hongxiao Bai <[email protected]>, Santosh Bhavani <[email protected]>, Qiyu Wan <[email protected]>, Duncan Riach <[email protected]>, Guyue Huang <[email protected]>, Kezhi Kong <[email protected]>, Li Tao <[email protected]>, Tyler Poon <[email protected]>, Yu Yao <[email protected]>, Helen Ngo <[email protected]>, Mikolaj Blaz <[email protected]>, Kunlun Li <[email protected]>, Shunkang Zhang <[email protected]>, Jakub Szulc <[email protected]>, Keshav Santhanam <[email protected]>, Matthieu Le <[email protected]>, Abhinav Khattar <[email protected]>, Selvaraj Anandaraj <[email protected]>, Mcore Bot <[email protected]>
- build: Simplify nemo image (See merge request ADLR/megatron-lm!3468)
- Make completions endpoint use MCore inference engine (See merge request ADLR/megatron-lm!3272)
  Co-authored-by: Peter Dykas <[email protected]>
- Implement dist-ckpt content versioning (See merge request ADLR/megatron-lm!3420)
  Co-authored-by: Mcore Bot <[email protected]>
- fix (ckpt): Fix `_extra_state` for TE 2.5 (See merge request ADLR/megatron-lm!3451)
  Co-authored-by: oliver könig <[email protected]>
- Add Hybrid Shard Data-Parallel Support for Custom-FSDP (See merge request ADLR/megatron-lm!3081)
  Co-authored-by: jianbinc <[email protected]>
- Revert `fork` to `spawn` based on stability issues in checkpointing (See merge request ADLR/megatron-lm!3450)
- Add kitchen extension with per-layer configurable quantization configuration (See merge request ADLR/megatron-lm!3301)
  Co-authored-by: Simon Layton <[email protected]>
- Add deprecation warning for legacy inference (See merge request ADLR/megatron-lm!3474)
- Change naming of original_max_position_embeddings to avoid conflicts (See merge request ADLR/megatron-lm!3181)
- Make cudagraph replay check more descriptive when it fails arg checks (See merge request ADLR/megatron-lm!3472)
No description provided.