forked from NVIDIA/Megatron-LM
Fetch from nvidia Megatron-LM #5
Open
RaymondLi0 wants to merge 5,152 commits into ElementAI:load-iter from NVIDIA:main
Conversation
This reverts commit d87ba91.
ci: Run on multiple clusters See merge request ADLR/megatron-lm!3292
ci: Allow specific TE-ref See merge request ADLR/megatron-lm!3302
ci(fix): Write logs to log_dir See merge request ADLR/megatron-lm!3299
Address dist checkpointing PyT 24.08 failure See merge request ADLR/megatron-lm!3253
ci(hotfix): Downstream pipeline See merge request ADLR/megatron-lm!3307
MR feedback: added units for arguments, optional argparse flag to clear GPU... See merge request ADLR/megatron-lm!3308
Allow process group as optional argument for mamba class constructor See merge request ADLR/megatron-lm!2966
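The change above follows a common pattern: accept an optional torch.distributed process group and fall back to the default group when none is given. A minimal sketch of that pattern, with an illustrative layer name rather than the real Mamba constructor signature:

```python
from typing import Optional

import torch
import torch.distributed as dist


class ExampleParallelLayer(torch.nn.Module):
    """Illustrative layer taking an optional process group (not the real Mamba API)."""

    def __init__(self, hidden_size: int, process_group: Optional[dist.ProcessGroup] = None):
        super().__init__()
        # None means "use the default (world) group", which collectives accept directly.
        self.process_group = process_group
        self.proj = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.proj(x)
        if dist.is_available() and dist.is_initialized():
            # Stand-in for whatever collective the real layer performs.
            dist.all_reduce(out, group=self.process_group)
        return out
```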
Add NVTX ranges to categorize execution See merge request ADLR/megatron-lm!2588
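NVTX ranges make GPU timelines in Nsight Systems attributable to named phases of a training step. PyTorch exposes them through torch.cuda.nvtx; a minimal sketch (the phase names are illustrative, not the categories used by the merge request):

```python
import torch


def train_step(model, batch, optimizer):
    # Each range shows up as a named span in an Nsight Systems timeline.
    torch.cuda.nvtx.range_push("forward")
    loss = model(batch).mean()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer")
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.nvtx.range_pop()
    return loss
```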
Move fsdp 2 import from _composable to public See merge request ADLR/megatron-lm!3116
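For context, FSDP2's fully_shard was originally importable only from the private torch.distributed._composable.fsdp module and is re-exported from the public torch.distributed.fsdp namespace in newer PyTorch releases. A hedged sketch of importing the public path with a fallback for older builds:

```python
# Prefer the public FSDP2 entry point; fall back to the private module on
# older PyTorch builds that only ship the _composable location.
try:
    from torch.distributed.fsdp import fully_shard
except ImportError:
    from torch.distributed._composable.fsdp import fully_shard
```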
ci: Add nemo-image to `ci-rebuild-mcore-nemo-image` See merge request ADLR/megatron-lm!3321
ci: Re-enable tests that failed on memory See merge request ADLR/megatron-lm!3197
Engine updates See merge request ADLR/megatron-lm!3254
ci: Onboard mr-slim to h100 See merge request ADLR/megatron-lm!3312
Quick fix for NeMo: handle alternate key names like 'pre_wd_mult' instead of 'wd_mult' See merge request ADLR/megatron-lm!3444
chore: Bump version 0.14.0 See merge request ADLR/megatron-lm!3477
Added offloading support for MCore layers See merge request ADLR/megatron-lm!3071
Bug fix to reset kv chunks assigned to -1 and avoid shuffling of new tokens See merge request ADLR/megatron-lm!3437
chore: Add init to tools See merge request ADLR/megatron-lm!3483
Fix unit test test_fp8_param.py blockwise scaling See merge request ADLR/megatron-lm!3480
chore: Add init to examples See merge request ADLR/megatron-lm!3492
build: Force pin down setuptools See merge request ADLR/megatron-lm!3493
Pad input tensors and enable fp8 weights for fp8 inference See merge request ADLR/megatron-lm!3341
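FP8 GEMMs in Transformer Engine generally require the participating dimensions to be divisible by 16, so inputs whose token count is not already aligned get padded and the padding is sliced off afterwards. A hedged sketch of the padding step only (the multiple of 16 is the commonly documented constraint; the helper and fp8_linear names are illustrative):

```python
import torch
import torch.nn.functional as F


def pad_tokens_for_fp8(x: torch.Tensor, multiple: int = 16) -> tuple[torch.Tensor, int]:
    """Pad the leading (token) dimension of a (tokens, hidden) tensor up to a multiple."""
    pad = (-x.size(0)) % multiple
    if pad:
        # F.pad pads dims from the last backwards: (0, 0) leaves hidden alone,
        # (0, pad) appends zero rows along the token dimension.
        x = F.pad(x, (0, 0, 0, pad))
    return x, pad


# Usage: pad before the FP8 matmul, then drop the padded rows again.
# x_padded, pad = pad_tokens_for_fp8(x)
# y = fp8_linear(x_padded)[: x_padded.size(0) - pad]  # fp8_linear is hypothetical
```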
M4 Taskforce: Add HyperCommGrid: N-Dimensional Communication Grid for Model Parallelism See merge request ADLR/megatron-lm!3398
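HyperCommGrid is Megatron-LM's own abstraction and its API is not shown here. As a rough analogy for what an N-dimensional communication grid provides, PyTorch's DeviceMesh builds process groups over a logical N-D layout of ranks; this sketch uses DeviceMesh, not HyperCommGrid:

```python
from torch.distributed.device_mesh import init_device_mesh


def build_3d_grid():
    # A 2 x 2 x 2 logical grid over 8 ranks, one axis each for data-,
    # pipeline- and tensor-parallelism. Requires torch.distributed to be
    # initialized with world_size == 8.
    mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("dp", "pp", "tp"))

    # Slicing by dimension name yields the process group of ranks that share
    # the other two coordinates, e.g. this rank's tensor-parallel group.
    tp_group = mesh["tp"].get_group()
    dp_group = mesh["dp"].get_group()
    return mesh, tp_group, dp_group
```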
Pass strict=False to load_checkpoint in inference See merge request ADLR/megatron-lm!3508
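strict=False is the standard way to tolerate missing or unexpected keys when restoring weights for inference, e.g. when optimizer state or auxiliary buffers are absent from the checkpoint. A generic sketch using torch.nn.Module.load_state_dict (the real Megatron-LM load_checkpoint signature differs, and the checkpoint layout below is assumed):

```python
import torch


def load_for_inference(model: torch.nn.Module, ckpt_path: str) -> torch.nn.Module:
    # Hypothetical checkpoint layout: a dict with a "model" sub-dict of weights.
    state = torch.load(ckpt_path, map_location="cpu")
    missing, unexpected = model.load_state_dict(state["model"], strict=False)
    if missing or unexpected:
        # Surface what was skipped instead of failing hard.
        print(f"load_state_dict: missing={missing}, unexpected={unexpected}")
    model.eval()
    return model
```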
Skip fused rope check if te version < 1.4.0 See merge request ADLR/megatron-lm!3526
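Gating a code path on the installed Transformer Engine version is a recurring pattern; a sketch using importlib.metadata and packaging (the 1.4.0 threshold comes from the commit title, and "transformer-engine" as the distribution name is an assumption):

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version


def get_te_version():
    """Return the installed Transformer Engine version, or None if it is absent."""
    try:
        # "transformer-engine" as the distribution name is an assumption.
        return Version(version("transformer-engine"))
    except PackageNotFoundError:
        return None


def fused_rope_check_applies() -> bool:
    # Per the commit title, the fused-RoPE check only runs on TE >= 1.4.0;
    # older or missing installs skip it.
    te_version = get_te_version()
    return te_version is not None and te_version >= Version("1.4.0")
```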
ci: Misc refactorings See merge request ADLR/megatron-lm!3529
Add option to load main params from checkpoint when specifying '--no-load-optim' See merge request ADLR/megatron-lm!3284
MiMO VLM training example and functional tests See merge request ADLR/megatron-lm!3328