Pull requests: NVIDIA/Megatron-LM
[BUGFIX] Save dist_checkpointing metadata on all nodes for multi-node training
#1531
opened Apr 13, 2025 by
Pranaykarvi
added fix to avoid overflow with new numpy casting behaviour (Issue: #1519)
#1520
opened Apr 4, 2025 by
Apsod
Add full support for Local mode without Apex/TE, and add support for Open XLA on CUDA
#1510
opened Mar 31, 2025 by
ajayvohra2005
[BUG]: Updating the logic for reducing the load_balancing_loss during logging, such that the correct value is logged while using CUDA Graphs
#1507
opened Mar 27, 2025 by
arjun-choudhry
fix for group_limited_topk: K_r is moe_router_topk instead of moe_router_num_groups
#1502
opened Mar 25, 2025 by
ladyrick
[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad
#1495
opened Mar 22, 2025 by
ETOgaosion
Fix llama_mistral loader by using args.true_vocab_size
#1491
opened Mar 20, 2025 by
zhuzilin
Build dataset for all GPUs with tp_rank=0 and pp_rank=0 or -1 in multi-machine training.
#1480
opened Mar 14, 2025 by
wan-nan
Enabling variable_seq_lengths when encoder has Different TP Size
#1470
opened Mar 12, 2025 by
xiaojunjie
fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__
#1463
opened Mar 11, 2025 by
AsakusaRinne
Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0
#1440
opened Feb 27, 2025 by
mustious
fix a bug in load balancing loss aggregation when recompute is turned on
#1433
opened Feb 26, 2025 by
lyuwen
fix: return float instead of tensor from get_rotary_seq_len
#1419
opened Feb 20, 2025 by
jasonchiu-codeium
Fix document regarding GQA (--group-query-attention) argument
stale (no activity in 60 days on issue or PR)
#1401
opened Feb 12, 2025 by
eagle705