
Pull requests: aws-samples/awsome-distributed-training

Pull requests list

Remove AWS_OFI_NCCL_VERSION
#911 opened Nov 27, 2025 by pbelevich
refactor: enhance hostfile_topologify.py readability
#909 opened Nov 26, 2025 by Zhenye-Na
Fix typo in val_batch_size and remove unused imports
#908 opened Nov 26, 2025 by debbywehner
Minor Updates for RIG Support with Better UX
#906 opened Nov 23, 2025 by bluecrayon52
Expert parallelism benchmarks
#901 opened Nov 19, 2025 by pbelevich
update nccl-tests.yaml paths
#878 opened Oct 22, 2025 by bluecrayon52
Adding nanoVLM sample
#864 opened Sep 25, 2025 by allela-roy
NeMo 2 Performance instructions
#812 opened Aug 5, 2025 by pbelevich
delete users script in hyperpod
#807 opened Aug 4, 2025 by cszhz
Feature/slinky slurm hyperpod eks
#804 opened Aug 1, 2025 by bdaqiq01
adding nemo2.0 eks test case
#688 opened May 21, 2025 by KeitaW (Draft)
Feat/ddp mlflow (label: enhancement)
#655 opened Apr 28, 2025 by KeitaW
add tips to force NCCL comm to go through EFA
#531 opened Jan 23, 2025 by KeitaW
Update bionemo test case + propose to subdirectories per orchastrator (label: documentation)
#396 opened Aug 5, 2024 by KeitaW (Draft)