Add default gc_interval=100 for llama and nemotron #402

Status: Open. Wants to merge 3 commits into base branch main.
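For context on the change itself: a `gc_interval` setting of this kind makes the trainer disable Python's automatic garbage collector and instead call `gc.collect()` once every N global steps, so that GC pauses happen at predictable, synchronized points rather than at arbitrary times on individual ranks. A minimal sketch of that pattern follows; `ManualGCCallback` and `on_train_batch_end` are illustrative names for this sketch, not NeMo's actual implementation:

```python
import gc

class ManualGCCallback:
    """Sketch of interval-based garbage collection, as configured by the
    gc_interval: 100 keys in the diffs below. Hypothetical helper, not NeMo code."""

    def __init__(self, gc_interval: int = 100):
        self.gc_interval = gc_interval
        if self.gc_interval > 0:
            gc.disable()  # stop automatic collections; we collect on a schedule

    def on_train_batch_end(self, step: int) -> bool:
        # Collect once every `gc_interval` steps; returns True when a collection ran.
        if self.gc_interval > 0 and step % self.gc_interval == 0:
            gc.collect()
            return True
        return False

cb = ManualGCCallback(gc_interval=100)   # mirrors gc_interval: 100 from the diffs
for step in range(1, 201):
    cb.on_train_batch_end(step)          # collects at steps 100 and 200
gc.enable()                              # restore automatic GC when done
```

The practical benefit is that all ranks in a distributed job pay the GC cost at the same step, instead of one rank stalling the others mid-iteration.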
launcher_scripts/conf/peft/llama/sft.yaml (2 additions, 0 deletions)
@@ -120,6 +120,8 @@ model:
   attention_dropout: 0.0
   ffn_dropout: 0.0
+
+  gc_interval: 100
 
   peft:
     peft_scheme: null # null (SFT, no PEFT), ptuning, lora
     restore_from_path: null
launcher_scripts/conf/peft/nemotron/sft.yaml (2 additions, 0 deletions)
@@ -120,6 +120,8 @@ model:
   attention_dropout: 0.0
   ffn_dropout: 0.0
+
+  gc_interval: 100
 
   peft:
     peft_scheme: null # null (SFT, no PEFT), ptuning, lora
     restore_from_path: null
launcher_scripts/conf/training/llama/llama2_13b.yaml (1 addition, 0 deletions)
@@ -139,6 +139,7 @@ model:
   tp_comm_atomic_ag: False
   tp_comm_atomic_rs: False
   use_flash_attention: true
+  gc_interval: 100
   nsys_profile:
     enabled: False
     trace: [nvtx,cuda]
launcher_scripts/conf/training/llama/llama2_7b.yaml (1 addition, 0 deletions)
@@ -139,6 +139,7 @@ model:
   tp_comm_atomic_ag: False
   tp_comm_atomic_rs: False
   use_flash_attention: true
+  gc_interval: 100
   nsys_profile:
     enabled: False
     trace: [nvtx,cuda]
launcher_scripts/conf/training/nemotron/nemotron_15b.yaml (1 addition, 0 deletions)
@@ -158,6 +158,7 @@ model:
   ub_tp_comm_overlap: True
   tp_comm_atomic_ag: False
   tp_comm_atomic_rs: False
+  gc_interval: 100
 
   nsys_profile:
     enabled: False
launcher_scripts/conf/training/nemotron/nemotron_340b.yaml (2 additions, 1 deletion)
@@ -154,6 +154,7 @@ model:
   fp8_amax_compute_algo: max # 'most_recent' or 'max'. Algorithm for computing amax from history
   fp8_wgrad: True
   ub_tp_comm_overlap: False
+  gc_interval: 100
 
   optim:
     name: distributed_fused_adam
@@ -188,4 +189,4 @@ model:
     - .0333
     - ${data_dir}/my-nemotron_00_text_document
     - .0333
-    - ${data_dir}/my-nemotron_00_text_document
+    - ${data_dir}/my-nemotron_00_text_document
launcher_scripts/conf/training/nemotron/nemotron_4b.yaml (1 addition, 0 deletions)
@@ -158,6 +158,7 @@ model:
   ub_tp_comm_overlap: true
   tp_comm_atomic_ag: False
   tp_comm_atomic_rs: False
+  gc_interval: 100
 
   nsys_profile:
     enabled: False
launcher_scripts/conf/training/nemotron/nemotron_8b.yaml (1 addition, 0 deletions)
@@ -158,6 +158,7 @@ model:
   ub_tp_comm_overlap: true
   tp_comm_atomic_ag: False
   tp_comm_atomic_rs: False
+  gc_interval: 100
 
   nsys_profile:
     enabled: False