
Instruction token masking [WIP] #1


Closed
Wants to merge 2,317 commits.

Conversation

SilverSulfide
Contributor

DO NOT MERGE

  • Testing
  • Does ZeRO, model, and pipeline parallelism work?
  • Efficiency?
  • Define the marker token via a config option instead of hardcoding it (see the sketch below)
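
For reference, a minimal sketch of what the instruction-token masking can look like, with the marker id taken from a parameter rather than hardcoded. Names here (`instruction_masked_loss`, `marker_token_id`) are placeholders, not this branch's actual API, and a single marker per sequence is assumed:

```python
import torch
import torch.nn.functional as F


def instruction_masked_loss(logits, tokens, marker_token_id):
    """Language-model loss over the response only: every token up to and
    including the marker (i.e. the instruction) is excluded from the loss."""
    is_marker = tokens == marker_token_id              # [batch, seq]
    after_marker = is_marker.long().cumsum(dim=1) > 0  # marker and everything after it
    loss_mask = (after_marker & ~is_marker).float()    # strictly after the (first) marker

    # standard next-token prediction shift
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = tokens[:, 1:].contiguous()
    shift_mask = loss_mask[:, 1:].contiguous()

    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view_as(shift_labels)
    return (per_token * shift_mask).sum() / shift_mask.sum().clamp(min=1.0)
```

The mask could equally be built in the dataloader; the sketch only illustrates excluding instruction tokens from the cross-entropy.
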

StellaAthena and others added 30 commits May 2, 2023 08:11
…927)

* [bug-fix] enable finetuning option (set optimizer params correctly)

* change load_checkpoint

---------

Co-authored-by: logan.eo <[email protected]>
* fix list[tensor] typing in both scripts

* Update NeoXArgs docs automatically

* add bf16 saving to conversion scripts

* make precision check more complex for v1.0

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
* add bf16 configuration

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* pre commit

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Rework deriving precision

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Belt and suspenders

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Make the default setup (of only using fp16 dict) work

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Got rid of bf16 argument

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Re-add detailed bf16 message

* Update NeoXArgs docs automatically

* Remove unused import

* Update NeoXArgs docs automatically

* remove useless newline

* Update NeoXArgs docs automatically

* re-add detailed bf16 message to deepspeed_args

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
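
The precision rework commits above describe deriving the training dtype from the fp16/bf16 config dicts instead of a standalone bf16 flag. A rough sketch of that kind of derivation, assuming DeepSpeed-style `{"enabled": ...}` dicts; the function and argument names are illustrative, not the branch's exact NeoXArgs fields:

```python
def derive_precision(fp16_cfg=None, bf16_cfg=None):
    """Pick the training precision from DeepSpeed-style config dicts.

    Illustrative only: the real args logic also validates and forwards these
    dicts into the generated DeepSpeed config.
    """
    fp16_enabled = bool(fp16_cfg and fp16_cfg.get("enabled", False))
    bf16_enabled = bool(bf16_cfg and bf16_cfg.get("enabled", False))
    if fp16_enabled and bf16_enabled:
        raise ValueError("fp16 and bf16 cannot both be enabled")
    if fp16_enabled:
        return "fp16"
    if bf16_enabled:
        return "bfloat16"
    return "fp32"


# default setup: only an fp16 dict is present
assert derive_precision(fp16_cfg={"enabled": True}) == "fp16"
assert derive_precision(bf16_cfg={"enabled": True}) == "bfloat16"
assert derive_precision() == "fp32"
```
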
* update torch and cuda

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
Remove duplicate deepspeed config and allow forced multinode
* Pre-commit

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Do not check for overflow if not using fp16

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
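
On the overflow change: dynamic loss scaling, and therefore the overflow check, only applies to fp16 training, so the commit amounts to a guard along these lines (hypothetical helper, not the repo's actual code):

```python
def should_check_overflow(precision: str) -> bool:
    """Dynamic loss scaling only exists for fp16; bf16 and fp32 runs can
    skip the overflow check entirely."""
    return precision == "fp16"


assert should_check_overflow("fp16") is True
assert should_check_overflow("bfloat16") is False
```
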
…arding (#907)

* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

---------

Co-authored-by: Quentin Anthony <[email protected]>
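
The conversion commits above shard HF weights across model-parallel (and pipeline) ranks. A sketch of the usual column-/row-parallel split convention, not the script's exact code:

```python
import torch


def shard_for_model_parallel(weight: torch.Tensor, mp_world_size: int, dim: int):
    """Split a full HF weight into per-rank shards.

    Column-parallel layers (e.g. the QKV and first MLP projections) split the
    output dimension (dim=0); row-parallel layers (e.g. the attention output
    and second MLP projections) split the input dimension (dim=1).
    """
    assert weight.size(dim) % mp_world_size == 0, "weight not divisible by mp size"
    return torch.chunk(weight, mp_world_size, dim=dim)


# example: shard a [out_features, in_features] linear weight across 2 ranks
w = torch.randn(8, 4)
col_shards = shard_for_model_parallel(w, mp_world_size=2, dim=0)  # two [4, 4] shards
row_shards = shard_for_model_parallel(w, mp_world_size=2, dim=1)  # two [8, 2] shards
```
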
* remove row parallelism

* Update NeoXArgs docs automatically

---------

Co-authored-by: Quentin-Anthony <[email protected]>
Co-authored-by: github-actions <[email protected]>
…arguments (#948)

* base64 encode the megatron config as well

Signed-off-by: Dashiell Stander <[email protected]>

* base64 encode the megatron config as well

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
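
The base64 commits above are about passing the full megatron/DeepSpeed config through the launcher as a single command-line argument. A round-trip sketch of that encoding; the helper names are illustrative, not the launcher's actual functions:

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dict to one base64 string so it survives being
    passed as a single CLI argument."""
    return base64.urlsafe_b64encode(json.dumps(config).encode()).decode()


def decode_config(encoded: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(encoded.encode()).decode())


# round trip
cfg = {"train_batch_size": 32, "bf16": {"enabled": True}}
assert decode_config(encode_config(cfg)) == cfg
```
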
* added a simple script for multi-node data preparation.

* added a simple script for multi-node data preparation.

* fixed minor bugs regarding prefixing of the .bin and .idx files

* fixed minor bugs regarding prefixing of the .bin and .idx files

* fixed minor bugs regarding prefixing of the .bin and .idx files
…heck (#959)

* update conversion script instructions in readme

* rename v1.0 script (now default for 2.0) to module_to_hf

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

* fill in minimal possible mask values

* initialize tensor on the target device

---------

Co-authored-by: Quentin Anthony <[email protected]>
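
Two of the commits above ("fill in minimal possible mask values", "initialize tensor on the target device") correspond to patterns like the following sketch; the function is illustrative, not the conversion script's code:

```python
import torch


def additive_attention_mask(pad_mask: torch.Tensor, dtype: torch.dtype, device: torch.device):
    """Convert a 0/1 padding mask into an additive attention mask.

    Masked positions get the smallest representable value of the compute
    dtype (instead of a hardcoded constant), and the tensor is created
    directly on the target device rather than built on CPU and moved.
    """
    min_value = torch.finfo(dtype).min
    additive = torch.zeros(pad_mask.shape, dtype=dtype, device=device)
    additive.masked_fill_(pad_mask.to(device) == 0, min_value)
    return additive
```
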
* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

* added GeLU fast for HF model, added barriers to enable conversion across multiple nodes, removed partially hardcoded pythia model name

* commented out unnecessary logging and timers

---------

Co-authored-by: Quentin Anthony <[email protected]>
* add an optional `label` field passed in parallel with training data.

* minor fix; Add doc

* fix

* fix data can be None

* prevent loading optimizer

* add script

* Remove some print() statements; make mask documentation clearer

* Add documentation for preprocess_data_with_mask.py

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
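
The commits above add an optional `label` stream packed alongside the training tokens (see the documentation added for preprocess_data_with_mask.py). A minimal sketch of how such a parallel field can drive loss masking; the -100 sentinel is an assumption, and the script may use a different convention:

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # assumed sentinel for "no loss here"; check the script's docs


def loss_with_label_stream(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over a parallel label stream: positions whose label is
    the ignore sentinel (e.g. prompt/instruction tokens) contribute nothing."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```
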
gcaillaut and others added 26 commits September 24, 2024 12:21
* - Add KTO Post-training example

* fix reward not finalizing
* re-added RM training that was removed during a merge conflict in KTO

* - parallel output updated
…283)

* preliminary epoch setting

* first working iteration

* train_epochs_special_case

* handle flags

* fix bugs

* working single path case

* working multi-path without eval

* remove unused files

* additional checks

* remove print statement

* apply precommit

* add lr_decay_fraction

* spelling

---------

Co-authored-by: Quentin Anthony <[email protected]>
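
For the epoch-based training commits, the core bookkeeping is converting an epoch count into the iteration count the trainer already understands. Illustrative arithmetic only; the real change also handles multiple weighted data paths and evaluation:

```python
def epochs_to_train_iters(samples_per_epoch: int, train_epochs: int, global_batch_size: int) -> int:
    """Translate a number of epochs into trainer iterations, rounding up so
    a partial final batch still counts as one iteration."""
    total_samples = samples_per_epoch * train_epochs
    return -(-total_samples // global_batch_size)  # ceil division


assert epochs_to_train_iters(samples_per_epoch=1000, train_epochs=3, global_batch_size=256) == 12
```
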
* hotfix

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* add asserts and fix post training readme

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* fix typo

* fix neoxargs usage test

* skip conversion test due to multiprocessing issue

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* Add ERROR logging prefix and sort alphabetically

* fix comment
- do not create a fake head dim; split the 'mixed_x_layer' into QKV layers directly.
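
The QKV commit above refers to splitting the fused projection output without reshaping through a fake head dimension. Roughly, assuming equal Q/K/V widths (sketch, not the branch's exact code):

```python
import torch


def split_qkv(mixed_x_layer: torch.Tensor, hidden_size: int):
    """Split a fused [batch, seq, 3 * hidden_size] QKV projection directly
    into query, key and value, with no intermediate fake head dimension."""
    query, key, value = torch.split(mixed_x_layer, hidden_size, dim=-1)
    return query, key, value


q, k, v = split_qkv(torch.randn(2, 5, 3 * 64), hidden_size=64)
assert q.shape == k.shape == v.shape == (2, 5, 64)
```
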
…ype' option was removed (#1309)

* fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed

* config adjustments for llama and gated activations

* pre-commit

---------

Co-authored-by: jahatef <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
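
On the `intermediate_size` fix: Llama-style gated MLPs size the hidden layer at roughly 8/3 of the model width, rounded up to a multiple. The arithmetic below is the standard Llama convention, shown for context; the exact values in the changed configs may differ:

```python
def llama_intermediate_size(hidden_size: int, multiple_of: int = 256) -> int:
    """Standard Llama sizing for a gated (SwiGLU) MLP: ~8/3 * hidden_size,
    rounded up to a multiple of `multiple_of`."""
    raw = int(8 * hidden_size / 3)
    return multiple_of * ((raw + multiple_of - 1) // multiple_of)


assert llama_intermediate_size(4096) == 11008  # matches Llama-2 7B
```
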
* Python 3.10 support

Python 3.10 support was added in EleutherAI/gpt-neox#1122

* update wording on torch and python

---------

Co-authored-by: Quentin Anthony <[email protected]>
* adds pyproject files and tests

* formatting and add dev packages to dev req files

* improve req testing

---------

Co-authored-by: Quentin Anthony <[email protected]>