
Instruction token masking [WIP] #1


Closed
Wants to merge 2,317 commits.

Conversation

SilverSulfide
Contributor

DO NOT MERGE

  • Testing
  • Does ZeRO, model, and pipeline parallelism work?
  • Efficiency?
  • Define the marker token via a config option instead of hardcoding it (see the sketch below)
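
For reference, a minimal sketch of what the instruction-token masking can look like, with the marker id taken from a parameter rather than hardcoded. Names here (`instruction_masked_loss`, `marker_token_id`) are placeholders, not this branch's actual API, and a single marker per sequence is assumed:

```python
import torch
import torch.nn.functional as F


def instruction_masked_loss(logits, tokens, marker_token_id):
    """Language-model loss over the response only: every token up to and
    including the marker (i.e. the instruction) is excluded from the loss."""
    is_marker = tokens == marker_token_id              # [batch, seq]
    after_marker = is_marker.long().cumsum(dim=1) > 0  # marker and everything after it
    loss_mask = (after_marker & ~is_marker).float()    # strictly after the (first) marker

    # standard next-token prediction shift
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = tokens[:, 1:].contiguous()
    shift_mask = loss_mask[:, 1:].contiguous()

    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view_as(shift_labels)
    return (per_token * shift_mask).sum() / shift_mask.sum().clamp(min=1.0)
```

The mask could equally be built in the dataloader; the sketch only illustrates excluding instruction tokens from the cross-entropy.
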

StellaAthena and others added 30 commits May 2, 2023 08:11
…927)

* [bug-fix] enable finetuning option (set optimizer params correctly)

* change load_checkpoint

---------

Co-authored-by: logan.eo <[email protected]>
* fix list[tensor] typing in both scripts

* Update NeoXArgs docs automatically

* add bf16 saving to conversion scripts

* make precision check more complex for v1.0

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
* add bf16 configuration

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* pre commit

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Rework deriving precision

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Belt and suspenders

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Make the default setup (of only using fp16 dict) work

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Got rid of bf16 argument

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Re-add detailed bf16 message

* Update NeoXArgs docs automatically

* Remove unused import

* Update NeoXArgs docs automatically

* remove useless newline

* Update NeoXArgs docs automatically

* re-add detailed bf16 message to deepspeed_args

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
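
The precision rework commits above describe deriving the training dtype from the fp16/bf16 config dicts instead of a standalone bf16 flag. A rough sketch of that kind of derivation, assuming DeepSpeed-style `{"enabled": ...}` dicts; the function and argument names are illustrative, not the branch's exact NeoXArgs fields:

```python
def derive_precision(fp16_cfg=None, bf16_cfg=None):
    """Pick the training precision from DeepSpeed-style config dicts.

    Illustrative only: the real args logic also validates and forwards these
    dicts into the generated DeepSpeed config.
    """
    fp16_enabled = bool(fp16_cfg and fp16_cfg.get("enabled", False))
    bf16_enabled = bool(bf16_cfg and bf16_cfg.get("enabled", False))
    if fp16_enabled and bf16_enabled:
        raise ValueError("fp16 and bf16 cannot both be enabled")
    if fp16_enabled:
        return "fp16"
    if bf16_enabled:
        return "bfloat16"
    return "fp32"


# default setup: only an fp16 dict is present
assert derive_precision(fp16_cfg={"enabled": True}) == "fp16"
assert derive_precision(bf16_cfg={"enabled": True}) == "bfloat16"
assert derive_precision() == "fp32"
```
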
* update torch and cuda

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
Remove duplicate deepspeed config and allow forced multinode
* Pre-commit

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Do not check for overflow if not using fp16

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
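
On the overflow change: dynamic loss scaling, and therefore the overflow check, only applies to fp16 training, so the commit amounts to a guard along these lines (hypothetical helper, not the repo's actual code):

```python
def should_check_overflow(precision: str) -> bool:
    """Dynamic loss scaling only exists for fp16; bf16 and fp32 runs can
    skip the overflow check entirely."""
    return precision == "fp16"


assert should_check_overflow("fp16") is True
assert should_check_overflow("bfloat16") is False
```
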
…arding (#907)

* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

---------

Co-authored-by: Quentin Anthony <[email protected]>
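
The conversion commits above shard HF weights across model-parallel (and pipeline) ranks. A sketch of the usual column-/row-parallel split convention, not the script's exact code:

```python
import torch


def shard_for_model_parallel(weight: torch.Tensor, mp_world_size: int, dim: int):
    """Split a full HF weight into per-rank shards.

    Column-parallel layers (e.g. the QKV and first MLP projections) split the
    output dimension (dim=0); row-parallel layers (e.g. the attention output
    and second MLP projections) split the input dimension (dim=1).
    """
    assert weight.size(dim) % mp_world_size == 0, "weight not divisible by mp size"
    return torch.chunk(weight, mp_world_size, dim=dim)


# example: shard a [out_features, in_features] linear weight across 2 ranks
w = torch.randn(8, 4)
col_shards = shard_for_model_parallel(w, mp_world_size=2, dim=0)  # two [4, 4] shards
row_shards = shard_for_model_parallel(w, mp_world_size=2, dim=1)  # two [8, 2] shards
```
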
* remove row parallelism

* Update NeoXArgs docs automatically

---------

Co-authored-by: Quentin-Anthony <[email protected]>
Co-authored-by: github-actions <[email protected]>
…arguments (#948)

* base64 encode the megatron config as well

Signed-off-by: Dashiell Stander <[email protected]>

* base64 encode the megatron config as well

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
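
The base64 commits above are about passing the full megatron/DeepSpeed config through the launcher as a single command-line argument. A round-trip sketch of that encoding; the helper names are illustrative, not the launcher's actual functions:

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dict to one base64 string so it survives being
    passed as a single CLI argument."""
    return base64.urlsafe_b64encode(json.dumps(config).encode()).decode()


def decode_config(encoded: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(encoded.encode()).decode())


# round trip
cfg = {"train_batch_size": 32, "bf16": {"enabled": True}}
assert decode_config(encode_config(cfg)) == cfg
```
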
* added a simple script for multi-node data preparation.

* added a simple script for multi-node data preparation.

* fixed minor bugs regarding prefixing of the .bin and .idx files

* fixed minor bugs regarding prefixing of the .bin and .idx files

* fixed minor bugs regarding prefixing of the .bin and .idx files
…heck (#959)

* update conversion script instructions in readme

* rename v1.0 script (now default for 2.0) to module_to_hf

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

* fill in minimal possible mask values

* initialize tensor on the target device

---------

Co-authored-by: Quentin Anthony <[email protected]>
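
Two of the commits above ("fill in minimal possible mask values", "initialize tensor on the target device") correspond to patterns like the following sketch; the function is illustrative, not the conversion script's code:

```python
import torch


def additive_attention_mask(pad_mask: torch.Tensor, dtype: torch.dtype, device: torch.device):
    """Convert a 0/1 padding mask into an additive attention mask.

    Masked positions get the smallest representable value of the compute
    dtype (instead of a hardcoded constant), and the tensor is created
    directly on the target device rather than built on CPU and moved.
    """
    min_value = torch.finfo(dtype).min
    additive = torch.zeros(pad_mask.shape, dtype=dtype, device=device)
    additive.masked_fill_(pad_mask.to(device) == 0, min_value)
    return additive
```
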
* added HF to NeoX 2.0 conversion script with mp and pp sharding

* (1) added missing curly brace to the pythia/1-4B config; (2) fixed a bug related to a hardcoded value within the conversion script; (3) fixed possible bugs in the conversion script w.r.t. the mp sharding convention

* added GeLU fast for HF model, added barriers to enable conversion across multiple nodes, removed partially hardcoded pythia model name

* commented out unnecessary logging and timers

---------

Co-authored-by: Quentin Anthony <[email protected]>
* add an optional `label` field passed in parallel with training data.

* minor fix; Add doc

* fix

* fix data can be None

* prevent loading optimizer

* add script

* Remove some print() statements; make mask documentation clearer

* Add documentation for preprocess_data_with_mask.py

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
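
The commits above add an optional `label` stream packed alongside the training tokens (see the documentation added for preprocess_data_with_mask.py). A minimal sketch of how such a parallel field can drive loss masking; the -100 sentinel is an assumption, and the script may use a different convention:

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # assumed sentinel for "no loss here"; check the script's docs


def loss_with_label_stream(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over a parallel label stream: positions whose label is
    the ignore sentinel (e.g. prompt/instruction tokens) contribute nothing."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```
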
gcaillaut and others added 26 commits September 24, 2024 12:21
* - Add KTO Post-training example

* fix reward not finalizing
* re-added RM training that was removed during a merge conflict in KTO

* - parallel output updated
…283)

* preliminary epoch setting

* first working iteration

* train_epochs_special_case

* handle flags

* fix bugs

* working single path case

* working multi-path without eval

* remove unused files

* additional checks

* remove print statement

* apply precommit

* add lr_decay_fraction

* spelling

---------

Co-authored-by: Quentin Anthony <[email protected]>
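
For the epoch-based training commits, the core bookkeeping is converting an epoch count into the iteration count the trainer already understands. Illustrative arithmetic only; the real change also handles multiple weighted data paths and evaluation:

```python
def epochs_to_train_iters(samples_per_epoch: int, train_epochs: int, global_batch_size: int) -> int:
    """Translate a number of epochs into trainer iterations, rounding up so
    a partial final batch still counts as one iteration."""
    total_samples = samples_per_epoch * train_epochs
    return -(-total_samples // global_batch_size)  # ceil division


assert epochs_to_train_iters(samples_per_epoch=1000, train_epochs=3, global_batch_size=256) == 12
```
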
* hotfix

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* add asserts and fix post training readme

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* fix typo

* fix neoxargs usage test

* skip conversion test due to multiprocessing issue

* precommit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* Add ERROR logging prefix and sort alphabetically

* fix comment
- do not create a fake head dim; split the 'mixed_x_layer' into QKV layers directly.
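
The QKV commit above refers to splitting the fused projection output without reshaping through a fake head dimension. Roughly, assuming equal Q/K/V widths (sketch, not the branch's exact code):

```python
import torch


def split_qkv(mixed_x_layer: torch.Tensor, hidden_size: int):
    """Split a fused [batch, seq, 3 * hidden_size] QKV projection directly
    into query, key and value, with no intermediate fake head dimension."""
    query, key, value = torch.split(mixed_x_layer, hidden_size, dim=-1)
    return query, key, value


q, k, v = split_qkv(torch.randn(2, 5, 3 * 64), hidden_size=64)
assert q.shape == k.shape == v.shape == (2, 5, 64)
```
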
…ype' option was removed (#1309)

* fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed

* config adjustments for llama and gated activations

* pre-commit

---------

Co-authored-by: jahatef <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
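
On the `intermediate_size` fix: Llama-style gated MLPs size the hidden layer at roughly 8/3 of the model width, rounded up to a multiple. The arithmetic below is the standard Llama convention, shown for context; the exact values in the changed configs may differ:

```python
def llama_intermediate_size(hidden_size: int, multiple_of: int = 256) -> int:
    """Standard Llama sizing for a gated (SwiGLU) MLP: ~8/3 * hidden_size,
    rounded up to a multiple of `multiple_of`."""
    raw = int(8 * hidden_size / 3)
    return multiple_of * ((raw + multiple_of - 1) // multiple_of)


assert llama_intermediate_size(4096) == 11008  # matches Llama-2 7B
```
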
* Python 3.10 support

Python 3.10 support was added in EleutherAI/gpt-neox#1122

* update wording on torch and python

---------

Co-authored-by: Quentin Anthony <[email protected]>
* adds pyproject files and tests

* formatting and add dev packages to dev req files

* improve req testing

---------

Co-authored-by: Quentin Anthony <[email protected]>