[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz) #7130
Conversation
@cyr0930, thanks for this PR. Can you provide more context about the failure this fixes? For example, did you encounter convergence issues after checkpoint loading?
Since this commit (#4906), partitioned parameters are updated only when ds_secondary_partition_tensor is None. Currently, a ds_secondary_partition_tensor is created for each param right after parameter initialization and exists from then on, so the secondary partitions are not updated when we perform state_dict loading or embedding resizing.
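To make the failure mode concrete, here is an illustrative sketch of the guard pattern being described. This is not the actual DeepSpeed source; `shard_for_rank` is a hypothetical stand-in for the real sharding logic, and attribute names follow the discussion above.

```python
import torch

def shard_for_rank(t, rank=0, world=2):
    # Hypothetical stand-in: slice the flattened tensor for this rank.
    return t.flatten().chunk(world)[rank].clone()

def repartition(param, full_tensor):
    """Illustrative guard only, not the real DeepSpeed code."""
    # The primary ZeRO-3 shard is always refreshed from the new data.
    param.ds_tensor.copy_(shard_for_rank(full_tensor))
    # The secondary (hpz) shard is only populated when it does not exist
    # yet, so in-place weight updates after init (state_dict loading,
    # embedding resizing) leave it holding stale values.
    if getattr(param, "ds_secondary_tensor", None) is None:
        param.ds_secondary_tensor = shard_for_rank(full_tensor)
```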
@cyr0930, apologies for the delay on this. My understanding is the …
I'm not sure because this logic is a bit complicated, but as far as I can tell, DeepSpeed ZeRO-3 is enabled while HfArgumentParser.parse_args_into_dataclasses is executed (https://github.com/huggingface/transformers/blob/v4.51.2/src/transformers/training_args.py#L2046). Then, while the model is loaded via the from_pretrained method, the deepspeed.zero.Init context is entered (https://github.com/huggingface/transformers/blob/v4.51.2/src/transformers/modeling_utils.py#L3727). This context wraps module initialization, so parameters are converted to DeepSpeed parameters as they are created (https://github.com/deepspeedai/DeepSpeed/blob/master/deepspeed/runtime/zero/partition_parameters.py#L1107). That's what I've found from debugging so far.
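A minimal sketch of that initialization path, assuming HF Transformers with a ZeRO-3 DeepSpeed config passed on the command line (the model name and CLI flags are placeholders):

```python
from transformers import AutoModelForCausalLM, HfArgumentParser, TrainingArguments

# Invoked as, e.g.:
#   python train.py --output_dir out --deepspeed ds_config.json
# Parsing --deepspeed here is what registers the ZeRO-3 config globally
# (training_args.py#L2046 in the linked version).
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()

# With ZeRO-3 active, from_pretrained constructs the model inside a
# deepspeed.zero.Init context (modeling_utils.py#L3727), so every
# parameter is partitioned, and gets its hpz secondary partition,
# during module __init__ rather than after loading.
model = AutoModelForCausalLM.from_pretrained("gpt2")
```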
@cyr0930, thanks for sharing this information and for debugging. As this is a critical part of hpz, can you please share your repro steps with me so I can try on my side?
This is the minimal reproducing code I could make: deepspeed_init.py
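The attached script itself is not included in this page. A hedged sketch of a repro along the lines discussed (the `zero_hpz_partition_size` value and the two-process launch are assumptions, not the author's actual script) might look like:

```python
# Hedged sketch only; run with, e.g.: deepspeed --num_gpus 2 deepspeed_init.py
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # hpz: hierarchical partitioning creates the secondary partitions.
        "zero_hpz_partition_size": 2,
    },
}

# Parameters are partitioned (and get secondary hpz shards) at creation.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = torch.nn.Linear(8, 8, bias=False)

# Overwrite the weights after init, as from_pretrained does when it
# loads a checkpoint into an already-partitioned module.
with deepspeed.zero.GatheredParameters(model.weight, modifier_rank=0):
    model.load_state_dict({"weight": torch.ones(8, 8)})

# Before this fix, model.weight.ds_secondary_tensor would still hold the
# random init values, so compute paths served from the hpz shard would
# diverge from the loaded checkpoint.
```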
After this commit (#4906), secondary partitioned tensors are updated only after optimizer.step().
When loading a state_dict or resizing embeddings after initialization, the secondary partitioned tensors should also be updated.
e.g., https://github.com/huggingface/transformers/blob/1c4b62b219323a31011bac3bd3cece7675d9e4c3/src/transformers/integrations/deepspeed.py#L344
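A hedged sketch of the intended behavior (not the literal diff of this PR; `shard` is a hypothetical stand-in, and in practice the primary and secondary sharding group sizes differ):

```python
import torch

def shard(t, rank=0, world=2):
    # Hypothetical stand-in for DeepSpeed's real sharding logic.
    return t.flatten().chunk(world)[rank].clone()

def update_partitions(param, new_full_tensor):
    # Refresh the primary ZeRO-3 shard from the rewritten weights.
    param.ds_tensor.copy_(shard(new_full_tensor))
    # Also refresh an existing secondary (hpz) shard instead of skipping
    # it; a stale secondary shard is exactly what this PR fixes.
    if getattr(param, "ds_secondary_tensor", None) is not None:
        param.ds_secondary_tensor.copy_(shard(new_full_tensor))
```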