
Adapter mismatch when merging #2277

Open
teachsheryl opened this issue Jan 22, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@teachsheryl
Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

When merging the adapter with the base model, the token embedding sizes should match.

Current behaviour

Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([151666, 2048]) from checkpoint, the shape in current model is torch.Size([151936, 2048]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([151666, 2048]) from checkpoint, the shape in current model is torch.Size([151936, 2048]).

After a recent update (not sure which), certain adapters no longer merge with the base/instruct model because of this size mismatch. This did not happen previously.
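For reference, the gap between the two shapes in the traceback can be computed directly. A minimal sketch; the reading of the 270-row gap as a vocab-padding artifact (Qwen2.5 pads its vocab to 151936, roughly 270 rows above the tokenizer's actual token count, which would suggest the embeddings were resized to len(tokenizer) during training) is my assumption, not confirmed:

```python
def embedding_row_gap(checkpoint_shape, model_shape):
    """Rows the current model's embedding matrix has beyond the checkpoint's."""
    ckpt_rows, ckpt_dim = checkpoint_shape
    model_rows, model_dim = model_shape
    # A hidden-size mismatch would mean a different base model entirely.
    assert ckpt_dim == model_dim, "hidden size differs: wrong base model?"
    return model_rows - ckpt_rows

# The two shapes from the error above:
print(embedding_row_gap((151666, 2048), (151936, 2048)))  # 270
```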

Steps to reproduce

Config yaml

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
@teachsheryl teachsheryl added the bug Something isn't working label Jan 22, 2025
@NanoCode012 (Collaborator)

Hey, thanks for the report. Could you provide more details, such as which model? How did you run it? A sample config would also help.

copying a param with shape torch.Size([151666, 2048]) from checkpoint, the shape in current model is torch.Size([151936, 2048]).

This seems like the checkpoint has fewer tokens than the current model. Are you pointing to the right adapter?
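One way to verify which adapter is actually being loaded is to inspect the tensor shapes recorded in the adapter's safetensors header, which is an 8-byte length prefix followed by plain JSON, so it can be read without any ML dependencies. A sketch; the fake in-memory blob below stands in for a real `adapter_model.safetensors` read from disk:

```python
import json
import struct

def safetensors_shapes(raw: bytes) -> dict:
    """Map tensor names to shapes from a safetensors blob without
    loading any weights (header = 8-byte LE length + JSON)."""
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len])
    return {k: v["shape"] for k, v in header.items() if k != "__metadata__"}

# Minimal fake blob for illustration; a real adapter file is parsed the same way:
fake_header = json.dumps({
    "base_model.model.model.embed_tokens.weight": {
        "dtype": "F32", "shape": [151666, 2048], "data_offsets": [0, 0]},
}).encode()
blob = struct.pack("<Q", len(fake_header)) + fake_header
print(safetensors_shapes(blob))
```

For a file on disk, the same function works on `open(path, "rb").read()` (only the header bytes are actually needed).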

@laurenhall commented Jan 27, 2025

Hi there, I'm encountering the same problem. It seems that Axolotl is resizing the embedding layer during training for some reason.

/usr/local/lib/python3.11/dist-packages/peft/utils/save_and_load.py:260: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.

Model is Qwen/Qwen2.5-14B-Instruct; I am using the tokenizer's default chat template (not adding any new special tokens) and am not targeting the embeddings/lm_head layers.

# LoRA
adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.125
lora_target_linear: 
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

Using for example:
accelerate launch -m axolotl.cli.train my-qwen-test.yml
python -m axolotl.cli.merge_lora my-qwen-test.yml

(i.e., it's definitely the same adapter that was just trained with the same config.)
This produces a similar size mismatch between the adapter and the model it was just trained on.

Edit: I checked my training history and it looks like I was able to train and merge this same model base successfully around January 9-10th.
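To make the resize warning above concrete, the operation peft detects behaves roughly like this. A simplified sketch using plain lists; the real `resize_token_embeddings` in transformers works on tensors and initializes new rows from the model's init distribution, not a constant fill:

```python
def resize_embedding_rows(weight, new_rows, fill=0.0):
    """Rough mimic of resizing an embedding matrix: keep the leading
    rows, then either truncate or pad with newly initialized rows."""
    dim = len(weight[0])
    out = [row[:] for row in weight[:new_rows]]          # keep/truncate
    out.extend([fill] * dim for _ in range(new_rows - len(out)))  # pad
    return out

w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3-token toy vocab, dim 2
print(len(resize_embedding_rows(w, 2)))  # 2 (shrunk)
print(len(resize_embedding_rows(w, 5)))  # 5 (grown)
```

If training resizes the matrix to len(tokenizer) but merging loads a base model with the padded vocab size, the row counts disagree exactly as in the error above.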

@NanoCode012 (Collaborator)

@laurenhall , thanks for the report. I did a run for both qlora & lora on Qwen/Qwen2.5-7B-Instruct and was able to train+merge successfully. Is there more info you could provide?

For reference, I'm using https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/qwen/qlora.yml as base, just changing base_model.
