
Finetuning #258

Open
sjay8 opened this issue Aug 22, 2024 · 2 comments

sjay8 commented Aug 22, 2024

Hi! I'm a beginner to all of this. Can someone direct me on how to fine-tune the v3 model? I saw #99 on how to structure the dataset (https://github.com/MeetKai/functionary/blob/main/tests/test_case_v2.json),

but I'm not sure how exactly to begin the fine-tuning process. Do I have to run the scripts located here: https://github.com/MeetKai/functionary/tree/main/functionary/train?

@khai-meetkai (Collaborator) commented

Hi @sjay8, yes, you can follow the README at https://github.com/MeetKai/functionary/tree/main/functionary/train, though note that it is updated frequently. You might need to upgrade to the newest accelerate version:
pip install --upgrade accelerate
In your training command, to use a v3 model, pass: --prompt_template_version VERSION
The VERSION can be:

v3-llama3.1
v3.llama3
v2.llama3

...
But I recommend using v3.llama3 or v3-llama3.1
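For reference, here is a rough sketch of what a full training command might look like. Only --prompt_template_version is taken from this thread; the entry point functionary/train/train.py and the other flags (--model_name_or_path, --train_data_path, --output_dir) are assumed Hugging Face-style names, so check the train README for the actual script and argument names.

# Rough sketch only: verify the real entry point and flag names in functionary/train.
# --prompt_template_version comes from the list above; everything else is an assumption.
accelerate launch functionary/train/train.py \
    --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --train_data_path train.jsonl \
    --prompt_template_version v3-llama3.1 \
    --output_dir ./functionary-v3-finetune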
Another thing: you can use --packing to pack training data points if you have a lot of training data. If your data consists mostly of short examples, the packing ratio can be very high, for example 20k data points might shrink to 2k packed data points. That leaves a small number of training steps, so the model parameters will not be updated enough.
You can also set --max_packed_size to control the number of data points after packing. If max_packed_size=4, at most 4 original data points are combined into one packed point, so from 20k original data points the count after packing is not smaller than 20k/4 = 5k.
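As a quick sanity check of that lower bound (plain shell arithmetic, nothing Functionary-specific):

# each packed point combines at most max_packed_size = 4 originals,
# so 20k originals cannot pack down to fewer than ceil(20000 / 4) points
echo $(( (20000 + 4 - 1) / 4 ))   # prints 5000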

In our experience, the number of training steps should be >= 500, and the learning rate should be around 5e-6 for 70B models and 8e-6 to 1e-5 for 7B models.
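A rough way to check whether you will clear that ~500-step floor once packing has shrunk the dataset, assuming the usual relation steps = packed_points / (per_device_batch * grad_accumulation * num_gpus) * epochs (plain arithmetic, not a Functionary API):

# e.g. 5k packed points, per-device batch 2, grad accumulation 8, 4 GPUs, 2 epochs
echo $(( 5000 / (2 * 8 * 4) * 2 ))   # about 156 steps: too few, so train longer or pack less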

@danilpavlov

@khai-meetkai will you fine-tune llama3.3 in the near future?
