About Full Fine-Tuning #21

@lucasliunju

Description

Hi, thank you again for the great repo!

I would like to use this codebase to run full fine-tuning experiments. However, I find the results are not very stable: when I change --micro_batch_size, the final results differ considerably. In contrast, the LoRA and MoRA results are very stable, even when I change --micro_batch_size. This is my current command for full fine-tuning:

deepspeed --master_port 29920 --include="localhost:0,1,2,3" train.py \
    --base_model 'meta-llama/Llama-2-7b-hf' --micro_batch_size 4 \
    --wandb_run_name lora_gsm8k_ft_epoch_3_2e5_s11 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,down_proj,up_proj \
    --num_epochs 3 --deepspeed ds_11.config --wandb_project lora-math --lora_r $RANK --batch_size 128 \
    --data_path meta-math/MetaMath \
    --seed 11 \
    --save_steps 3000 \
    --learning_rate 2e-5 \
    --logging_steps 5 \
    --use_bf16 --use_16bit --full_ft
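For context, trainers in this style typically derive the number of gradient-accumulation steps from --batch_size and --micro_batch_size, so the effective batch size stays fixed when --micro_batch_size changes. A minimal sketch of that convention (the function name and the assumption that --batch_size is the global effective batch are illustrative, not taken from this repo's code):

```python
def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    # effective batch = micro_batch_size * world_size * accumulation_steps,
    # so accumulation_steps is derived to keep the effective batch constant
    assert batch_size % (micro_batch_size * world_size) == 0
    return batch_size // (micro_batch_size * world_size)

# With the command above: batch_size=128 on 4 GPUs
print(grad_accum_steps(128, 4, 4))  # micro_batch_size=4 -> 8 accumulation steps
print(grad_accum_steps(128, 8, 4))  # micro_batch_size=8 -> 4 accumulation steps
```

Under this convention the effective batch is 128 either way, so changing --micro_batch_size only changes how per-step gradients are chunked and summed, not how much data each optimizer step sees.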

I also tried running the fine-tuning code on different types of GPUs, such as A100 and H100, and again the final full-FT results differ considerably while the LoRA results stay very close.
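The pattern of full FT drifting while LoRA stays stable is at least consistent with accumulation order mattering in low precision: with bf16 gradients, summing the same values in different chunkings (i.e. different micro-batch sizes) can produce slightly different totals. A toy illustration in pure Python, emulating bf16 by truncating float32 bits (this is an analogy for the effect, not the actual DeepSpeed reduction path):

```python
import struct

def to_bf16(x: float) -> float:
    # emulate bfloat16 by zeroing the low 16 bits of the float32 representation
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def chunked_sum(values, chunk):
    # accumulate in bf16, chunk by chunk, like gradient accumulation
    # with a given micro-batch size
    total = 0.0
    for i in range(0, len(values), chunk):
        partial = 0.0
        for v in values[i:i + chunk]:
            partial = to_bf16(partial + v)
        total = to_bf16(total + partial)
    return total

vals = [0.1 * (i % 7 + 1) for i in range(64)]
# exact sum is 25.3; the two bf16 accumulations typically land on
# slightly different values depending on the chunk size
print(chunked_sum(vals, 4), chunked_sum(vals, 8))
```

LoRA only updates a small adapter, so small per-step gradient perturbations have less room to compound than in full fine-tuning, where every parameter moves. That said, pinning the instability down would still require controlled runs (same seed, same hardware, only --micro_batch_size varied).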
