Hi, thank you very much for your great repo again!
I would like to use this codebase to conduct full fine-tuning experiments. However, I find the results are not very stable: when I change --micro_batch_size, the final results are quite different. In contrast, the results with LoRA and MoRA are very stable even when I change --micro_batch_size. This is my current command for full fine-tuning:
deepspeed --master_port 29920 --include="localhost:0,1,2,3" train.py \
--base_model 'meta-llama/Llama-2-7b-hf' --micro_batch_size 4 \
--wandb_run_name lora_gsm8k_ft_epoch_3_2e5_s11 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,down_proj,up_proj \
--num_epochs 3 --deepspeed ds_11.config --wandb_project lora-math --lora_r $RANK --batch_size 128 \
--data_path meta-math/MetaMath \
--seed 11 \
--save_steps 3000 \
--learning_rate 2e-5 \
--logging_steps 5 \
--use_bf16 --use_16bit --full_ft
I also tried running the fine-tuning code on different types of GPUs, such as A100 and H100, and found that the final results of full FT are likewise quite different across GPU types, while the LoRA results are very close.
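For reference, here is my understanding of how --micro_batch_size interacts with the global batch. The accumulation formula below is my assumption about how such repos typically derive it (I have not confirmed it against this codebase); if it holds, the effective batch is identical across my runs, which is why I expected the results to match:

```python
# Sketch of the effective-batch decomposition I am assuming.
# batch_size / micro_batch_size / number of GPUs come from my command;
# the division below is my assumption, not taken from the repo.

def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Steps of gradient accumulation so that
    micro_batch_size * world_size * steps == batch_size."""
    per_step = micro_batch_size * world_size
    assert batch_size % per_step == 0, "batch_size must divide evenly"
    return batch_size // per_step

# With my settings: --batch_size 128, --micro_batch_size 4, 4 GPUs
print(grad_accum_steps(128, 4, 4))  # -> 8
# Doubling --micro_batch_size to 8 halves the accumulation steps,
# but the effective batch stays 128:
print(grad_accum_steps(128, 8, 4))  # -> 4
```

So under this assumption the optimization trajectory should be nearly the same up to floating-point accumulation order, yet full FT diverges noticeably while LoRA does not.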