Hi, thank you very much for your great repo again!
I would like to use this codebase to conduct full fine-tuning experiments. However, I find the results are not very stable: when I change --micro_batch_size, the final results are quite different. In contrast, the results with LoRA and MoRA are very stable even when I change --micro_batch_size. This is my current command for full fine-tuning:
deepspeed --master_port 29920 --include="localhost:0,1,2,3" train.py \
--base_model 'meta-llama/Llama-2-7b-hf' --micro_batch_size 4 \
--wandb_run_name lora_gsm8k_ft_epoch_3_2e5_s11 --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,down_proj,up_proj \
--num_epochs 3 --deepspeed ds_11.config --wandb_project lora-math --lora_r $RANK --batch_size 128 \
--data_path meta-math/MetaMath \
--seed 11 \
--save_steps 3000 \
--learning_rate 2e-5 \
--logging_steps 5 \
--use_bf16 --use_16bit --full_ft
I also tried running the fine-tuning code on different types of GPUs, such as A100 and H100, and found that the final results of full FT are likewise quite different across GPU types, while the LoRA results are very close.
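For reference, here is my understanding of how --micro_batch_size interacts with the global batch. The accumulation formula below is my assumption about how such repos typically derive it (I have not confirmed it against this codebase); if it holds, the effective batch is identical across my runs, which is why I expected the results to match:

```python
# Sketch of the effective-batch decomposition I am assuming.
# batch_size / micro_batch_size / number of GPUs come from my command;
# the division below is my assumption, not taken from the repo.

def grad_accum_steps(batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Steps of gradient accumulation so that
    micro_batch_size * world_size * steps == batch_size."""
    per_step = micro_batch_size * world_size
    assert batch_size % per_step == 0, "batch_size must divide evenly"
    return batch_size // per_step

# With my settings: --batch_size 128, --micro_batch_size 4, 4 GPUs
print(grad_accum_steps(128, 4, 4))  # -> 8
# Doubling --micro_batch_size to 8 halves the accumulation steps,
# but the effective batch stays 128:
print(grad_accum_steps(128, 8, 4))  # -> 4
```

So under this assumption the optimization trajectory should be nearly the same up to floating-point accumulation order, yet full FT diverges noticeably while LoRA does not.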