result = forward_call(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 268, in forward
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.75 GiB (GPU 4; 21.99 GiB total capacity; 16.79 GiB already allocated; 907.38 MiB free; 20.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
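As an aside, the allocator hint in that message can be tried independently of any template fix: setting PYTORCH_CUDA_ALLOC_CONF before the first CUDA allocation can reduce fragmentation in some workloads. A minimal sketch, assuming you control the training entrypoint; the 128 MiB value is only an illustrative guess, not a recommendation from this thread:

import os

# Must be set before the CUDA caching allocator initializes,
# i.e. before the first CUDA tensor is created.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.zeros(1, device="cuda")  # allocator now applies the split-size cap

This only mitigates fragmentation; it does not help if the job genuinely needs more memory than the GPU has, which is what the missing LoRA section below turned out to cause.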
I'm not sure why nobody answered this issue, but I ran into a similar problem recently, and the cause was the template: even though the config claimed to use LoRA, the lora_config section was actually missing, so the job effectively ran a full-parameter fine-tune and hit the OOM. The template has since been updated and now looks like this:
# Change this to the model you want to fine-tune
model_id: meta-llama/Meta-Llama-3-70B-Instruct
# Change this to the path to your training data
train_path: s3://air-example-data/gsm8k/train.jsonl
# Change this to the path to your validation data. This is optional
valid_path: s3://air-example-data/gsm8k/test.jsonl
# Change this to the context length you want to use. Examples with longer
# context length will be truncated.
context_length: 4096
# Change this to the total number of GPUs that you want to use
num_devices: 8
# Change this to the number of epochs that you want to train for
num_epochs: 3
# Change this to the batch size that you want to use
train_batch_size_per_device: 2
eval_batch_size_per_device: 2
# Change this to the learning rate that you want to use
learning_rate: 1e-4
# This will pad batches to the longest sequence. Use "max_length" when profiling to profile the worst case.
padding: "longest"
# By default, we will keep the best checkpoint. You can change this to keep more checkpoints.
num_checkpoints_to_keep: 1
# DeepSpeed configuration; you can provide your own DeepSpeed setup
deepspeed:
  config_path: deepspeed_configs/zero_3_offload_optim+param.json
# Accelerator type. The value of 0.001 is not important, as long as it is
# between 0 and 1. This ensures that the given accelerator is available for each trainer
# worker.
worker_resources:
  accelerator_type:A100-80G: 0.001
# LoRA configuration
lora_config:
  r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
    - embed_tokens
    - lm_head
  task_type: "CAUSAL_LM"
  bias: "none"
  modules_to_save: []
Note the lora_config section at the bottom; that config has been working for me :)
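In case it helps to see what that lora_config section corresponds to in code, here is a minimal sketch of the equivalent setup using Hugging Face peft. This is my own illustration, not the template's internals, and the 7B model id is a hypothetical stand-in so the example stays small:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical stand-in model; the template above targets a 70B model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
    bias="none",
    modules_to_save=[],
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable

With only the adapter weights trainable, gradients and optimizer state no longer cover all of the base model's parameters, which is presumably why the accidental full fine-tune was running out of memory.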
I launched the fine-tuning job as follows, and it failed with an OOM error (the traceback above) for Llama-2-70B:
ray_job_log_job_eqeqt513ex4xy1sgwgcjk8ag1i.log
$ python main.py job_compute_configs/aws.yaml training_configs/lora/llama-2-70b-4k-4xg5_48xlarge.yaml
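Given that the broken template silently dropped the LoRA section, one quick sanity check before submitting is to load the training YAML and confirm lora_config is actually present. A small sketch, assuming PyYAML is installed and using the config path from the command above:

import yaml

path = "training_configs/lora/llama-2-70b-4k-4xg5_48xlarge.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

# Per the comment above, a missing lora_config means the job runs
# a full-parameter fine-tune instead of LoRA.
assert "lora_config" in cfg, f"{path} has no lora_config section"
print("LoRA target modules:", cfg["lora_config"]["target_modules"])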