LoRA train error - sh scripts/lora/lora.sh #55

Open
AdjugateMatrix opened this issue Aug 13, 2023 · 4 comments

AdjugateMatrix commented Aug 13, 2023

After running sh scripts/lora/lora.sh, it seems the model and dataset load successfully, but the run then fails with what looks like a torch problem (I am not sure). May I ask your torch and CUDA versions?

[INFO] date:2023-08-13 22:11:05 
[2023-08-13 22:11:06,375] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lz/anaconda3/envs/dbgpt_hub did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1879,unix/lz-System'), PosixPath('local/lz-System')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gnome-shell/PyCharm Professional Edition/1898-3-lz-System_TIME9345021')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
WARNING:root:Process rank: 0, device: cuda:0, n_gpu: 1
WARNING:root:distributed training: True, 16-bits training: False
WARNING:root:Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
full_finetune=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=adapterlora/runs/Aug13_22-11-07_lz-System,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=20,
logging_strategy=steps,
lr_scheduler_type=constant,
max_grad_norm=0.3,
max_steps=500,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
output_dir=adapterlora,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=adapterlora,
sample_generate=False,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_on_source=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Loading Model from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.59s/it]
WARNING:root:Adding LoRA modules...
WARNING:root:Get the get peft model...
WARNING:root:Using gradient checkpointing...
Loading tokenizer from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1714: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
WARNING:root:Successfully loaded model and tokenizer.
WARNING:root:Adding special tokens for /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B.
Using pad_token, but it is not set yet.
WARNING:root:Creating a supervised dataset and DataCollator...
Loading datasets: ['spider']
================================================================================
DatasetAttr: dataset_name: spider || hf_hub_url:  || local_path: sql_finetune_data.json 
data_formate: spider  || load_from_local: True || multi_turn: False
Lodding dataset from local path: sql_finetune_data.json
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1093.12it/s]
Generating train split: 8659 examples [00:00, 79507.69 examples/s]
The spider using spider dataset format.
By default, We support the spider dataset format.
Applying instruction template: default
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [00:00<00:00, 44940.77 examples/s]
Removing the unused columns, keep only 'input' and 'output'
You have set the max_train_samples: None, will do sampling ...
loaded dataset: spider   #train data size: 8659
Concatenated dataset list: ['spider'], #train dataset size: 8659
train_dataset: <class 'dbgpt_hub.data.sft_dataset.SFTInstructionDataset'>, mutlti-turn: False,  #length: 8659
Adding data collator:  <class 'dbgpt_hub.data.sft_dataset.DataCollatorForSupervisedDataset'>
WARNING:root:Creating a Trainer...
Traceback (most recent call last):
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 310, in <module>
    train()
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 274, in train
    trainer = Seq2SeqTrainer(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 56, in __init__
    super().__init__(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 6 more times]
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
finished
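
For context, the final NotImplementedError usually means some parameters are still on torch's meta device (shape-only placeholders with no backing data), so model.to(device) has nothing to copy. A minimal diagnostic sketch, assuming only a loaded torch.nn.Module (the helper name is hypothetical):

    import torch

    def find_meta_params(model: torch.nn.Module) -> list[str]:
        # "meta" parameters carry shape/dtype but no storage; calling
        # .to(device) on a module that still holds them raises exactly the
        # NotImplementedError shown in the traceback above.
        return [n for n, p in model.named_parameters() if p.device.type == "meta"]

If this returns a non-empty list, the model was only partially materialized before the Trainer tried to move it to the GPU.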
@wangzaistone (Member)

> After running sh scripts/lora/lora.sh, it seems the model and dataset load successfully, but the run then fails with what looks like a torch problem (I am not sure). May I ask your torch and CUDA versions?
> [full log quoted above, ending in: NotImplementedError: Cannot copy out of meta tensor; no data!]

Sure. As I remember, the CUDA version is 11.6 and the torch version is 2.0+; tomorrow I will check on the lab machine to confirm.
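
A short sketch for printing the exact versions being asked about (assumes only a standard torch install):

    import torch
    # prints e.g. "2.0.1+cu117 11.7 True" on a working GPU setup
    print(torch.__version__, torch.version.cuda, torch.cuda.is_available())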

@wangzaistone (Member)

> After running sh scripts/lora/lora.sh, it seems the model and dataset load successfully, but the run then fails with what looks like a torch problem (I am not sure). May I ask your torch and CUDA versions?
> [full log quoted above, ending in: NotImplementedError: Cannot copy out of meta tensor; no data!]

torch and CUDA version: 2.0.1+cu117 (CUDA 11.7). It works for me.
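
For anyone reproducing that setup, a minimal sketch that fails fast if the environment does not match the combination reported above (the version strings come from this comment; the assert style is just illustrative):

    import torch

    # torch 2.0.1 built against CUDA 11.7, as confirmed to work above
    assert torch.__version__.startswith("2.0.1"), torch.__version__
    assert torch.version.cuda == "11.7", torch.version.cuda
    assert torch.cuda.is_available(), "torch cannot see a usable CUDA runtime"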

@xiabo0816

May I ask what GPU, at minimum, can run torch==2.1.0? I'm working on a 2080 Ti with torch==2.1.0.

@qidanrui (Collaborator)

> May I ask what GPU, at minimum, can run torch==2.1.0? I'm working on a 2080 Ti with torch==2.1.0.

At least T4, V100, A100, and 4090 are OK for 2.1.0.
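
A quick way to check which GPUs are visible and their compute capability, which is what prebuilt torch wheels are keyed to (a minimal sketch; the sm_ naming is just the conventional shorthand):

    import torch

    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            # e.g. a 2080 Ti reports sm_75; a 4090 reports sm_89
            print(torch.cuda.get_device_name(i), f"sm_{major}{minor}")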
