(venv) training_sets/DIVERS_QWEN$ CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 /mnt/w/flymyai-lora-trainer/train_4090.py --config City_Aerial_QL01.yaml
ipex flag is deprecated, will be removed in Accelerate v1.10. From 2.7.0, PyTorch has all needed optimizations for Intel CPU and XPU.
11/04/2025 14:47:17 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.21s/it]
Loaded text_encoder as Qwen2_5_VLForConditionalGeneration from `text_encoder` subfolder of Qwen/Qwen-Image.
Loaded tokenizer as Qwen2Tokenizer from `tokenizer` subfolder of Qwen/Qwen-Image.
Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of Qwen/Qwen-Image.
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.04s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 377/377 [00:23<00:00, 16.32it/s]
All model checkpoint weights were used when initializing AutoencoderKLQwenImage.
All the weights of AutoencoderKLQwenImage were initialized from the model checkpoint at Qwen/Qwen-Image.
If your task is similar to the task the model of the checkpoint was trained on, you can already use AutoencoderKLQwenImage for predictions without further training.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 377/377 [01:45<00:00, 3.58it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 5353.67it/s]
The config attributes {'pooled_projection_dim': 768} were passed to QwenImageTransformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [06:32<00:00, 43.57s/it]
All model checkpoint weights were used when initializing QwenImageTransformer2DModel.
All the weights of QwenImageTransformer2DModel were initialized from the model checkpoint at Qwen/Qwen-Image.
If your task is similar to the task the model of the checkpoint was trained on, you can already use QwenImageTransformer2DModel for predictions without further training.
0%| | 0/60 [00:00<?, ?it/s]/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
0%| | 0/60 [00:21<?, ?it/s]
Traceback (most recent call last):
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2506, in _run_ninja_build
subprocess.run(
File "/usr/local/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/w/flymyai-lora-trainer/train_4090.py", line 490, in <module>
main()
File "/mnt/w/flymyai-lora-trainer/train_4090.py", line 228, in main
freeze(block)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/quantize.py", line 146, in freeze
m.freeze()
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/nn/qmodule.py", line 303, in freeze
qweight = self.qweight
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/nn/qmodule.py", line 269, in qweight
return quantize_weight(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/quantization.py", line 70, in quantize_weight
return WeightQBytesTensor.quantize(t, qtype, axis, scale, activation_qtype, optimized)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 166, in quantize
return WeightQBytesQuantizer.apply(base, qtype, axis, scale, activation_qtype, optimized)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 43, in forward
return WeightQBytesTensor.create(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 141, in create
return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/marlin/fp8/qbits.py", line 79, in __init__
data_packed = MarlinF8PackedTensor.pack(data) # pack fp8 data to in32, and apply marlier re-ordering.
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/marlin/fp8/packed.py", line 183, in pack
data_int32 = torch.ops.quanto.pack_fp8_marlin(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
return self._op(*args, **(kwargs or {}))
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/__init__.py", line 167, in gptq_marlin_repack
return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/extension.py", line 44, in lib
self._lib = load(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in load
return _jit_compile(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2076, in _jit_compile
_write_ninja_file_and_build_library(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2222, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2522, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemv_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemv_cuda.cu -o gemv_cuda.cuda.o
FAILED: [code=1] gemv_cuda.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemv_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemv_cuda.cu -o gemv_cuda.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_89'
[2/7] through [6/7]: the same /usr/bin/nvcc invocation is repeated for unpack.cu, fp8_marlin.cu, gptq_marlin_repack.cu, marlin_cuda_kernel.cu and gemm_cuda.cu; each one fails with:
nvcc fatal : Unsupported gpu architecture 'compute_89'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/mnt/w/flymyai-lora-trainer/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
args.func(args)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1199, in launch_command
simple_launcher(args)
File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 785, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/mnt/w/flymyai-lora-trainer/venv/bin/python3', '/mnt/w/flymyai-lora-trainer/train_4090.py', '--config', 'City_Aerial_QL01.yaml']' returned non-zero exit status 1.
Error
nvcc fatal : Unsupported gpu architecture 'compute_89'
ninja: build stopped: subcommand failed.
System
WSL / Ubuntu on Windows 11
CUDA 13.0.2 installed
RTX 4090
I've been trying
export TORCH_CUDA_ARCH_LIST=8.9
without success. The full console report is above.
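For reference, here is the attempted workaround as a small shell snippet, extended with a check of which nvcc the JIT build actually resolves. The diagnosis (an older /usr/bin/nvcc shadowing the CUDA 13.0.2 toolkit) and the /usr/local/cuda prefix are assumptions on my part; the failing compile lines in the log do invoke /usr/bin/nvcc directly:

```shell
# Confirm which nvcc torch's JIT extension build will pick up;
# /usr/bin/nvcc is often the distro-packaged toolkit, which may be
# too old to know the 'compute_89' (RTX 4090) architecture.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not on PATH"
fi

# Put the newer toolkit first on PATH before retrying
# (assumes the common /usr/local/cuda install prefix; adjust if different):
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export TORCH_CUDA_ARCH_LIST="8.9"
```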