
Getting nvcc fatal : Unsupported gpu architecture 'compute_89' (ninja build stops on 4090) #54

@AfterHAL

Description


Error

nvcc fatal : Unsupported gpu architecture 'compute_89'
ninja: build stopped: subcommand failed.

System

WSL/Ubuntu on Win11
CUDA 13.0.2 installed
RTX 4090

I've tried `export TORCH_CUDA_ARCH_LIST=8.9`, without success.
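The build log below shows the extension being compiled with `/usr/bin/nvcc`, which on WSL/Ubuntu is often a distro-packaged toolkit older than the installed CUDA 13; `compute_89` (Ada, RTX 4090) requires CUDA 11.8 or newer, so an older nvcc on PATH would explain the error regardless of `TORCH_CUDA_ARCH_LIST`. A minimal diagnostic/fix sketch, assuming the CUDA 13 toolkit lives under the conventional `/usr/local/cuda` path (adjust to your actual install location):

```shell
# Check which nvcc the build picks up and its version; compute_89 (Ada)
# requires a CUDA 11.8+ toolkit:
command -v nvcc && nvcc --version || true

# Put the newer toolkit first on PATH instead of the distro /usr/bin/nvcc
# (install path below is an assumption; adjust to your layout):
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
export TORCH_CUDA_ARCH_LIST=8.9
```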

Console report

(venv) training_sets/DIVERS_QWEN$ CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 /mnt/w/flymyai-lora-trainer/train_4090.py --config City_Aerial_QL01.yaml
ipex flag is deprecated, will be removed in Accelerate v1.10. From 2.7.0, PyTorch has all needed optimizations for Intel CPU and XPU.
11/04/2025 14:47:17 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.21s/it]
Loaded text_encoder as Qwen2_5_VLForConditionalGeneration from `text_encoder` subfolder of Qwen/Qwen-Image.███████████████████████| 4/4 [00:04<00:00,  1.17it/s]
Loading pipeline components...:  33%|███████████████████████████████                                                              | 1/3 [00:05<00:11,  5.81s/it]Loaded tokenizer as Qwen2Tokenizer from `tokenizer` subfolder of Qwen/Qwen-Image.
Loading pipeline components...:  67%|██████████████████████████████████████████████████████████████                               | 2/3 [00:06<00:02,  2.57s/it]Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of Qwen/Qwen-Image.
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00,  2.04s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 377/377 [00:23<00:00, 16.32it/s]
All model checkpoint weights were used when initializing AutoencoderKLQwenImage.

All the weights of AutoencoderKLQwenImage were initialized from the model checkpoint at Qwen/Qwen-Image.
If your task is similar to the task the model of the checkpoint was trained on, you can already use AutoencoderKLQwenImage for predictions without further training.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 377/377 [01:45<00:00,  3.58it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 5353.67it/s]
The config attributes {'pooled_projection_dim': 768} were passed to QwenImageTransformer2DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [06:32<00:00, 43.57s/it]
All model checkpoint weights were used when initializing QwenImageTransformer2DModel.

All the weights of QwenImageTransformer2DModel were initialized from the model checkpoint at Qwen/Qwen-Image.
If your task is similar to the task the model of the checkpoint was trained on, you can already use QwenImageTransformer2DModel for predictions without further training.
  0%|                                                                                                                                    | 0/60 [00:00<?, ?it/s]/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
  0%|                                                                                                                                    | 0/60 [00:21<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2506, in _run_ninja_build
    subprocess.run(
  File "/usr/local/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/w/flymyai-lora-trainer/train_4090.py", line 490, in <module>
    main()
  File "/mnt/w/flymyai-lora-trainer/train_4090.py", line 228, in main
    freeze(block)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/quantize.py", line 146, in freeze
    m.freeze()
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/nn/qmodule.py", line 303, in freeze
    qweight = self.qweight
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/nn/qmodule.py", line 269, in qweight
    return quantize_weight(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/quantization.py", line 70, in quantize_weight
    return WeightQBytesTensor.quantize(t, qtype, axis, scale, activation_qtype, optimized)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 166, in quantize
    return WeightQBytesQuantizer.apply(base, qtype, axis, scale, activation_qtype, optimized)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 43, in forward
    return WeightQBytesTensor.create(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/qbytes.py", line 141, in create
    return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/marlin/fp8/qbits.py", line 79, in __init__
    data_packed = MarlinF8PackedTensor.pack(data)  # pack fp8 data to in32, and apply marlier re-ordering.
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/tensor/weights/marlin/fp8/packed.py", line 183, in pack
    data_int32 = torch.ops.quanto.pack_fp8_marlin(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
    return self._op(*args, **(kwargs or {}))
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/__init__.py", line 167, in gptq_marlin_repack
    return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/extension.py", line 44, in lib
    self._lib = load(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in load
    return _jit_compile(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2076, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2222, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2522, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemv_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemv_cuda.cu -o gemv_cuda.cuda.o
FAILED: [code=1] gemv_cuda.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemv_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemv_cuda.cu -o gemv_cuda.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[2/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output unpack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/unpack.cu -o unpack.cuda.o
FAILED: [code=1] unpack.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output unpack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/unpack.cu -o unpack.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[3/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output fp8_marlin.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/fp8_marlin.cu -o fp8_marlin.cuda.o
FAILED: [code=1] fp8_marlin.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output fp8_marlin.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/fp8_marlin.cu -o fp8_marlin.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[4/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gptq_marlin_repack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/gptq_marlin_repack.cu -o gptq_marlin_repack.cuda.o
FAILED: [code=1] gptq_marlin_repack.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gptq_marlin_repack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/gptq_marlin_repack.cu -o gptq_marlin_repack.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[5/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output marlin_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/marlin_cuda_kernel.cu -o marlin_cuda_kernel.cuda.o
FAILED: [code=1] marlin_cuda_kernel.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output marlin_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/marlin/marlin_cuda_kernel.cu -o marlin_cuda_kernel.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
[6/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: [code=1] gemm_cuda.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include -isystem /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -std=c++17 -c /mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu -o gemm_cuda.cuda.o
nvcc fatal   : Unsupported gpu architecture 'compute_89'
ninja: build stopped: subcommand failed.

Traceback (most recent call last):
  File "/mnt/w/flymyai-lora-trainer/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1199, in launch_command
    simple_launcher(args)
  File "/mnt/w/flymyai-lora-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 785, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/mnt/w/flymyai-lora-trainer/venv/bin/python3', '/mnt/w/flymyai-lora-trainer/train_4090.py', '--config', 'City_Aerial_QL01.yaml']' returned non-zero exit status 1.
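For what it's worth: PyTorch caches JIT-built extensions (the `quanto_cuda` build above goes through `torch.utils.cpp_extension.load`), so after fixing which nvcc is on PATH it may also be necessary to clear the stale failed build so ninja recompiles from scratch. A sketch assuming the default cache location (`~/.cache/torch_extensions`, overridable via `TORCH_EXTENSIONS_DIR`):

```shell
# Remove the cached extension builds so the next run recompiles everything
# with the corrected nvcc. This is safe: the cache is rebuilt on demand.
rm -rf ~/.cache/torch_extensions
```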
