ggml_cuda_init: found 2 ROCm devices (Total VRAM: 56269 MiB):
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, VRAM: 24560 MiB
Device 1: AMD Ryzen 5 7600X 6-Core Processor, gfx1036 (0x1036), VMM: no, Wave Size: 32, VRAM: 31709 MiB
load_backend: failed to find ggml_backend_init in /workspace/llama-cpp-turboquant/build_local/bin/libggml-hip.so
load_backend: failed to find ggml_backend_init in /workspace/llama-cpp-turboquant/build_local/bin/libggml-cpu.so
build_info: b8983-9e3fb40e8
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 11 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model '/workspace/models/lmstudio-community/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl: - ROCm0 (AMD Radeon RX 7900 XTX) : 24560 total, 8501 used, 15846 free vs. target of 1024
llama_params_fit_impl: - ROCm1 (AMD Ryzen 5 7600X 6-Core Processor): 31709 total, 13963 used, 25389 free vs. target of 1024
llama_params_fit_impl: projected to use 22465 MiB of device memory vs. 63701 MiB of free device memory
llama_params_fit_impl: targets for free memory can be met on all devices, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 1.86 seconds
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 7900 XTX) (0000:03:00.0) - 24348 MiB free
llama_model_load_from_file_impl: using device ROCm1 (AMD Ryzen 5 7600X 6-Core Processor) (0000:0f:00.0) - 39353 MiB free
llama_model_loader: loaded meta data with 43 key-value pairs and 833 tensors from /workspace/models/lmstudio-community/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma4
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 64
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Gemma 4 31B
llama_model_loader: - kv 6: general.finetune str = it
llama_model_loader: - kv 7: general.size_label str = 31B
llama_model_loader: - kv 8: gemma4.block_count u32 = 60
llama_model_loader: - kv 9: gemma4.context_length u32 = 262144
llama_model_loader: - kv 10: gemma4.embedding_length u32 = 5376
llama_model_loader: - kv 11: gemma4.feed_forward_length u32 = 21504
llama_model_loader: - kv 12: gemma4.attention.head_count u32 = 32
llama_model_loader: - kv 13: gemma4.attention.head_count_kv arr[i32,60] = [16, 16, 16, 16, 16, 4, 16, 16, 16, 1...
llama_model_loader: - kv 14: gemma4.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 15: gemma4.rope.freq_base_swa f32 = 10000.000000
llama_model_loader: - kv 16: gemma4.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 17: gemma4.attention.key_length u32 = 512
llama_model_loader: - kv 18: gemma4.attention.value_length u32 = 512
llama_model_loader: - kv 19: gemma4.final_logit_softcapping f32 = 30.000000
llama_model_loader: - kv 20: gemma4.attention.sliding_window u32 = 1024
llama_model_loader: - kv 21: gemma4.attention.shared_kv_layers u32 = 0
llama_model_loader: - kv 22: gemma4.embedding_length_per_layer_input u32 = 0
llama_model_loader: - kv 23: gemma4.attention.sliding_window_pattern arr[bool,60] = [true, true, true, true, true, false,...
llama_model_loader: - kv 24: gemma4.attention.key_length_swa u32 = 256
llama_model_loader: - kv 25: gemma4.attention.value_length_swa u32 = 256
llama_model_loader: - kv 26: gemma4.rope.dimension_count u32 = 512
llama_model_loader: - kv 27: gemma4.rope.dimension_count_swa u32 = 256
llama_model_loader: - kv 28: tokenizer.ggml.model str = gemma4
llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv 33: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 34: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 35: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 37: tokenizer.ggml.mask_token_id u32 = 4
llama_model_loader: - kv 38: tokenizer.chat_template str = {%- macro format_parameters(propertie...
llama_model_loader: - kv 39: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 40: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - kv 42: general.file_type u32 = 15
llama_model_loader: - type f32: 422 tensors
llama_model_loader: - type q4_K: 355 tensors
llama_model_loader: - type q6_K: 56 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 17.39 GiB (4.87 BPW)
load: 0 unused tokens
load: control-looking token: 212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control-looking token: 50 '<|tool_response>' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load: - 1 ('<eos>')
load: - 50 ('<|tool_response>')
load: - 106 ('<turn|>')
load: - 212 ('</s>')
load: special_eog_ids contains '<|tool_response>', removing '</s>' token from EOG list
load: special tokens cache size = 24
load: token to piece cache size = 1.9445 MB
print_info: arch = gemma4
print_info: vocab_only = 0
print_info: no_alloc = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 5376
print_info: n_embd_inp = 5376
print_info: n_layer = 60
print_info: n_head = 32
print_info: n_head_kv = [16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4]
print_info: n_rot = 512
print_info: n_swa = 1024
print_info: is_swa_any = 1
print_info: n_embd_head_k = 512
print_info: n_embd_head_v = 512
print_info: n_gqa = [2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8]
print_info: n_embd_k_gqa = [4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048]
print_info: n_embd_v_gqa = [4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048]
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 1.0e+00
print_info: n_ff = 21504
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: n_expert_groups = 0
print_info: n_group_used = 0
print_info: causal attn = 1
print_info: pooling type = -1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: freq_base_swa = 10000.0
print_info: freq_scale_swa = 1
print_info: n_embd_head_k_swa = 256
print_info: n_embd_head_v_swa = 256
print_info: n_rot_swa = 256
print_info: n_ctx_orig_yarn = 262144
print_info: rope_yarn_log_mul = 0.0000
print_info: rope_finetuned = unknown
print_info: model type = ?B
print_info: model params = 30.70 B
print_info: general.name = Gemma 4 31B
print_info: vocab type = BPE
print_info: n_vocab = 262144
print_info: n_merges = 514906
print_info: BOS token = 2 '<bos>'
print_info: EOS token = 1 '<eos>'
print_info: UNK token = 3 '<unk>'
print_info: PAD token = 0 '<pad>'
print_info: MASK token = 4 '<mask>'
print_info: LF token = 107 '
'
print_info: EOG token = 1 '<eos>'
print_info: EOG token = 50 '<|tool_response>'
print_info: EOG token = 106 '<turn|>'
print_info: max token length = 93
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 59 repeating layers to GPU
load_tensors: offloaded 61/61 layers to GPU
load_tensors: CPU_Mapped model buffer size = 1102.50 MiB
load_tensors: ROCm0 model buffer size = 6681.53 MiB
load_tensors: ROCm1 model buffer size = 11124.82 MiB
............................................................................................
common_init_result: added <eos> logit bias = -inf
common_init_result: added <|tool_response> logit bias = -inf
common_init_result: added <turn|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max = 4
llama_context: n_ctx = 65536
llama_context: n_ctx_seq = 16384
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (16384) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 4.00 MiB
llama_kv_cache_iswa: creating non-SWA KV cache, size = 16384 cells
llama_kv_cache: ROCm0 KV buffer size = 744.00 MiB
llama_kv_cache: ROCm1 KV buffer size = 1116.00 MiB
llama_kv_cache: size = 1860.00 MiB ( 16384 cells, 10 layers, 4/4 seqs), K (q8_0): 1360.00 MiB, V (turbo3): 500.00 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 512
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 512
llama_kv_cache_iswa: creating SWA KV cache, size = 1536 cells
llama_kv_cache: ROCm0 KV buffer size = 697.50 MiB
llama_kv_cache: ROCm1 KV buffer size = 1046.25 MiB
llama_kv_cache: size = 1743.75 MiB ( 1536 cells, 50 layers, 4/4 seqs), K (q8_0): 1275.00 MiB, V (turbo3): 468.75 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 256
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 256
llama_context: pipeline parallelism enabled
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve: ROCm0 compute buffer size = 378.57 MiB
sched_reserve: ROCm1 compute buffer size = 634.58 MiB
sched_reserve: ROCm_Host compute buffer size = 161.09 MiB
sched_reserve: graph nodes = 2642
sched_reserve: graph splits = 3
sched_reserve: reserve took 305.87 ms, sched copies = 4
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:99: ROCm error
ggml_cuda_compute_forward: SCALE failed
ROCm error: invalid device function
current device: 0, in function ggml_cuda_compute_forward at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:2997
err
[New LWP 262859]
[New LWP 262858]
[New LWP 262857]
[New LWP 262856]
[New LWP 262855]
[New LWP 262854]
[New LWP 262853]
[New LWP 262852]
[New LWP 262851]
[New LWP 262850]
[New LWP 262849]
[New LWP 262848]
[New LWP 262847]
[New LWP 262846]
[New LWP 262840]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fb064e89422 in __syscall_cancel_arch () from /lib64/libc.so.6
#0 0x00007fb064e89422 in __syscall_cancel_arch () from /lib64/libc.so.6
#1 0x00007fb064e7d71c in __internal_syscall_cancel () from /lib64/libc.so.6
#2 0x00007fb064e7d764 in __syscall_cancel () from /lib64/libc.so.6
#3 0x00007fb064eedc0f in wait4 () from /lib64/libc.so.6
#4 0x00007fb06be7e37d in ggml_print_backtrace () at /workspace/llama-cpp-turboquant/ggml/src/ggml.c:219
219 waitpid(child_pid, NULL, 0);
#5 0x00007fb06be7e506 in ggml_abort (file=0x7fb0654d168a "/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu", line=99, fmt=0x7fb0654ff9b2 "ROCm error") at /workspace/llama-cpp-turboquant/ggml/src/ggml.c:253
253 ggml_print_backtrace();
#6 0x00007fb0656a85a2 in ggml_cuda_error (stmt=0x7fb0654c7f28 "err", func=func@entry=0x7fb06547213c "ggml_cuda_compute_forward", file=0x7fb0654d168a "/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu", line=line@entry=2997, msg=0x7fb05bcd3858 "invalid device function") at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:99
99 GGML_ABORT(GGML_CUDA_NAME " error");
#7 0x00007fb0656b18bc in ggml_cuda_compute_forward (ctx=..., dst=<optimized out>) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:2997
2997 CUDA_CHECK(err);
#8 ggml_cuda_graph_evaluate_and_capture (cuda_ctx=0x4ccf25a0, cgraph=cgraph@entry=0x55a845e8, use_cuda_graph=false, cuda_graph_update_required=false, graph_key=<optimized out>) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:4149
4149 bool ok = ggml_cuda_compute_forward(*cuda_ctx, node);
#9 0x00007fb0656addb1 in ggml_backend_cuda_graph_compute (backend=<optimized out>, cgraph=0x55a845e8) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:4269
4269 ggml_cuda_graph_evaluate_and_capture(cuda_ctx, cgraph, use_cuda_graph, cuda_graph_update_required, graph_key);
#10 0x00007fb06be9a3e8 in ggml_backend_graph_compute_async (backend=0x529df300, cgraph=0x55a845e8) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:452
452 return backend->iface.graph_compute(backend, cgraph);
#11 0x00007fb06be9ef86 in ggml_backend_sched_compute_splits (sched=0x4ca17f70) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:1671
1671 enum ggml_status ec = ggml_backend_graph_compute_async(split_backend, &split->graph);
#12 0x00007fb06be9fdc7 in ggml_backend_sched_graph_compute_async (sched=0x4ca17f70, graph=0x4e9a5de0) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:1894
1894 return ggml_backend_sched_compute_splits(sched);
#13 0x00007fb06b69cdd2 in llama_context::graph_compute (this=0x5661a8a0, gf=0x4e9a5de0, batched=true) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:2191
2191 auto status = ggml_backend_sched_graph_compute_async(sched.get(), gf);
#14 0x00007fb06b6984c9 in llama_context::process_ubatch (this=0x5661a8a0, ubatch=..., gtype=LLM_GRAPH_TYPE_DECODER, mctx=0x3f420110, ret=@0x7ffc9dd46a2c: GGML_STATUS_SUCCESS) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:1231
1231 const auto status = graph_compute(res->get_gf(), ubatch.n_tokens > 1);
#15 0x00007fb06b69a1d7 in llama_context::decode (this=0x5661a8a0, batch_inp=...) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:1692
1692 const auto * res = process_ubatch(ubatch, LLM_GRAPH_TYPE_DECODER, mctx.get(), status);
#16 0x00007fb06b6a11f0 in llama_decode (ctx=0x5661a8a0, batch=...) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:3479
3479 const int ret = ctx->decode(batch);
#17 0x0000000000688507 in common_init_from_params (params=...) at /workspace/llama-cpp-turboquant/common/common.cpp:1374
1374 llama_decode(lctx, llama_batch_get_one(tmp.data(), std::min(tmp.size(), (size_t) params.n_batch)));
#18 0x00000000004f9ebc in server_context_impl::load_model (this=0x4d778ac0, params=...) at /workspace/llama-cpp-turboquant/tools/server/server-context.cpp:749
749 llama_init = common_init_from_params(params_base);
#19 0x00000000004d3af8 in server_context::load_model (this=0x7ffc9dd4d878, params=...) at /workspace/llama-cpp-turboquant/tools/server/server-context.cpp:3067
3067 return impl->load_model(params);
#20 0x0000000000408c1e in main (argc=15, argv=0x7ffc9dd50648) at /workspace/llama-cpp-turboquant/tools/server/server.cpp:282
282 if (!ctx_server.load_model(params)) {
[Inferior 1 (process 262839) detached]
Aborted (core dumped)
Summary
TurboQuant crashes when loading Gemma 4 31B (Q4_K_M): the model tensors load, but the warmup run aborts with a ROCm "invalid device function" error in the SCALE operation.
System Info
Versions
OS: Fedora 42
ROCm: 6.3.1
TurboQuant: 9e3fb40e8bc0f873ad4d3d8329b17dacff28e4ca
Commands
Build
cmake -B build_local \
    -DGGML_HIP=ON \
    -DCMAKE_BUILD_TYPE=Debug \
    -DGGML_HIP_ROCWMMA_FATTN=OFF \
    -DCMAKE_INSTALL_RPATH='$ORIGIN' \
    -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
    -DCMAKE_SKIP_BUILD_RPATH=OFF \
    -DGPU_TARGETS=gfx1100
cmake --build build_local --config Debug -j
Run
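The exact run command is not preserved in this report. Reconstructed from the log above (context size 65536, 4 parallel slots, q8_0 K cache and turbo3 V cache), a roughly equivalent invocation would be the sketch below; the binary name and flag values are assumptions taken from the log, not the original command line.

# Sketch only: reconstructed from the server log, not the original invocation
./build_local/bin/llama-server \
    -m /workspace/models/lmstudio-community/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf \
    -c 65536 \
    --parallel 4 \
    --cache-type-k q8_0 \
    --cache-type-v turbo3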
Behaviour
Expected
The server loads the model and starts serving.
Actual
The server crashes during the warmup run with "ROCm error: invalid device function" and dumps core (see the backtrace in the log).
Log
See the full server log and backtrace at the top of this report.
Additional information
An upstream b8902 ROCm build without TurboQuant loads and serves successfully. I have also tried domvox's version, but it fails in the same way.