[ROCm] Scale operation fails with "invalid device function" during Gemma 4 loading #86

@http403

Description

Summary

TurboQuant crashes while loading Gemma 4 31B: during the warmup run, the SCALE operation fails with a ROCm "invalid device function" error.

System Info

  • CPU: AMD Ryzen 5 7600X
  • GPU: AMD Radeon RX 7900 XTX
  • RAM: 64GB
  • Model: gemma-4-31B-it-Q4_K_M.gguf

Versions

OS: Fedora 42
ROCm: 6.3.1
TurboQuant: 9e3fb40e8bc0f873ad4d3d8329b17dacff28e4ca

Commands

Build

cmake -B build_local \
    -DGGML_HIP=ON \
    -DCMAKE_BUILD_TYPE=Debug \
    -DGGML_HIP_ROCWMMA_FATTN=OFF \
    -DCMAKE_INSTALL_RPATH='$ORIGIN' \
    -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
    -DCMAKE_SKIP_BUILD_RPATH=OFF \
    -DGPU_TARGETS=gfx1100
cmake --build build_local --config Debug -j
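
To double-check what the HIP backend was actually built for, a sketch (paths assume the build tree above; rocminfo ships with ROCm, and the strings check is only a crude inspection of the binary):

# List the GPU agents the ROCm runtime sees and their gfx targets
rocminfo | grep -io 'gfx[0-9a-f]*' | sort -u

# Crude check of which gfx code objects ended up in the HIP backend library
strings build_local/bin/libggml-hip.so | grep -o 'gfx[0-9a-f]*' | sort -u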

Run

llama-cpp-turboquant/build_local/bin/llama-server \
    --model /workspace/models/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf \
    --mmproj /workspace/models/gemma-4-31B-it-GGUF/mmproj-gemma-4-31B-it-BF16.gguf \
    -ctk q8_0 \
    -ctv turbo3 \
    --ctx-size 65535 \
    -ngl 99 \
    --parallel 4
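
Since the build above only targets gfx1100 while the log below shows layers also being offloaded to the gfx1036 iGPU, one possible isolation step (a sketch, not something I have confirmed changes the outcome) is to hide the iGPU from the HIP runtime via the standard HIP_VISIBLE_DEVICES variable and re-run the same command:

HIP_VISIBLE_DEVICES=0 llama-cpp-turboquant/build_local/bin/llama-server \
    --model /workspace/models/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf \
    --mmproj /workspace/models/gemma-4-31B-it-GGUF/mmproj-gemma-4-31B-it-BF16.gguf \
    -ctk q8_0 \
    -ctv turbo3 \
    --ctx-size 65535 \
    -ngl 99 \
    --parallel 4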

Behaviour

Expected
The model loads and the server starts serving.

Actual
Crashes during warmup with ROCm error "invalid device function" in the SCALE operation (full log below).

Log

ggml_cuda_init: found 2 ROCm devices (Total VRAM: 56269 MiB):
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32, VRAM: 24560 MiB
  Device 1: AMD Ryzen 5 7600X 6-Core Processor, gfx1036 (0x1036), VMM: no, Wave Size: 32, VRAM: 31709 MiB
load_backend: failed to find ggml_backend_init in /workspace/llama-cpp-turboquant/build_local/bin/libggml-hip.so
load_backend: failed to find ggml_backend_init in /workspace/llama-cpp-turboquant/build_local/bin/libggml-cpu.so
build_info: b8983-9e3fb40e8
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
Running without SSL
init: using 11 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model '/workspace/models/lmstudio-community/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected memory use with initial parameters [MiB]:
llama_params_fit_impl:   - ROCm0 (AMD Radeon RX 7900 XTX)            :  24560 total,   8501 used,  15846 free vs. target of   1024
llama_params_fit_impl:   - ROCm1 (AMD Ryzen 5 7600X 6-Core Processor):  31709 total,  13963 used,  25389 free vs. target of   1024
llama_params_fit_impl: projected to use 22465 MiB of device memory vs. 63701 MiB of free device memory
llama_params_fit_impl: targets for free memory can be met on all devices, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 1.86 seconds
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 7900 XTX) (0000:03:00.0) - 24348 MiB free
llama_model_load_from_file_impl: using device ROCm1 (AMD Ryzen 5 7600X 6-Core Processor) (0000:0f:00.0) - 39353 MiB free
llama_model_loader: loaded meta data with 43 key-value pairs and 833 tensors from /workspace/models/lmstudio-community/gemma-4-31B-it-GGUF/gemma-4-31B-it-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma4
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 64
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   5:                               general.name str              = Gemma 4 31B
llama_model_loader: - kv   6:                           general.finetune str              = it
llama_model_loader: - kv   7:                         general.size_label str              = 31B
llama_model_loader: - kv   8:                         gemma4.block_count u32              = 60
llama_model_loader: - kv   9:                      gemma4.context_length u32              = 262144
llama_model_loader: - kv  10:                    gemma4.embedding_length u32              = 5376
llama_model_loader: - kv  11:                 gemma4.feed_forward_length u32              = 21504
llama_model_loader: - kv  12:                gemma4.attention.head_count u32              = 32
llama_model_loader: - kv  13:             gemma4.attention.head_count_kv arr[i32,60]      = [16, 16, 16, 16, 16, 4, 16, 16, 16, 1...
llama_model_loader: - kv  14:                      gemma4.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  15:                  gemma4.rope.freq_base_swa f32              = 10000.000000
llama_model_loader: - kv  16:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  17:                gemma4.attention.key_length u32              = 512
llama_model_loader: - kv  18:              gemma4.attention.value_length u32              = 512
llama_model_loader: - kv  19:             gemma4.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  20:            gemma4.attention.sliding_window u32              = 1024
llama_model_loader: - kv  21:          gemma4.attention.shared_kv_layers u32              = 0
llama_model_loader: - kv  22:    gemma4.embedding_length_per_layer_input u32              = 0
llama_model_loader: - kv  23:    gemma4.attention.sliding_window_pattern arr[bool,60]     = [true, true, true, true, true, false,...
llama_model_loader: - kv  24:            gemma4.attention.key_length_swa u32              = 256
llama_model_loader: - kv  25:          gemma4.attention.value_length_swa u32              = 256
llama_model_loader: - kv  26:                gemma4.rope.dimension_count u32              = 512
llama_model_loader: - kv  27:            gemma4.rope.dimension_count_swa u32              = 256
llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gemma4
llama_model_loader: - kv  29:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  30:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  37:               tokenizer.ggml.mask_token_id u32              = 4
llama_model_loader: - kv  38:                    tokenizer.chat_template str              = {%- macro format_parameters(propertie...
llama_model_loader: - kv  39:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  40:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  41:               general.quantization_version u32              = 2
llama_model_loader: - kv  42:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  422 tensors
llama_model_loader: - type q4_K:  355 tensors
llama_model_loader: - type q6_K:   56 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 17.39 GiB (4.87 BPW) 
load: 0 unused tokens
load: control-looking token:    212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control-looking token:     50 '<|tool_response>' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 1 ('<eos>')
load:   - 50 ('<|tool_response>')
load:   - 106 ('<turn|>')
load:   - 212 ('</s>')
load: special_eog_ids contains '<|tool_response>', removing '</s>' token from EOG list
load: special tokens cache size = 24
load: token to piece cache size = 1.9445 MB
print_info: arch                  = gemma4
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 262144
print_info: n_embd                = 5376
print_info: n_embd_inp            = 5376
print_info: n_layer               = 60
print_info: n_head                = 32
print_info: n_head_kv             = [16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4, 16, 16, 16, 16, 16, 4]
print_info: n_rot                 = 512
print_info: n_swa                 = 1024
print_info: is_swa_any            = 1
print_info: n_embd_head_k         = 512
print_info: n_embd_head_v         = 512
print_info: n_gqa                 = [2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8]
print_info: n_embd_k_gqa          = [4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048]
print_info: n_embd_v_gqa          = [4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048, 4096, 4096, 4096, 4096, 4096, 2048]
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 1.0e+00
print_info: n_ff                  = 21504
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = -1
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: freq_base_swa         = 10000.0
print_info: freq_scale_swa        = 1
print_info: n_embd_head_k_swa     = 256
print_info: n_embd_head_v_swa     = 256
print_info: n_rot_swa             = 256
print_info: n_ctx_orig_yarn       = 262144
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = ?B
print_info: model params          = 30.70 B
print_info: general.name          = Gemma 4 31B
print_info: vocab type            = BPE
print_info: n_vocab               = 262144
print_info: n_merges              = 514906
print_info: BOS token             = 2 '<bos>'
print_info: EOS token             = 1 '<eos>'
print_info: UNK token             = 3 '<unk>'
print_info: PAD token             = 0 '<pad>'
print_info: MASK token            = 4 '<mask>'
print_info: LF token              = 107 '
'
print_info: EOG token             = 1 '<eos>'
print_info: EOG token             = 50 '<|tool_response>'
print_info: EOG token             = 106 '<turn|>'
print_info: max token length      = 93
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 59 repeating layers to GPU
load_tensors: offloaded 61/61 layers to GPU
load_tensors:   CPU_Mapped model buffer size =  1102.50 MiB
load_tensors:        ROCm0 model buffer size =  6681.53 MiB
load_tensors:        ROCm1 model buffer size = 11124.82 MiB
............................................................................................
common_init_result: added <eos> logit bias = -inf
common_init_result: added <|tool_response> logit bias = -inf
common_init_result: added <turn|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 4
llama_context: n_ctx         = 65536
llama_context: n_ctx_seq     = 16384
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (16384) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context:  ROCm_Host  output buffer size =     4.00 MiB
llama_kv_cache_iswa: creating non-SWA KV cache, size = 16384 cells
llama_kv_cache:      ROCm0 KV buffer size =   744.00 MiB
llama_kv_cache:      ROCm1 KV buffer size =  1116.00 MiB
llama_kv_cache: size = 1860.00 MiB ( 16384 cells,  10 layers,  4/4 seqs), K (q8_0): 1360.00 MiB, V (turbo3):  500.00 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 512
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 512
llama_kv_cache_iswa: creating     SWA KV cache, size = 1536 cells
llama_kv_cache:      ROCm0 KV buffer size =   697.50 MiB
llama_kv_cache:      ROCm1 KV buffer size =  1046.25 MiB
llama_kv_cache: size = 1743.75 MiB (  1536 cells,  50 layers,  4/4 seqs), K (q8_0): 1275.00 MiB, V (turbo3):  468.75 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 256
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 256
llama_context: pipeline parallelism enabled
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      ROCm0 compute buffer size =   378.57 MiB
sched_reserve:      ROCm1 compute buffer size =   634.58 MiB
sched_reserve:  ROCm_Host compute buffer size =   161.09 MiB
sched_reserve: graph nodes  = 2642
sched_reserve: graph splits = 3
sched_reserve: reserve took 305.87 ms, sched copies = 4
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:99: ROCm error
ggml_cuda_compute_forward: SCALE failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:2997
  err
[New LWP 262859]
[New LWP 262858]
[New LWP 262857]
[New LWP 262856]
[New LWP 262855]
[New LWP 262854]
[New LWP 262853]
[New LWP 262852]
[New LWP 262851]
[New LWP 262850]
[New LWP 262849]
[New LWP 262848]
[New LWP 262847]
[New LWP 262846]
[New LWP 262840]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fb064e89422 in __syscall_cancel_arch () from /lib64/libc.so.6
#0  0x00007fb064e89422 in __syscall_cancel_arch () from /lib64/libc.so.6
#1  0x00007fb064e7d71c in __internal_syscall_cancel () from /lib64/libc.so.6
#2  0x00007fb064e7d764 in __syscall_cancel () from /lib64/libc.so.6
#3  0x00007fb064eedc0f in wait4 () from /lib64/libc.so.6
#4  0x00007fb06be7e37d in ggml_print_backtrace () at /workspace/llama-cpp-turboquant/ggml/src/ggml.c:219
219             waitpid(child_pid, NULL, 0);
#5  0x00007fb06be7e506 in ggml_abort (file=0x7fb0654d168a "/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu", line=99, fmt=0x7fb0654ff9b2 "ROCm error") at /workspace/llama-cpp-turboquant/ggml/src/ggml.c:253
253             ggml_print_backtrace();
#6  0x00007fb0656a85a2 in ggml_cuda_error (stmt=0x7fb0654c7f28 "err", func=func@entry=0x7fb06547213c "ggml_cuda_compute_forward", file=0x7fb0654d168a "/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu", line=line@entry=2997, msg=0x7fb05bcd3858 "invalid device function") at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:99
99          GGML_ABORT(GGML_CUDA_NAME " error");
#7  0x00007fb0656b18bc in ggml_cuda_compute_forward (ctx=..., dst=<optimized out>) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:2997
2997            CUDA_CHECK(err);
#8  ggml_cuda_graph_evaluate_and_capture (cuda_ctx=0x4ccf25a0, cgraph=cgraph@entry=0x55a845e8, use_cuda_graph=false, cuda_graph_update_required=false, graph_key=<optimized out>) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:4149
4149                    bool ok = ggml_cuda_compute_forward(*cuda_ctx, node);
#9  0x00007fb0656addb1 in ggml_backend_cuda_graph_compute (backend=<optimized out>, cgraph=0x55a845e8) at /workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/ggml-cuda.cu:4269
4269        ggml_cuda_graph_evaluate_and_capture(cuda_ctx, cgraph, use_cuda_graph, cuda_graph_update_required, graph_key);
#10 0x00007fb06be9a3e8 in ggml_backend_graph_compute_async (backend=0x529df300, cgraph=0x55a845e8) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:452
452         return backend->iface.graph_compute(backend, cgraph);
#11 0x00007fb06be9ef86 in ggml_backend_sched_compute_splits (sched=0x4ca17f70) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:1671
1671                enum ggml_status ec = ggml_backend_graph_compute_async(split_backend, &split->graph);
#12 0x00007fb06be9fdc7 in ggml_backend_sched_graph_compute_async (sched=0x4ca17f70, graph=0x4e9a5de0) at /workspace/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:1894
1894        return ggml_backend_sched_compute_splits(sched);
#13 0x00007fb06b69cdd2 in llama_context::graph_compute (this=0x5661a8a0, gf=0x4e9a5de0, batched=true) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:2191
2191        auto status = ggml_backend_sched_graph_compute_async(sched.get(), gf);
#14 0x00007fb06b6984c9 in llama_context::process_ubatch (this=0x5661a8a0, ubatch=..., gtype=LLM_GRAPH_TYPE_DECODER, mctx=0x3f420110, ret=@0x7ffc9dd46a2c: GGML_STATUS_SUCCESS) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:1231
1231        const auto status = graph_compute(res->get_gf(), ubatch.n_tokens > 1);
#15 0x00007fb06b69a1d7 in llama_context::decode (this=0x5661a8a0, batch_inp=...) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:1692
1692            const auto * res = process_ubatch(ubatch, LLM_GRAPH_TYPE_DECODER, mctx.get(), status);
#16 0x00007fb06b6a11f0 in llama_decode (ctx=0x5661a8a0, batch=...) at /workspace/llama-cpp-turboquant/src/llama-context.cpp:3479
3479        const int ret = ctx->decode(batch);
#17 0x0000000000688507 in common_init_from_params (params=...) at /workspace/llama-cpp-turboquant/common/common.cpp:1374
1374                llama_decode(lctx, llama_batch_get_one(tmp.data(), std::min(tmp.size(), (size_t) params.n_batch)));
#18 0x00000000004f9ebc in server_context_impl::load_model (this=0x4d778ac0, params=...) at /workspace/llama-cpp-turboquant/tools/server/server-context.cpp:749
749             llama_init = common_init_from_params(params_base);
#19 0x00000000004d3af8 in server_context::load_model (this=0x7ffc9dd4d878, params=...) at /workspace/llama-cpp-turboquant/tools/server/server-context.cpp:3067
3067        return impl->load_model(params);
#20 0x0000000000408c1e in main (argc=15, argv=0x7ffc9dd50648) at /workspace/llama-cpp-turboquant/tools/server/server.cpp:282
282             if (!ctx_server.load_model(params)) {
[Inferior 1 (process 262839) detached]
Aborted (core dumped)

Additional information

The upstream b8902 ROCm build (without TurboQuant) loads the model successfully. I have also tried domvox's fork, but it fails in the same way.
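
Since upstream b8902 loads fine and the TurboQuant tree at 9e3fb40e does not, a bisect between the two might narrow down the offending commit. A rough sketch, assuming b8902 exists as a tag reachable from the TurboQuant branch, rebuilding with the cmake invocation above at each step:

git bisect start 9e3fb40e8bc0f873ad4d3d8329b17dacff28e4ca b8902   # bad, good
# at each step: rebuild, re-run the llama-server command above, then mark it
git bisect good    # model loads and serves
git bisect bad     # SCALE fails with "invalid device function"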
