HIP mixed TurboQuant vec FA on gfx900/gfx906 by 2bigO · Pull Request #99 · TheTom/llama-cpp-turboquant

2bigO · 2026-04-21T19:34:41Z

Overview

This PR adds a HIP-only guard to disable mixed TurboQuant vec FlashAttention by default on gfx900 / gfx906, while keeping same-type turbo vec FA enabled.

On this class of AMD GPUs, mixed KV setups such as q8_0 + turbo4 were building but failing at runtime because the HIP vec FA path was selected even though the mixed turbo vec specializations were not reliable on these targets. The result was a warmup-time abort in fattn.cu.

This change:

adds a new CMake option: GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA
enables it by default only for gfx900 and gfx906
skips compiling mixed turbo vec instances on HIP when the option is enabled
guards vec FA declarations/dispatch accordingly
prevents runtime kernel selection from choosing the mixed turbo vec path when disabled

This keeps newer HIP targets unchanged by default, and makes q8_0 / turbo4 usable on gfx906-class hardware.

Additional information

I hit this on a ROCm/gfx906 deployment while bringing up llama-server with TurboQuant KV compression. The issue was not a build-only problem: after fixing link-time vec symbol mismatches, warmup still crashed because runtime kernel selection could still route mixed turbo KV into the vec path.

This patch is intentionally narrow:

HIP only
defaulted only for gfx900 / gfx906
same-type turbo vec FA still works
newer HIP targets can keep mixed turbo vec FA unless explicitly disabled

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES — AI was used to help inspect the failing HIP/TurboQuant dispatch path, compare candidate fixes, and draft the patch. I manually reviewed, tested, and validated the final change on real gfx906 hardware before preparing this PR.

PR 2 — stop installing tests by default

Overview

This PR changes LLAMA_TESTS_INSTALL to default to OFF.

Tests can still be built and installed explicitly, but installing them by default makes partial/package builds more fragile than necessary. In my case, cmake --install failed because install rules expected test binaries that were not built as part of the requested target set.

Turning test installation off by default keeps normal runtime/library installs cleaner and avoids unexpected install failures in downstream packaging/container builds.

Additional information

This does not remove test installation support. It only changes the default so that projects embedding or packaging llama.cpp do not need to opt out of installing test binaries they are not using.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES — AI was used to help inspect the install failure path and draft the minimal change. I manually reviewed the change and verified the behavior in a real build/container workflow before preparing this PR.

2bigO added 2 commits April 21, 2026 15:28

hip: disable mixed turbo vec FA by default on gfx900/gfx906

f15ee99

cmake: stop installing tests by default

bc2d9e5

github-actions Bot added Nvidia GPU ggml build labels Apr 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HIP mixed TurboQuant vec FA on gfx900/gfx906#99

HIP mixed TurboQuant vec FA on gfx900/gfx906#99
2bigO wants to merge 2 commits intoTheTom:feature/turboquant-kv-cachefrom
2bigO:feature/turboquant-kv-cache

2bigO commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

2bigO commented Apr 21, 2026

Overview

Additional information

Requirements

PR 2 — stop installing tests by default

Overview

Additional information

Requirements

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant