HIP mixed TurboQuant vec FA on gfx900/gfx906#99

Open
2bigO wants to merge 2 commits into TheTom:feature/turboquant-kv-cache from 2bigO:feature/turboquant-kv-cache

Conversation


@2bigO 2bigO commented Apr 21, 2026

Overview

This PR adds a HIP-only guard to disable mixed TurboQuant vec FlashAttention by default on gfx900 / gfx906, while keeping same-type turbo vec FA enabled.

On this class of AMD GPUs, mixed KV setups such as q8_0 + turbo4 built successfully but failed at runtime: the HIP vec FA path was still selected even though the mixed turbo vec specializations are not reliable on these targets. The result was a warmup-time abort in fattn.cu.

This change:

  • adds a new CMake option: GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA
  • enables it by default only for gfx900 and gfx906
  • skips compiling mixed turbo vec instances on HIP when the option is enabled
  • guards vec FA declarations/dispatch accordingly
  • prevents runtime kernel selection from choosing the mixed turbo vec path when disabled
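
The build-side gate could look roughly like the following CMake sketch. This is illustrative only: the variable `GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA` comes from this PR, but the architecture-detection logic and the `ggml-hip` target name are assumptions about how the option might be wired, not the actual patch.

```cmake
# Illustrative sketch, not the actual patch: default the option ON
# only when the HIP build targets gfx900 or gfx906.
set(_mixed_turbo_vec_fa_default OFF)
if (CMAKE_HIP_ARCHITECTURES MATCHES "gfx900|gfx906")
    set(_mixed_turbo_vec_fa_default ON)
endif()

option(GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA
       "disable mixed TurboQuant vec FlashAttention on HIP"
       ${_mixed_turbo_vec_fa_default})

if (GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA)
    # Expose the guard macro so the HIP sources can skip compiling the
    # mixed turbo vec instances and guard the dispatch path.
    target_compile_definitions(ggml-hip PRIVATE
        GGML_HIP_DISABLE_MIXED_TURBO_VEC_FA)
endif()
```

Users on newer HIP targets who want the guard anyway could pass `-DGGML_HIP_DISABLE_MIXED_TURBO_VEC_FA=ON` at configure time.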

This keeps newer HIP targets unchanged by default, and makes q8_0 / turbo4 usable on gfx906-class hardware.

Additional information

I hit this on a ROCm/gfx906 deployment while bringing up llama-server with TurboQuant KV compression. The issue was not a build-only problem: after fixing link-time vec symbol mismatches, warmup still crashed because runtime kernel selection could still route mixed turbo KV into the vec path.

This patch is intentionally narrow:

  • HIP only
  • defaulted only for gfx900 / gfx906
  • same-type turbo vec FA still works
  • newer HIP targets can keep mixed turbo vec FA unless explicitly disabled

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES — AI was used to help inspect the failing HIP/TurboQuant dispatch path, compare candidate fixes, and draft the patch. I manually reviewed, tested, and validated the final change on real gfx906 hardware before preparing this PR.

PR 2: stop installing tests by default

Overview

This PR changes LLAMA_TESTS_INSTALL to default to OFF.

Tests can still be built and installed explicitly, but installing them by default makes partial/package builds more fragile than necessary. In my case, cmake --install failed because install rules expected test binaries that were not built as part of the requested target set.

Turning test installation off by default keeps normal runtime/library installs cleaner and avoids unexpected install failures in downstream packaging/container builds.
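
A minimal sketch of the resulting behavior, assuming the usual pattern of gating install rules on the option (the `test-tokenizer-0` target name is only an example of a test binary, not necessarily what the patch touches):

```cmake
# Illustrative sketch: test binaries are only installed when
# explicitly requested, so a plain `cmake --install` of the runtime
# and libraries no longer depends on test targets having been built.
option(LLAMA_TESTS_INSTALL "install llama.cpp test binaries" OFF)

if (LLAMA_BUILD_TESTS AND LLAMA_TESTS_INSTALL)
    install(TARGETS test-tokenizer-0 RUNTIME DESTINATION bin)
endif()
```

Downstream packagers who do want the old behavior can opt back in with `-DLLAMA_TESTS_INSTALL=ON` at configure time.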

Additional information

This does not remove test installation support. It only changes the default so that projects embedding or packaging llama.cpp do not need to opt out of installing test binaries they are not using.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES — AI was used to help inspect the install failure path and draft the minimal change. I manually reviewed the change and verified the behavior in a real build/container workflow before preparing this PR.

