Skip to content

feat(tuning): VRAM-tier-adaptive sizing for the full Intel-Mac AMD GPU range#9

Merged
sunrunnerfire merged 1 commit into
mainfrom
feat/vram-tier-adaptive-sizing
May 31, 2026
Merged

feat(tuning): VRAM-tier-adaptive sizing for the full Intel-Mac AMD GPU range#9
sunrunnerfire merged 1 commit into
mainfrom
feat/vram-tier-adaptive-sizing

Conversation

@sunrunnerfire
Copy link
Copy Markdown
Contributor

What

Closes the gap that kept quenchforge's AMD support effectively Vega-II-only: every AMD-discrete profile inherited the 32 GB Vega II bench constants (--ctx-size 8192, embed --ubatch-size 1024) regardless of card, so smaller Intel-Mac AMD GPUs (8 GB RX 5700, 4 GB MacBook Pro dGPUs) could oversubscribe VRAM and forced operators to hand-tune QUENCHFORGE_MAX_CONTEXT / QUENCHFORGE_EMBED_UBATCH_SIZE.

How

internal/tuning/tuning.go::amdSizing(vramGB) derives the context ceiling + embed ubatch from the detected headline VRAM (hardware.Info.GPUVRAMGB, now threaded into KernelParams):

VRAM --ctx-size cap embed --ubatch-size example cards
≥ 12 GB none (keeps MaxContext) 1024 Vega II/Duo, W6800X, W6900X, Vega 56/64, 5600M
7–11 GB 4096 512 RX 5700 / 5700 XT, W5700X
≤ 6 GB 2048 256 4 GB MBP dGPUs (5300M/5500M), Polaris 560X

Guarantees

  • Zero regression on the validated path — ≥ 12 GB and any VRAM probe miss (0/unknown) keep the exact Vega-II-benched values, so the canonical Mac Pro config is unchanged.
  • Caps only lowerbuildSlotArgs applies min(cfg.MaxContext, cap); a high-VRAM operator who raised QUENCHFORGE_MAX_CONTEXT is never clamped.
  • Operator overrides still win — explicit QUENCHFORGE_EMBED_UBATCH_SIZE beats the tier; the context cap is independent.
  • Family-agnostic — unlisted/future AMD cards fall through classifyProfilevega-pro and size by VRAM like any other.

Tests

TestAmdSizing_Tiers, TestKernelParams_EmbedLowVRAMScalesDown, TestKernelParams_ContextCapAppliesToAllAMDSlots, TestKernelParams_HighVRAMAndNonAMDHaveNoContextCap, TestKernelParams_UbatchOverrideBeatsTierButCapStands, TestBuildSlotArgs_LowVRAMAMDCapsContextAndUbatch. Existing tuning/cmd suites pass unchanged (they exercise VRAM=0 → high tier). gofmt/vet/build clean.

Docs: CHANGELOG v0.8.0 (final), README env table, CLAUDE.md gotcha #2.

Every AMD-discrete profile previously inherited the Vega-II bench
constants (--ctx-size 8192, embed --ubatch-size 1024) regardless of the
card, so smaller Intel-Mac AMD GPUs (8 GB RX 5700, 4 GB MacBook Pro
dGPUs) could oversubscribe VRAM and forced operators to hand-set
QUENCHFORGE_MAX_CONTEXT / QUENCHFORGE_EMBED_UBATCH_SIZE.

amdSizing(vramGB) now derives both from the detected headline VRAM,
threaded into KernelParams as a new arg:

  >= 12 GB  -> no ctx cap, ubatch 1024   (Vega II/Duo, W6800X, W6900X, Vega 56/64)
  7-11 GB   -> ctx 4096,   ubatch 512    (RX 5700/5700 XT, W5700X)
  <= 6 GB   -> ctx 2048,   ubatch 256    (4 GB MBP dGPUs, Polaris 560X)

Guarantees: the >= 12 GB tier and any VRAM probe miss (0/unknown) keep
the exact Vega-II-validated values (zero regression on the canonical
path); buildSlotArgs applies the context cap as min(MaxContext, cap) so
it only ever lowers; an explicit QUENCHFORGE_EMBED_UBATCH_SIZE still
overrides the tier. Family-agnostic: unlisted/future AMD cards fall
through classifyProfile to vega-pro and size by VRAM like any other.

Adds SlotTuning.ContextSize, the amdSizing curve, six new tuning/cmd
tests, and updates CHANGELOG (v0.8.0 final), README env table, and
CLAUDE.md gotcha #2.
@sunrunnerfire sunrunnerfire merged commit 31f9ab9 into main May 31, 2026
6 checks passed
@sunrunnerfire sunrunnerfire deleted the feat/vram-tier-adaptive-sizing branch May 31, 2026 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant