feat(tuning): VRAM-tier-adaptive sizing for the full Intel-Mac AMD GPU range#9
Merged
Merged
Conversation
Every AMD-discrete profile previously inherited the Vega-II bench constants (--ctx-size 8192, embed --ubatch-size 1024) regardless of the card, so smaller Intel-Mac AMD GPUs (8 GB RX 5700, 4 GB MacBook Pro dGPUs) could oversubscribe VRAM and forced operators to hand-set QUENCHFORGE_MAX_CONTEXT / QUENCHFORGE_EMBED_UBATCH_SIZE. amdSizing(vramGB) now derives both from the detected headline VRAM, threaded into KernelParams as a new arg: >= 12 GB -> no ctx cap, ubatch 1024 (Vega II/Duo, W6800X, W6900X, Vega 56/64) 7-11 GB -> ctx 4096, ubatch 512 (RX 5700/5700 XT, W5700X) <= 6 GB -> ctx 2048, ubatch 256 (4 GB MBP dGPUs, Polaris 560X) Guarantees: the >= 12 GB tier and any VRAM probe miss (0/unknown) keep the exact Vega-II-validated values (zero regression on the canonical path); buildSlotArgs applies the context cap as min(MaxContext, cap) so it only ever lowers; an explicit QUENCHFORGE_EMBED_UBATCH_SIZE still overrides the tier. Family-agnostic: unlisted/future AMD cards fall through classifyProfile to vega-pro and size by VRAM like any other. Adds SlotTuning.ContextSize, the amdSizing curve, six new tuning/cmd tests, and updates CHANGELOG (v0.8.0 final), README env table, and CLAUDE.md gotcha #2.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Closes the gap that kept quenchforge's AMD support effectively Vega-II-only: every AMD-discrete profile inherited the 32 GB Vega II bench constants (
--ctx-size 8192, embed--ubatch-size 1024) regardless of card, so smaller Intel-Mac AMD GPUs (8 GB RX 5700, 4 GB MacBook Pro dGPUs) could oversubscribe VRAM and forced operators to hand-tuneQUENCHFORGE_MAX_CONTEXT/QUENCHFORGE_EMBED_UBATCH_SIZE.How
internal/tuning/tuning.go::amdSizing(vramGB)derives the context ceiling + embed ubatch from the detected headline VRAM (hardware.Info.GPUVRAMGB, now threaded intoKernelParams):--ctx-sizecap--ubatch-sizeMaxContext)Guarantees
buildSlotArgsappliesmin(cfg.MaxContext, cap); a high-VRAM operator who raisedQUENCHFORGE_MAX_CONTEXTis never clamped.QUENCHFORGE_EMBED_UBATCH_SIZEbeats the tier; the context cap is independent.classifyProfile→vega-proand size by VRAM like any other.Tests
TestAmdSizing_Tiers,TestKernelParams_EmbedLowVRAMScalesDown,TestKernelParams_ContextCapAppliesToAllAMDSlots,TestKernelParams_HighVRAMAndNonAMDHaveNoContextCap,TestKernelParams_UbatchOverrideBeatsTierButCapStands,TestBuildSlotArgs_LowVRAMAMDCapsContextAndUbatch. Existing tuning/cmd suites pass unchanged (they exercise VRAM=0 → high tier). gofmt/vet/build clean.Docs: CHANGELOG
v0.8.0(final), README env table, CLAUDE.md gotcha #2.