
Add Gemma 4 MTP scaffolding #643

Open
ncylich wants to merge 2 commits into v2 from mtp-scaffolding-split

@ncylich (Collaborator) commented May 14, 2026

Summary

  • add Gemma 4 MTP assistant loading, speculative draft/verify flow, sampling support, and metrics
  • add KV cache transaction support and focused MTP and unit test coverage
  • keep MTP disabled by default with explicit draft-count opt-in
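The speculative draft/verify flow and the KV cache transactions listed above can be sketched as a toy model. This is not the Cactus implementation: `KVCache`, `begin`/`commit`/`rollback`, `speculative_step`, and the two stand-in "models" are hypothetical names, verification uses greedy (argmax) acceptance, and the cache stores token ids in place of key/value tensors.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


@dataclass
class KVCache:
    """Toy KV cache holding one entry (a token id) per processed position."""
    entries: List[int] = field(default_factory=list)
    _mark: int = -1  # cache length at begin(); -1 means no open transaction

    def begin(self) -> None:
        assert self._mark < 0, "transaction already open"
        self._mark = len(self.entries)

    def append(self, token: int) -> None:
        self.entries.append(token)

    def commit(self, keep: int) -> None:
        # Keep the first `keep` entries written since begin() and discard the rest.
        assert self._mark >= 0, "no open transaction"
        del self.entries[self._mark + keep:]
        self._mark = -1

    def rollback(self) -> None:
        self.commit(0)  # discard everything written since begin()


def speculative_step(context: List[int], cache: KVCache, target: NextToken,
                     draft: NextToken, max_drafts: int) -> List[int]:
    """One draft/verify round with greedy acceptance; returns the new tokens.

    Toy invariant: the cache holds an entry for every context token except the
    last one, which the next target pass consumes.
    """
    # 1. Draft: the cheap model proposes up to `max_drafts` tokens autoregressively.
    proposal: List[int] = []
    for _ in range(max_drafts):
        proposal.append(draft(context + proposal))

    # 2. Verify: a real engine scores [last token] + proposal in one target pass,
    #    writing K/V for every position, so open a transaction first.
    cache.begin()
    for tok in [context[-1]] + proposal:
        cache.append(tok)

    # 3. Accept the longest prefix on which the target's greedy choice agrees.
    accepted: List[int] = []
    for tok in proposal:
        if target(context + accepted) != tok:
            break
        accepted.append(tok)

    # 4. The target's own next token: a correction on mismatch, a bonus token otherwise.
    next_tok = target(context + accepted)

    # 5. Keep entries for the last context token and the accepted drafts only.
    cache.commit(1 + len(accepted))
    return accepted + [next_tok]


def target(seq: List[int]) -> int:
    return seq[-1] + 1  # toy target: counts upward


def draft(seq: List[int]) -> int:
    return seq[-1] + (2 if seq[-1] % 5 == 4 else 1)  # toy draft: skips a number after ...4


context, cache = [0], KVCache()
while len(context) < 10:
    context += speculative_step(context, cache, target, draft, max_drafts=3)
print(context)                         # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(cache.entries == context[:-1])   # → True
```

The transaction lets the verify pass write cache entries for every drafted position optimistically and then keep only the accepted prefix. In the second round of this run the target rejects the first draft (it expects 5, the draft proposed 6), so only the entry for the last context token survives the commit and the round still emits one token.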

Validation

  • source ./venv/bin/activate && cactus build && ./cactus-engine/test.sh
  • CACTUS_TEST_GEMMA4_TARGET=/Users/noahcylich/Documents/Desert/cactus-v2/weights/gemma-4-e2b-it CACTUS_TEST_GEMMA4_ASSISTANT=/Users/noahcylich/Documents/Desert/cactus-v2/weights/gemma-4-e2b-it/assistant ./tests/mtp/run_fast_gemma4_mtp.sh --max-tokens 24 --timeout-seconds 240
  • ./tests/bench_e2e_decode.sh --model gemma-4-e2b-it --prompt "Count from 1 to 100, separated by commas." --max-tokens 80 --reps 1 --warmup 0 --mtp-max-drafts 1,2,3 --output /private/tmp/mtp_1_2_3_pr_verify.csv

MTP sweep

  • max drafts 1: 13.99 decode tok/s, 100.0% draft acceptance, no fallback
  • max drafts 2: 19.48 decode tok/s, 98.1% draft acceptance, no fallback
  • max drafts 3: 23.21 decode tok/s, 96.7% draft acceptance, no fallback
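The relative gains in the sweep can be worked out directly from the reported numbers. Speedups are taken relative to the 1-draft run because no MTP-disabled baseline is reported. The tokens-per-round column rests on an assumption the PR does not state: that the reported acceptance is a per-draft-token rate and that each draft is accepted independently until the first rejection.

```python
# (draft count, decode tok/s, acceptance rate) copied from the MTP sweep above.
SWEEP = [(1, 13.99, 1.000), (2, 19.48, 0.981), (3, 23.21, 0.967)]


def expected_tokens_per_round(accept: float, drafts: int) -> float:
    """Expected tokens emitted per verify pass, ASSUMING each draft is accepted
    independently with probability `accept` until the first rejection:
    1 + a + a^2 + ... + a^k (the leading 1 is the target's own token)."""
    return sum(accept ** i for i in range(drafts + 1))


baseline = SWEEP[0][1]  # no MTP-disabled number is reported, so use the 1-draft run
for drafts, tok_s, accept in SWEEP:
    print(f"{drafts} draft(s): {tok_s / baseline:.2f}x vs. 1 draft, "
          f"~{expected_tokens_per_round(accept, drafts):.2f} tokens/round")
# → 1 draft(s): 1.00x vs. 1 draft, ~2.00 tokens/round
# → 2 draft(s): 1.39x vs. 1 draft, ~2.94 tokens/round
# → 3 draft(s): 1.66x vs. 1 draft, ~3.81 tokens/round
```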
