Add challenge 96: INT8 KV-Cache Attention (Medium)#250

Merged
kunal-mansukhani merged 1 commit into main from add-challenge-96-int8-kv-cache-attention
Apr 19, 2026

Conversation

@claude (Contributor, Bot) commented Apr 18, 2026

Summary

  • Adds challenge 96: INT8 KV-Cache Attention (Medium difficulty)
  • Decode-phase multi-head attention where the KV cache is stored as int8 with per-token float32 scale factors — matching how production LLM serving systems (TensorRT-LLM, vLLM) halve KV-cache memory bandwidth versus the usual fp16 cache
  • Solvers must dequantize K and V on-the-fly (K_float[h,s,d] = K_int8[h,s,d] × k_scale[h,s]) then run scaled dot-product attention
  • Teaches: INT8 dequantization fused with attention, mixed-precision arithmetic, warp-level reductions, memory bandwidth trade-offs in LLM inference
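
The dequantize-then-attend step described above can be sketched as a NumPy reference (a minimal host-side model, not the challenge's GPU kernel; function and argument names here are hypothetical):

```python
import numpy as np

def int8_kv_decode_attention(q, k_int8, k_scale, v_int8, v_scale):
    """One decode step of multi-head attention over an INT8 KV cache.

    q:       (H, D) float32 query for the current token
    k_int8:  (H, S, D) int8 key cache;   k_scale: (H, S) float32 per-token scales
    v_int8:  (H, S, D) int8 value cache; v_scale: (H, S) float32 per-token scales
    Returns: (H, D) float32 attention output.
    """
    H, S, D = k_int8.shape
    # Dequantize on the fly: K_float[h,s,d] = K_int8[h,s,d] * k_scale[h,s]
    k = k_int8.astype(np.float32) * k_scale[:, :, None]
    v = v_int8.astype(np.float32) * v_scale[:, :, None]
    # Scaled dot-product attention scores per head: (H, S)
    scores = np.einsum("hd,hsd->hs", q, k) / np.sqrt(D)
    scores -= scores.max(axis=1, keepdims=True)  # softmax numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # Weighted sum of dequantized values over the sequence: (H, D)
    return np.einsum("hs,hsd->hd", probs, v)
```

In a fused GPU kernel the dequantization happens in registers as K/V tiles are loaded, so the int8 cache is never materialized in float; this reference only models the numerics.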

Test plan

  • pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
  • run_challenge.py --action run — example test passes
  • run_challenge.py --action submit — all functional + performance tests pass on NVIDIA Tesla T4
  • Checklist from CLAUDE.md verified: all 6 starter files present, correct comment style for medium difficulty, HTML has all 4 required sections, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Decode-phase multi-head attention with INT8 KV cache and per-token
scale factors, modelling how production LLM serving systems (TensorRT-LLM,
vLLM) halve KV-cache memory bandwidth.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@kunal-mansukhani kunal-mansukhani merged commit 31fd25e into main Apr 19, 2026
5 checks passed
