Add challenge 96: INT8 KV-Cache Attention (Medium) by claude[bot] · Pull Request #250 · AlphaGPU/leetgpu-challenges

claude · 2026-04-18T04:40:38Z

Summary

Adds challenge 96: INT8 KV-Cache Attention (Medium difficulty)
Decode-phase multi-head attention where the KV cache is stored as int8 with per-token float32 scale factors, matching how production LLM serving systems (TensorRT-LLM, vLLM) cut KV-cache memory traffic: int8 storage is roughly half the bytes of fp16 and about a quarter of fp32
Solvers must dequantize K and V on the fly (K_float[h,s,d] = K_int8[h,s,d] × k_scale[h,s], and likewise for V) and then run scaled dot-product attention
Teaches: INT8 dequantization fused with attention, mixed-precision arithmetic, warp-level reductions, memory bandwidth trade-offs in LLM inference
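The dequantize-then-attend step described in the summary can be sketched as a plain-Python reference. This is an illustration under stated assumptions, not the challenge's actual reference solution. The function name `int8_kv_attention`, the `v_scale` argument, the nested-list layouts, and the 1/sqrt(d) score scaling are all assumed here; only the K dequantization formula comes from the PR text.

```python
import math


def int8_kv_attention(q, k_int8, v_int8, k_scale, v_scale):
    """Decode-phase attention over an int8 KV cache (illustrative sketch).

    Assumed layouts (not taken from the challenge files):
      q:       [H][D] floats, one query token per head
      k_int8:  [H][S][D] ints in [-128, 127]
      v_int8:  [H][S][D] ints in [-128, 127]
      k_scale: [H][S] floats, per-token scale for K
      v_scale: [H][S] floats, per-token scale for V (hypothetical name)
    Returns [H][D] floats.
    """
    out = []
    for h in range(len(q)):
        d = len(q[h])
        inv_sqrt_d = 1.0 / math.sqrt(d)

        # k_scale[h][s] is constant over d, so it is applied once to the
        # dot product with the raw int8 values instead of per element.
        scores = [
            k_scale[h][s]
            * sum(qi * ki for qi, ki in zip(q[h], k_int8[h][s]))
            * inv_sqrt_d
            for s in range(len(k_int8[h]))
        ]

        # Numerically stable softmax over the sequence dimension.
        m = max(scores)
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)

        # Weighted sum of V, folding the per-token V scale into the weight.
        row = [0.0] * d
        for s, w in enumerate(weights):
            c = (w / total) * v_scale[h][s]
            for j in range(d):
                row[j] += c * v_int8[h][s][j]
        out.append(row)
    return out
```

Folding each scale into a single multiply per token, rather than materializing float copies of K and V, is the same idea a fused GPU kernel would use to keep the cache reads at one byte per element.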

Test plan

pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
run_challenge.py --action run — example test passes
run_challenge.py --action submit — all functional + performance tests pass on NVIDIA Tesla T4
Checklist from CLAUDE.md verified: all 6 starter files present, correct comment style for medium difficulty, HTML has all 4 required sections, performance bullet matches generate_performance_test()

🤖 Generated with Claude Code

Decode-phase multi-head attention with INT8 KV cache and per-token scale factors, modelling how production LLM serving systems (TensorRT-LLM, vLLM) halve KV-cache memory bandwidth. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Add challenge 96: INT8 KV-Cache Attention (Medium)

fe65649


claude Bot requested review from ishaan-arya, kunal-mansukhani and shxjames as code owners April 18, 2026 04:40

kunal-mansukhani approved these changes Apr 19, 2026

kunal-mansukhani merged commit 31fd25e into main Apr 19, 2026
5 checks passed

kunal-mansukhani merged 1 commit into main from add-challenge-96-int8-kv-cache-attention
