
Add challenge 95: Decode-Phase Attention (Medium) #248

Open
claude[bot] wants to merge 1 commit into main from add-challenge-95-decode-phase-attention

Conversation


claude[bot] (Contributor) commented Apr 16, 2026

Summary

  • Adds challenge 95: Decode-Phase Attention (Medium difficulty)
  • Models the single-token-query attention used during autoregressive LLM inference decode steps: Q has shape (batch_size, num_q_heads, head_dim) — no sequence dimension — while K and V are the full KV cache (batch_size, num_kv_heads, cache_len, head_dim)
  • Supports Grouped Query Attention (GQA): each KV head is shared by num_q_heads / num_kv_heads query heads
  • Performance test: LLaMA-3 8B-style config — batch_size=4, num_q_heads=32, num_kv_heads=8, cache_len=16,384, head_dim=128
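As a rough reference (not the repository's starter code), the decode-step attention described above can be sketched in NumPy. The shapes and the GQA head-sharing follow the bullets; the function name and the use of `np.repeat` to expand KV heads are illustrative choices, not taken from the PR:

```python
import numpy as np

def decode_attention(q, k, v):
    """Single-token-query attention over a full KV cache with GQA.

    q:    (batch, num_q_heads, head_dim)            -- one decode-step query
    k, v: (batch, num_kv_heads, cache_len, head_dim) -- the KV cache
    num_q_heads must be a multiple of num_kv_heads.
    """
    batch, num_q_heads, head_dim = q.shape
    _, num_kv_heads, cache_len, _ = k.shape
    group = num_q_heads // num_kv_heads  # query heads per KV head

    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)  # (batch, num_q_heads, cache_len, head_dim)
    v = np.repeat(v, group, axis=1)

    # Attention scores over the cache: (batch, num_q_heads, cache_len)
    scores = np.einsum("bhd,bhld->bhl", q, k) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Weighted sum of cached values: (batch, num_q_heads, head_dim)
    return np.einsum("bhl,bhld->bhd", probs, v)
```

Note there is no causal mask: at decode time the query token attends to the entire cache, which is exactly why the kernel reduces over `cache_len`.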

Why this is interesting

This challenge teaches a key GPU programming concept: the same attention formula requires a completely different implementation strategy at decode time vs. training time. Training attention (e.g., GQA challenge #80, Flash Attention PR #232) is compute-bound with equal-length Q and KV; decode-phase attention is memory-bandwidth-bound with a single-token query streaming over the entire KV cache. Efficient decode kernels parallelize over batch/heads and reduce over cache_len, a pattern not covered by any existing challenge or open PR.
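The memory-bandwidth-bound claim can be checked with a back-of-envelope calculation for the performance-test config. The fp16 KV cache (2 bytes/element) and the single-pass-over-HBM assumption are mine, not stated in the PR:

```python
# LLaMA-3 8B-style decode config from the performance test.
batch, num_q_heads, num_kv_heads = 4, 32, 8
cache_len, head_dim = 16_384, 128
dtype_bytes = 2  # assumed fp16 KV cache

# QK^T dot product + PV weighted sum: ~4*head_dim FLOPs per (query head, cache position).
flops = batch * num_q_heads * cache_len * 4 * head_dim

# K and V each read once from HBM; Q and the output are negligible by comparison.
kv_bytes = 2 * batch * num_kv_heads * cache_len * head_dim * dtype_bytes

print(flops / kv_bytes)  # 4.0 FLOPs/byte
```

An arithmetic intensity of ~4 FLOPs/byte is far below the compute-to-bandwidth ratio of any modern GPU, so the kernel's runtime is set by how fast it can stream the KV cache, not by its math throughput.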

Test plan

  • All 6 starter files present (starter.cu, starter.pytorch.py, starter.triton.py, starter.jax.py, starter.cute.py, starter.mojo)
  • 10 functional test cases: edge cases (cache_len=1,2), zero inputs, MQA (kv_heads=1), GQA groups=2, MHA (kv_heads=q_heads), power-of-2 and non-power-of-2 cache lengths, realistic LLaMA-3 config
  • pre-commit run --all-files passes (black, isort, flake8, clang-format, mojo format)
  • Validated on NVIDIA Tesla T4 via run_challenge.py --action submit → "✓ All tests passed"
  • Checklist in CLAUDE.md verified

🤖 Generated with Claude Code

Single-token-query attention over a full KV cache, the dominant kernel in autoregressive LLM decode steps. Supports Grouped Query Attention (GQA), where multiple query heads share one KV head. Teaches the memory-bandwidth-bound nature of decode-phase workloads, distinct from compute-bound training attention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
