Add Triton + CuTe Backend, enable more DSL support (#35)
* triton_backend_v2
* fix eval bugs
* fix issues
* revert eval
* remove traceback
* remove cot
* improve eval
* looked over PR and added future support for other languages
* updated requirements
* added back requirements.txt
* add cute one shot addition example
* remove unnecessary files and redo requirements
* let's see if that fixes it
* fix config in file as suggested by soksoerey
* move Natalia's old file into the change log
---------
Co-authored-by: AffectionateCurry <[email protected]>
Co-authored-by: nathanjpaek <[email protected]>
Co-authored-by: Simon Guo <[email protected]>
README.md (7 additions, 17 deletions)
@@ -26,6 +26,8 @@ We construct KernelBench to have 4 Levels of categories:
 **Level 4 🤗**: Level Hugging Face
 Optimize whole model architectures from HuggingFace
 
+We are actively extending KernelBench to other DSLs beyond `cuda` as well.
+
 ## ⚖️ Evaluation
 #### Methodology
 To evaluate model-generated kernels, we need to check if they:
@@ -47,6 +49,7 @@ Some examples to illustrate this metric that filters based on speedups:
 
 You can increase speedup threshold `p` to make the task more challenging.
 
+
 #### Compute Overall Benchmark Performance
 
 We provide a script `scripts/greedy_analysis.py` to compute the overall benchmark performance.
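The speedup-threshold metric above can be sketched as follows: a problem counts as solved only when the generated kernel is both correct and more than `p`× faster than the reference implementation. This is a minimal illustrative sketch; the function name `fast_p` and the result-dictionary keys are assumptions for this example, not KernelBench's actual API.

```python
# Hypothetical sketch of a speedup-threshold metric: a kernel counts
# only if it is correct AND more than p times faster than the reference.
# All names here are illustrative, not KernelBench's real interface.

def fast_p(results, p=1.0):
    """Fraction of problems whose generated kernel is correct and
    achieves speedup (reference_time / kernel_time) greater than p."""
    if not results:
        return 0.0
    wins = sum(
        1 for r in results
        if r["correct"] and (r["ref_time_ms"] / r["kernel_time_ms"]) > p
    )
    return wins / len(results)

# Toy results for three problems.
scores = [
    {"correct": True,  "ref_time_ms": 2.0, "kernel_time_ms": 1.0},  # 2.0x speedup
    {"correct": True,  "ref_time_ms": 1.0, "kernel_time_ms": 2.0},  # 0.5x (slower)
    {"correct": False, "ref_time_ms": 3.0, "kernel_time_ms": 1.0},  # incorrect
]
print(fast_p(scores, p=1.0))  # only the first result counts
```

Raising `p` shrinks the set of qualifying kernels, which is what makes the task harder at higher thresholds.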
@@ -95,6 +98,8 @@ python3 scripts/generate_and_eval_single_sample.py dataset_src="huggingface" lev
 # add .verbose_logging for more visibility
 ```
 
+We are also supporting other GPU programming languages beyond `cuda`. Simply specify `backend=triton`. For now we support `cuda`, `triton`, and `cute`.
+
 ### Run on all problems
 
 ```
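The `backend=...` option added in this diff amounts to selecting one of a small set of supported DSLs. A minimal sketch of how such a choice could be validated is below; `resolve_backend` and `SUPPORTED_BACKENDS` are hypothetical names for illustration, not KernelBench's actual implementation.

```python
# Illustrative sketch of validating a backend option such as
# `backend=triton`; names are hypothetical, not KernelBench's real API.

SUPPORTED_BACKENDS = ("cuda", "triton", "cute")

def resolve_backend(name: str) -> str:
    """Normalize and validate a requested DSL backend name."""
    backend = name.strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported backend {name!r}; choose one of {SUPPORTED_BACKENDS}"
        )
    return backend

print(resolve_backend("Triton"))  # prints "triton"
```

Keeping the supported set in one place makes it straightforward to extend the benchmark to further DSLs later.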
@@ -120,25 +125,10 @@ We provide some reference baseline times on a variety of NVIDIA GPUs across generations
 We have also released the test-time framework [Caesar](https://github.com/simonguozirui/caesar) that is used in the multi-turn / iterative refinement experiments in our paper. You can use or modify this framework for high-throughput test-time scaling (both sequential and parallel) targeting KernelBench problems.
 
 ## 🛣️ Upcoming Roadmap
-- [ ] Triton Variant (To be merged)
-- [ ] Easy to use CoLab Notebook Example
-- [ ] Push button flow on Modal / Cloud Provider
-- [ ] Integrate with more frameworks, such as [ThunderKittens](https://github.com/HazyResearch/ThunderKittens)
-- [ ] Add backward pass
-- [ ] Integrate with toolchains such as NCU
-See Issues for the ongoing roadmap and directions.
-
-
+Check out our [roadmap](https://github.com/ScalingIntelligence/KernelBench/issues/74) for the features we plan to add. We welcome community contributions in these directions.
 
 ## 🔍 Known Usage
-[NVIDIA](https://developer.nvidia.com/blog/automating-gpu-kernel-generation-with-deepseek-r1-and-inference-time-scaling/) - Automating GPU Kernel Generation with DeepSeek-R1 and Inference Time Scaling
-[Sakana AI](https://sakana.ai/ai-cuda-engineer/) - AI Cuda Engineer
-[Project Popcorn](https://www.youtube.com/watch?v=mdDVkBeFy9A) - Triton Support for KernelBench, Data Scaling + SFT'd Kernel LLM
-[Kevin](https://cognition.ai/blog/kevin-32b) - Kevin-32B: Multi-Turn RL for Writing CUDA Kernels
-[Simple Test-Time Search](https://scalingintelligence.stanford.edu/blogs/fastkernels/) - by @anneouyang
-
-If you are using KernelBench, we love to hear more about it!
+Since release, we have gotten a lot of interest from researchers, research labs, and companies that use KernelBench to explore this direction. We have documented [known usage](https://docs.google.com/document/d/e/2PACX-1vTjS-UMH1HB5n_PENq2k-3YRfXIXkqKIKeNC2zcWMyLPdl4Jrwvdk4dNDVSsM8ybKrCxZB7GJq1slZF/pub) of KernelBench and related efforts towards automated kernel generation. If you are using KernelBench, we love to hear more about it!