Commit 843d962

Update static_shapes docs (#951)
1 parent: f86e7f7

File tree: 3 files changed (+24, −17 lines)

docs/api/kernel.md

Lines changed: 6 additions & 2 deletions
````diff
@@ -88,6 +88,10 @@ bound_static = shape_specialized_kernel.bind((torch.randn(100, 50),))
 result = bound_static(torch.randn(100, 50))  # Must be exactly [100, 50]
 ```
 
+```{warning}
+Helion shape-specializes kernels by default (`static_shapes=True`) for the best performance. Bound kernels and caches require tensors with the exact same shapes and strides as the examples you compile against. Set `static_shapes=False` if you need the same compiled kernel to serve many shapes.
+```
+
 ### BoundKernel Methods
 
 The returned BoundKernel has these methods:
@@ -131,9 +135,9 @@ print(triton_code)
 Kernels are automatically cached based on:
 
 - **Argument types** (dtype, device)
-- **Tensor shapes** (when using `static_shapes=True`)
+- **Tensor shapes** (default: `static_shapes=True`)
 
-By default (`static_shapes=False`), kernels only specialize on basic shape categories (0, 1, or ≥2 per dimension) rather than exact shapes, allowing the same compiled kernel to handle different tensor sizes efficiently.
+By default (`static_shapes=True`), Helion treats shapes and strides as compile-time constants, baking them into generated Triton code for the best performance. To reuse a single compiled kernel across size variations, set `static_shapes=False`, which instead buckets each dimension as `{0, 1, ≥2}` and allows more inputs to share the same cache entry.
 
 ```python
 # These create separate cache entries
````
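The two caching modes in this hunk can be illustrated with a small stand-alone sketch. This is not Helion's cache implementation: the `bucket` and `cache_key` helpers are hypothetical names introduced here, and only the exact-shape keying and the `{0, 1, ≥2}` bucketing rule come from the docs above.

```python
# Illustrative sketch, NOT Helion internals. Only the keying rules
# described in the docs are modeled; all names here are hypothetical.

def bucket(dim_size: int) -> int:
    """Collapse a dimension size into the buckets 0, 1, or 2 (meaning >=2)."""
    return dim_size if dim_size < 2 else 2

def cache_key(shape, dtype, device, static_shapes=True):
    if static_shapes:
        # static_shapes=True: exact shapes are part of the key, so each
        # new shape compiles (and caches) separately.
        return (dtype, device, tuple(shape))
    # static_shapes=False: only each dimension's bucket matters, so many
    # sizes share one compiled kernel.
    return (dtype, device, tuple(bucket(s) for s in shape))

# Different exact shapes -> distinct entries under static_shapes=True ...
assert cache_key((100, 50), "float32", "cuda") != cache_key((128, 50), "float32", "cuda")
# ... but one shared entry under static_shapes=False, since both dims are >=2.
assert cache_key((100, 50), "float32", "cuda", static_shapes=False) == \
       cache_key((128, 50), "float32", "cuda", static_shapes=False)
```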

docs/api/settings.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -98,7 +98,7 @@ with helion.set_default_settings(
 
 .. autoattribute:: Settings.static_shapes
 
-When enabled, tensor shapes are treated as compile-time constants for optimization. Default is ``False``.
+When enabled, tensor shapes are treated as compile-time constants for optimization. Default is ``True``. Set this to ``False`` if you need a single compiled kernel instance to serve many shape variants.
 
 ```
 
 ### Autotuning Settings
````

docs/deployment_autotuning.md

Lines changed: 17 additions & 14 deletions
````diff
@@ -146,13 +146,16 @@ config and selecting the fastest.
 A key detail here is controlling the specialization key, which
 determines when to re-benchmark. Options include:
 
-- **Default (dynamic shapes):** we reuse the timing result as long as
-  tensor dtypes and device types stay constant. Shape changes only trigger
-  a re-selection when a dimension size crosses the buckets `{0, 1, ≥2}`.
+- **Default (`static_shapes=True`):** Helion shape-specializes on the exact
+  shape/stride signature, rerunning the selection whenever those shapes
+  differ. This delivers the best per-shape performance but requires all calls
+  to match the example shapes exactly.
 
-- **`static_shapes=True`:** add this setting to the decorator to specialize
-  on the exact shape/stride signature, rerunning the selection whenever
-  those shapes differ.
+- **`static_shapes=False`:** switch to bucketed dynamic shapes. Helion
+  reuses results as long as tensor dtypes and device types stay constant.
+  Shape changes only trigger a re-selection when a dimension size crosses
+  the buckets `{0, 1, ≥2}`. Use this when you need one compiled kernel to
+  handle many input sizes.
 
 - **Custom keys:** pass `key=` to group calls however you like.
   This custom key is in addition to the above.
````
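The re-selection rule for the bucketed mode can be sketched in a few lines. `buckets` and `needs_reselection` are hypothetical helper names, not Helion APIs; they only mirror the `{0, 1, ≥2}` crossing rule described above.

```python
# Sketch under stated assumptions: a timing result is reused until some
# dimension of an input crosses a bucket boundary.

def buckets(shape):
    # Each dimension collapses to 0, 1, or 2 (meaning ">=2").
    return tuple(min(s, 2) for s in shape)

def needs_reselection(old_shape, new_shape):
    # Re-benchmark only when the bucketed signatures differ.
    return buckets(old_shape) != buckets(new_shape)

assert not needs_reselection((2, 50), (1000, 50))  # both dims stay in >=2
assert needs_reselection((1, 50), (2, 50))         # first dim crosses 1 -> >=2
```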
````diff
@@ -197,15 +200,15 @@ input types. You can pre-compile as many configs as you need using
 `BoundKernel.compile_config`. **Warning:** `kernel.bind()` specializes,
 and the result will only work with the same input types you passed.
 
-- With `static_shapes=False` (default) it will specialize on the input
-  dtypes, device types, and whether each dynamic dimension falls into the
-  0, 1, or ≥2 bucket. Python types are also specialized. For dimensions
-  that can vary across those buckets, supply representative inputs ≥2
-  to avoid excessive specialization.
+- With `static_shapes=True` (default) the bound kernel only works for the
+  exact shape/stride signature of the example inputs. The generated code
+  has shapes baked in, which often provides a performance boost.
 
-- With `static_shapes=True` the bound kernel only works for the exact
-  shape/stride signature of the example inputs. The generated code will
-  have shapes baked in, which often provides a performance boost.
+- With `static_shapes=False` it will specialize on the input dtypes,
+  device types, and whether each dynamic dimension falls into the 0, 1,
+  or ≥2 bucket. Python types are also specialized. For dimensions that
+  can vary across those buckets, supply representative inputs ≥2 to avoid
+  excessive specialization.
 
 If you need to support multiple input types, bind multiple times with
 representative inputs.
````
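The "bind multiple times with representative inputs" advice amounts to keeping a small dispatch table of bound kernels. The sketch below is purely illustrative: `fake_bind` stands in for `kernel.bind()`, and the `signature` helper mirrors the specialization rules described above rather than Helion's internals.

```python
# Hypothetical sketch of pre-binding one kernel per expected input type.

def signature(shape, dtype, static_shapes=True):
    # Under static_shapes=True the signature is the exact shape; under
    # static_shapes=False it would be the {0, 1, >=2} buckets instead.
    if static_shapes:
        return (dtype, tuple(shape))
    return (dtype, tuple(min(s, 2) for s in shape))

bound_kernels = {}

def fake_bind(shape, dtype):
    # Stand-in for kernel.bind((example_tensor,)): one cache entry per
    # representative input signature.
    bound_kernels[signature(shape, dtype)] = f"compiled-for-{shape}"

# Pre-compile for each input type the service expects to see.
for shape in [(100, 50), (128, 64)]:
    fake_bind(shape, "float32")

# Dispatch must match the exact signature a bound kernel was built for.
assert bound_kernels[signature((100, 50), "float32")] == "compiled-for-(100, 50)"
```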
