65 changes: 57 additions & 8 deletions .github/workflows/pr-perfbench-bot.yml
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ on:
- created

permissions:
contents: read
contents: write
Comment on lines 8 to +9
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Add actions: write permission for cache operations.

The current permissions block only specifies contents: write. When a permissions block is defined, any unlisted permission (including actions) defaults to none. This prevents setup-python's cache from being saved: the job will restore from cache but never write new entries.

Apply this diff:

 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.


I'll verify this review comment by checking the codebase and GitHub Actions documentation.


🏁 Script executed:

#!/bin/bash
# Check if the workflow file exists and examine permissions block and setup-python usage
cat .github/workflows/pr-perfbench-bot.yml 2>/dev/null || echo "File not found"

Length of output: 4705


🌐 Web query:

GitHub Actions setup-python cache requires actions write permission

💡 Result:

Short answer: actions/cache (used by setup-python and many wrapper actions) needs the workflow's GITHUB_TOKEN to have actions: write when it needs to create/delete caches. Add permissions to the workflow or provide a PAT (for fork PRs where GITHUB_TOKEN is limited).

What to do

  • In your workflow or job, grant the permission:

      permissions:
        actions: write
        contents: read

  • Example (top of workflow):

      permissions:
        actions: write
        contents: read

Notes

  • For pull requests from forks the GITHUB_TOKEN is read-only for some permissions; to allow cache writes/deletes for fork PRs you must use a personal access token (repo scope) stored in a secret and pass it to the cache action (or disable deletion). [1][2][3]

Sources

  • cache action / examples requiring actions: write. [1]
  • REST / API note: deleting caches requires an auth token with repo scope or GitHub App actions: write. [2]
  • Marketplace/README examples calling out actions: write for cache deletion. [3]

Add actions: write permission for cache operations.

The workflow uses actions/setup-python@v6 with cache: pip enabled. When a permissions block is defined in GitHub Actions, the actions/cache action used internally by setup-python requires the actions: write permission to save caches. Without this permission, caches will restore but not persist across runs, degrading workflow performance.

Apply this diff:

 permissions:
   contents: write
+  actions: write
🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml around lines 8 to 9 currently grants only
contents: write; add an actions: write permission to the permissions block so
the actions/cache used by actions/setup-python (cache: pip) can persist
caches—update the permissions to include actions: write alongside contents:
write (keep existing permissions and only add actions: write).


concurrency:
group: "${{ github.workflow }}-${{ github.ref }}"
@@ -53,8 +53,12 @@ jobs:
run: |
python -m venv tll
source tll/bin/activate
pip install -r requirements-test.txt
pip install .
export PIP_CONFIG_FILE=/dev/null
export PYTHONUSERBASE=""
pip config unset global.user
pip config unset user.user
pip install --no-user -r requirements-test.txt
pip install --no-user .

- name: Install original version
run: |
@@ -64,25 +68,70 @@
git checkout main
python -m venv tl
source tl/bin/activate
pip install -r requirements-test.txt
pip install .
export PIP_CONFIG_FILE=/dev/null
export PYTHONUSERBASE=""
pip config unset global.user || true
pip config unset user.user || true
pip install --no-user -r requirements-test.txt
pip install --no-user .

- name: Run performance test
id: perfbench
run: |
source tl/bin/activate
python maint/scripts/ci_performance.py
- name: Read markdown table
id: read_md
run: |
echo "content<<EOF" >> $GITHUB_OUTPUT
cat bench.md >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
- name: Upload PNG to GitHub and get URL
id: upload_png
uses: actions/github-script@v8
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const fs = require('fs');
const content = fs.readFileSync('bench.png').toString('base64');
// Create blob in the repo
const blob = await github.rest.git.createBlob({
owner: context.repo.owner,
repo: context.repo.repo,
content: content,
encoding: "base64",
});
// Attach blob as a tree item
const tree = await github.rest.git.createTree({
owner: context.repo.owner,
repo: context.repo.repo,
tree: [{
path: `bench_${context.runId}.png`,
mode: '100644',
type: 'blob',
sha: blob.data.sha
}]
});
// Raw file URL (works for embedding image)
const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
core.setOutput("url", url);
Comment on lines +89 to +117
⚠️ Potential issue | 🔴 Critical

Fix the PNG upload and URL generation approach.

The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work because:

  • Raw GitHub URLs require a commit SHA, branch name, or tag—not a tree SHA
  • The created blob/tree exists in Git storage but is not reachable from any ref
  • The URL https://raw.githubusercontent.com/.../[tree-sha]/... will return 404

Consider these alternatives:

Option 1: Upload as workflow artifact and use GitHub API to get URL

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart
    path: bench.png

Then link to the artifact in the comment (artifacts are accessible via GitHub UI but not embeddable).

Option 2: Commit and push the image to a dedicated branch

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use external image hosting service or GitHub issue attachments API

🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml lines 83-111: the workflow currently
creates a blob and tree but never makes a commit or updates a ref, then builds a
raw.githubusercontent URL using a tree SHA (which will 404). Replace this with
one of the suggested approaches: either (A) upload bench.png as a workflow
artifact using actions/upload-artifact@v4 and reference the artifact in the
comment, or (B) commit the PNG to a dedicated branch (configure git user, create
or checkout a branch, add bench_<runId>.png, commit and push) and then construct
the raw.githubusercontent.com URL using the branch name, or (C) upload to an
external image host or use the GitHub Issue attachments API; implement the
chosen approach and remove the orphaned blob/tree creation and the invalid raw
URL generation.
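A fourth option is to keep the Git Data API approach but make the uploaded objects reachable: wrap the tree in a commit and point a branch ref at it, then build the raw URL from the commit SHA (which raw.githubusercontent.com accepts). A hedged sketch — the `benchmark-images` ref name and orphan-history design are assumptions, and the workflow's contents: write permission is required for the ref update:

```yaml
# Sketch only: extends the existing github-script step so the blob/tree
# become reachable from a ref, making the raw URL resolve.
- name: Upload PNG to GitHub and get URL
  id: upload_png
  uses: actions/github-script@v8
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const fs = require('fs');
      const content = fs.readFileSync('bench.png').toString('base64');
      const blob = await github.rest.git.createBlob({
        owner: context.repo.owner, repo: context.repo.repo,
        content, encoding: 'base64',
      });
      const tree = await github.rest.git.createTree({
        owner: context.repo.owner, repo: context.repo.repo,
        tree: [{ path: `bench_${context.runId}.png`, mode: '100644',
                 type: 'blob', sha: blob.data.sha }],
      });
      // Create a commit so the tree is reachable; parentless keeps the
      // image history separate from the code history (assumption).
      const commit = await github.rest.git.createCommit({
        owner: context.repo.owner, repo: context.repo.repo,
        message: `Benchmark image for run ${context.runId}`,
        tree: tree.data.sha,
        parents: [],
      });
      // Create the ref, or force-update it if it already exists.
      await github.rest.git.createRef({
        owner: context.repo.owner, repo: context.repo.repo,
        ref: 'refs/heads/benchmark-images', sha: commit.data.sha,
      }).catch(() => github.rest.git.updateRef({
        owner: context.repo.owner, repo: context.repo.repo,
        ref: 'heads/benchmark-images', sha: commit.data.sha, force: true,
      }));
      // Raw URLs accept a commit SHA, so this resolves.
      const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${commit.data.sha}/bench_${context.runId}.png`;
      core.setOutput('url', url);
```

This keeps the step self-contained (no git checkout or working-tree mutation) at the cost of accumulating orphan commits in the repository's object store.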


- name: Post test results as PR comment
uses: actions/github-script@v8
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const md = `${{ steps.read_md.outputs.content }}`;
const img = `${{ steps.upload_png.outputs.url }}`;
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: '📊 ​**Performance Test Results** (triggered by @' + context.payload.comment.user.login + '):\n\n' +
'Run listed here: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\n\n' +
"${{ steps.perfbench.outputs.stdout }}"
body:
'📊 **Performance Test Results** (triggered by @' +
context.payload.comment.user.login + ')\n\n' +
'Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\n\n' +
md +
'\n\n📈 **Speedup Plot:**\n\n' +
`![Speedup Plot](${img})`
})
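One further caveat on the comment step above: interpolating `${{ steps.read_md.outputs.content }}` directly into a JavaScript template literal breaks (and is injectable) if bench.md ever contains a backtick or `${`. Passing the values through environment variables sidesteps this; a sketch under that assumption:

```yaml
# Sketch only: step outputs reach the script via env, not source-code
# interpolation, so markdown content cannot break the JavaScript.
- name: Post test results as PR comment
  uses: actions/github-script@v8
  env:
    BENCH_MD: ${{ steps.read_md.outputs.content }}
    IMG_URL: ${{ steps.upload_png.outputs.url }}
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const md = process.env.BENCH_MD;
      const img = process.env.IMG_URL;
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body: '📊 **Performance Test Results** (triggered by @' +
          context.payload.comment.user.login + ')\n\n' +
          `Run: ${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}\n\n` +
          md + '\n\n📈 **Speedup Plot:**\n\n' + `![Speedup Plot](${img})`
      });
```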
15 changes: 15 additions & 0 deletions examples/analyze/bench_example_analyze.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_conv_analyze
import example_gemm_analyze


def bench_example_gemm_analyze():
tilelang.tools.bench.process_func(example_gemm_analyze.main)


def bench_example_conv_analyze():
tilelang.tools.bench.process_func(example_conv_analyze.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
52 changes: 52 additions & 0 deletions examples/attention_sink/bench_example_attention_sink.py
@@ -0,0 +1,52 @@
import tilelang.tools.bench
import example_gqa_sink_bwd_bhsd
import example_gqa_sink_fwd_bhsd_wgmma_pipelined
import example_mha_sink_bwd_bhsd
import example_mha_sink_fwd_bhsd
import example_mha_sink_fwd_bhsd_wgmma_pipelined


def bench_example_mha_sink_fwd_bhsd():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd.main)


def bench_example_mha_sink_fwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd.main, window_size=128)


def bench_example_mha_sink_fwd_bhsd_wgmma_pipelined():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd_wgmma_pipelined.main)


def bench_example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window():
tilelang.tools.bench.process_func(
example_mha_sink_fwd_bhsd_wgmma_pipelined.main, window_size=128)


def bench_example_gqa_sink_fwd_bhsd_wgmma_pipelined():
tilelang.tools.bench.process_func(example_gqa_sink_fwd_bhsd_wgmma_pipelined.main)


def bench_example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window():
tilelang.tools.bench.process_func(
example_gqa_sink_fwd_bhsd_wgmma_pipelined.main, window_size=128)


def bench_example_mha_sink_bwd_bhsd():
tilelang.tools.bench.process_func(example_mha_sink_bwd_bhsd.main)


def bench_example_mha_sink_bwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_mha_sink_bwd_bhsd.main, window_size=128)


def bench_example_gqa_sink_bwd_bhsd():
tilelang.tools.bench.process_func(example_gqa_sink_bwd_bhsd.main)


def bench_example_gqa_sink_bwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_gqa_sink_bwd_bhsd.main, window_size=128)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
@@ -0,0 +1,55 @@
import tilelang.tools.bench
import block_sparse_attn_triton
import example_tilelang_block_sparse_attn
import example_tilelang_sparse_gqa_decode_varlen_indice
import example_tilelang_sparse_gqa_decode_varlen_mask
import example_triton_sparse_gqa_decode_varlen_indice
import example_triton_sparse_gqa_decode_varlen_mask


def bench_block_sparse_attn_triton():
tilelang.tools.bench.process_func(block_sparse_attn_triton.main)


def bench_example_tilelang_block_sparse_attn():
tilelang.tools.bench.process_func(example_tilelang_block_sparse_attn.main)


def bench_example_tilelang_sparse_gqa_decode_varlen_indice():
tilelang.tools.bench.process_func(
example_tilelang_sparse_gqa_decode_varlen_indice.main, batch=1, max_cache_seqlen=2048)


def bench_example_tilelang_sparse_gqa_decode_varlen_mask():
tilelang.tools.bench.process_func(
example_tilelang_sparse_gqa_decode_varlen_mask.main, batch=1, max_cache_seqlen=2048)


def bench_example_triton_sparse_gqa_decode_varlen_indice():
tilelang.tools.bench.process_func(
example_triton_sparse_gqa_decode_varlen_indice.main,
batch=8,
heads=8,
heads_kv=4,
max_cache_seqlen=2048,
dim=128,
dim_v=128,
sparse_ratio=0.8,
block_size=32)


def bench_example_triton_sparse_gqa_decode_varlen_mask():
tilelang.tools.bench.process_func(
example_triton_sparse_gqa_decode_varlen_mask.main,
batch=8,
heads=8,
heads_kv=4,
max_cache_seqlen=2048,
dim=128,
dim_v=128,
sparse_ratio=0.8,
block_size=32)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/blocksparse_gemm/bench_example_blocksparse_gemm.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_blocksparse_gemm


def bench_example_blocksparse_gemm():
tilelang.tools.bench.process_func(example_blocksparse_gemm.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
21 changes: 21 additions & 0 deletions examples/cast/bench_example_cast.py
@@ -0,0 +1,21 @@
import tilelang.tools.bench
import example_group_per_split_token_cast_to_fp8
import example_per_token_cast_to_fp8


def bench_example_group_per_split_token_cast_to_fp8():
tilelang.tools.bench.process_func(
example_group_per_split_token_cast_to_fp8.main,
M=1024,
N=1024,
BG=2,
blk_m=4,
batch_sizes=[128, 896])


def bench_example_per_token_cast_to_fp8():
tilelang.tools.bench.process_func(example_per_token_cast_to_fp8.main, M=2048, N=512, blk_m=8)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
Comment on lines +20 to +21
🛠️ Refactor suggestion | 🟠 Major

Use idiomatic Python for main guard.

The expression globals().get("__name__") == "__main__" is unnecessarily verbose and non-standard. The conventional Python idiom is if __name__ == "__main__":, which is more readable and widely recognized.

Apply this change:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

Note: This same pattern appears in all benchmark wrapper files in this PR and should be fixed consistently across:

  • examples/elementwise/bench_example_elementwise.py (line 9)
  • examples/seer_attention/bench_block_sparse_attn_tilelang.py (line 9)
  • examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (line 10)
  • examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (line 9)
  • examples/dynamic_shape/bench_example_dynamic.py (line 9)
  • examples/deepseek_nsa/bench_example_tilelang_nsa.py (line 14)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/cast/bench_example_cast.py around lines 20-21, replace the
non-idiomatic main guard `globals().get("__name__") == "__main__"` with the
conventional Python idiom `if __name__ == "__main__":` to improve readability;
apply the same change consistently to the other benchmark wrapper files listed
in the comment (examples/elementwise/bench_example_elementwise.py line 9,
examples/seer_attention/bench_block_sparse_attn_tilelang.py line 9,
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py line 10,
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py line 9,
examples/dynamic_shape/bench_example_dynamic.py line 9,
examples/deepseek_nsa/bench_example_tilelang_nsa.py line 14).

15 changes: 15 additions & 0 deletions examples/convolution/bench_example_convolution.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_convolution
import example_convolution_autotune


def bench_example_convolution():
tilelang.tools.bench.process_func(example_convolution.main)


def bench_example_convolution_autotune():
tilelang.tools.bench.process_func(example_convolution_autotune.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_deepgemm_fp8_2xAcc


def bench_example_deepgemm_fp8_2xAcc():
tilelang.tools.bench.process_func(example_deepgemm_fp8_2xAcc.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/deepseek_mla/bench_example_mla_decode.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_mla_decode


def bench_example_mla_decode():
tilelang.tools.bench.process_func(example_mla_decode.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
15 changes: 15 additions & 0 deletions examples/deepseek_nsa/bench_example_tilelang_nsa.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_tilelang_nsa_fwd
import example_tilelang_nsa_decode


def bench_example_tilelang_nsa_fwd():
tilelang.tools.bench.process_func(example_tilelang_nsa_fwd.main)


def bench_example_tilelang_nsa_fwd_decode():
tilelang.tools.bench.process_func(example_tilelang_nsa_decode.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
64 changes: 64 additions & 0 deletions examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py
@@ -0,0 +1,64 @@
import tilelang.tools.bench
import fp8_lighting_indexer
import sparse_mla_bwd
import sparse_mla_fwd
import sparse_mla_fwd_pipelined
import topk_selector


def bench_topk_selector():
tilelang.tools.bench.process_func(topk_selector.test_topk_selector)


def bench_fp8_lighting_indexer():
tilelang.tools.bench.process_func(
fp8_lighting_indexer.test_fp8_lighting_indexer,
S=512,
SKV=1024,
H=32,
HKV=1,
D=64,
kv_stride=1)


def bench_sparse_mla_fwd():
tilelang.tools.bench.process_func(
sparse_mla_fwd.test_sparse_mla_fwd,
S=256,
SKV=1024,
H=64,
HKV=1,
DQK=576,
DV=512,
topk=256,
check_correctness=False)


def bench_sparse_mla_fwd_pipelined():
tilelang.tools.bench.process_func(
sparse_mla_fwd_pipelined.test_sparse_mla_fwd_pipelined,
S=256,
SKV=512,
H=64,
HKV=1,
DQK=576,
DV=512,
topk=256,
check_correctness=False)


def bench_sparse_mla_bwd():
tilelang.tools.bench.process_func(
sparse_mla_bwd.test_sparse_mla_bwd,
S=256,
SKV=512,
H=64,
HKV=1,
DQKV=576,
DV=512,
topk=256,
check_correctness=False)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()