[Feature]: Add benchmark scripts for examples #1287
Conversation
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Walkthrough

Adds a benchmarking utility (tilelang.tools.bench), ~30 new example bench wrapper scripts, a CI performance aggregation/plotting script, updates the perfbench GitHub Actions workflow to publish comment + image, and includes a runtime log file from benchmark runs.
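To make the wrapper pattern concrete, here is a minimal sketch of what one of these bench scripts looks like, distilled from the diffs quoted later in this review (the example module and the M/N/K parameters are illustrative placeholders, not the exact contents of any one file):

```python
# bench_example_gemm.py -- illustrative sketch of a bench wrapper
import tilelang.tools.bench
import example_gemm  # placeholder: the example module being benchmarked


def bench_example_gemm():
    # Delegate to the shared harness, which warms up the target, times
    # repeated calls, and records the latency under this module's name.
    tilelang.tools.bench.process_func(example_gemm.main, M=1024, N=1024, K=1024)


if __name__ == "__main__":
    # Discovers and runs every bench_* function in this file, then writes
    # bench.md and bench.png next to it.
    tilelang.tools.bench.main()
```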
Sequence Diagram(s)

sequenceDiagram
participant User
participant Bench as tilelang.tools.bench
participant Example as Example Module
participant Store as _RECORDS / bench.png
User->>Bench: main()
Bench->>Bench: discover bench_* functions
loop per bench_*
Bench->>Example: process_func(target_func) (warmup, suppressed output)
loop repeats
Bench->>Example: call target_func (timed)
Example-->>Bench: result or exception
Bench->>Store: record latency or failure
end
end
Bench->>Bench: analyze_records(out_dir)
Bench->>Store: write bench.md and bench.png
Bench-->>User: print results and saved image path
sequenceDiagram
participant GHA as GitHub Actions
participant Workflow as pr-perfbench-bot.yml
participant CI as maint/scripts/ci_performance.py
participant BenchTool as tilelang.tools.bench
participant GHApi as GitHub API (blob/contents)
participant PR as PR Comment
GHA->>Workflow: run perf workflow
Workflow->>CI: run ci_performance.py
CI->>BenchTool: start benchmarks
BenchTool-->>CI: bench.md + bench.png
CI->>Workflow: set outputs (bench.md, bench.png path)
Workflow->>GHApi: upload bench.png as blob -> get raw URL
Workflow->>PR: post comment body with bench.md and embedded image URL
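For readers unfamiliar with the harness, the per-benchmark loop in the first diagram corresponds roughly to the following simplified sketch. This is an illustration only, based on the fragments of tilelang/tools/bench.py quoted in the review comments below; the real signatures, defaults, and record format may differ:

```python
import time
import warnings

_RECORDS = []  # (module name, mean latency in ms), consumed by analyze_records()


def process_func(func, *args, warmup=1, repeats=5, **kwargs):
    # Warm up so compilation and cache effects don't pollute the timings.
    try:
        for _ in range(warmup):
            func(*args, **kwargs)
    except Exception:
        pass  # the reviewed code suppresses warmup failures (see comments below)

    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            func(*args, **kwargs)
            times.append((time.perf_counter() - start) * 1000.0)
        except Exception:
            continue  # a failed repeat contributes no timing

    if times:
        _RECORDS.append((func.__module__, sum(times) / len(times)))
    else:
        warnings.warn(
            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
            RuntimeWarning,
            stacklevel=2,
        )
```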
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
/perf
Actionable comments posted: 11
🧹 Nitpick comments (19)
examples/deepseek_mla/bench_example_mla_decode.py (1)

9-10: Simplify the name check. The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead. Apply this diff:

    -if globals().get("__name__") == "__main__":
    +if __name__ == "__main__":
         tilelang.tools.bench.main()

examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)

9-10: Simplify the name check (same issue and diff as above).

examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)

9-10: Simplify the name check (same issue and diff as above).

examples/gemv/bench_example_gemv.py (1)

9-10: Simplify the name check (same issue and diff as above).

examples/analyze/bench_example_analyze.py (1)

14-15: Simplify the name check (same issue and diff as above).

examples/gemm_splitk/bench_example_gemm_splitk.py (1)

14-15: Simplify the name check (same issue and diff as above).

examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1)

63-64: Simplify the name check (same issue and diff as above).

tilelang/tools/bench.py (1)
51-53: Bare exception catch loses error context. Catching Exception without inspecting the exception type can mask important failure modes (e.g., KeyboardInterrupt inheritance issues, resource exhaustion). The traceback is printed but the exception type/message is not recorded in warnings. Enhance error reporting:

     except Exception:
         fail_count += 1
    -    traceback.print_exc(file=sys.stderr)
    +    exc_info = sys.exc_info()
    +    traceback.print_exception(*exc_info, file=sys.stderr)

examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

2-2: Remove unused import. The tilelang import on line 2 is not used anywhere in this file. Removing unused imports improves code clarity and may slightly reduce import time.

     import tilelang.tools.bench
    -import tilelang
     import tilelang_example_sparse_tensorcore

examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

10-11: Minor naming inconsistency. The function name bench_example_tilelang_nsa_fwd_decode includes the "fwd_" prefix, but the imported module is example_tilelang_nsa_decode (without "fwd"). While this works correctly, the inconsistency might cause confusion. Consider renaming to bench_example_tilelang_nsa_decode for consistency.

    -def bench_example_tilelang_nsa_fwd_decode():
    +def bench_example_tilelang_nsa_decode():
         tilelang.tools.bench.process_func(example_tilelang_nsa_decode.main)

examples/minference/bench_vs_sparse_attn.py (1)
9-10: Simplify the main guard to use idiomatic Python. The pattern globals().get("__name__") is unnecessarily complex. The standard idiom __name__ directly accesses the built-in module name without indirection. Apply this diff:

    -if globals().get("__name__") == "__main__":
    +if __name__ == "__main__":
         tilelang.tools.bench.main()

examples/linear_attention/bench_linear_attn.py (1)

14-15: Simplify the main guard to use idiomatic Python. Use the standard __name__ idiom instead of accessing through globals() (same diff as above).

examples/flash_decoding/bench_example_flash_decoding.py (1)

15-16: Simplify the main guard to use idiomatic Python (same diff as above).

examples/fusedmoe/bench_example_fusedmoe.py (1)

17-18: Simplify the main guard to use idiomatic Python (same diff as above).

examples/convolution/bench_example_convolution.py (1)

14-15: Simplify the main guard to use idiomatic Python (same diff as above).

examples/warp_specialize/bench_example_warp_specialize.py (1)

28-29: Simplify the main guard to use idiomatic Python (same diff as above).

examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)

54-55: Simplify the main guard to use idiomatic Python (same diff as above).

examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)

34-35: Simplify the main guard to use idiomatic Python (same diff as above).

log.txt (1)
1-476: Avoid committing full benchmark runtime logs to the repo. This file is a raw, environment-specific run log (warnings, stack traces, CUDA OOMs, etc.). Keeping it version-controlled will create noisy diffs and quickly become stale. It's better treated as a CI artifact or local debug output rather than source.

I suggest either:

- Removing log.txt from the repo and adding it to .gitignore, or
- Replacing it with a short, curated example log snippet in documentation if you need to illustrate expected failures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- .github/workflows/pr-perfbench-bot.yml (3 hunks)
- examples/analyze/bench_example_analyze.py (1 hunks)
- examples/attention_sink/bench_example_attention_sink.py (1 hunks)
- examples/blocksparse_attention/bench_example_blocksparse_attention.py (1 hunks)
- examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1 hunks)
- examples/cast/bench_example_cast.py (1 hunks)
- examples/convolution/bench_example_convolution.py (1 hunks)
- examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1 hunks)
- examples/deepseek_mla/bench_example_mla_decode.py (1 hunks)
- examples/deepseek_nsa/bench_example_tilelang_nsa.py (1 hunks)
- examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1 hunks)
- examples/dequantize_gemm/bench_example_dequantize_gemm.py (1 hunks)
- examples/dynamic_shape/bench_example_dynamic.py (1 hunks)
- examples/elementwise/bench_example_elementwise.py (1 hunks)
- examples/flash_attention/bench_example_flash_attention.py (1 hunks)
- examples/flash_decoding/bench_example_flash_decoding.py (1 hunks)
- examples/fusedmoe/bench_example_fusedmoe.py (1 hunks)
- examples/gemm/bench_example_gemm.py (1 hunks)
- examples/gemm_fp8/bench_example_gemm_fp8.py (1 hunks)
- examples/gemm_splitk/bench_example_gemm_splitk.py (1 hunks)
- examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1 hunks)
- examples/gemv/bench_example_gemv.py (1 hunks)
- examples/linear_attention/bench_linear_attn.py (1 hunks)
- examples/minference/bench_vs_sparse_attn.py (1 hunks)
- examples/seer_attention/bench_block_sparse_attn_tilelang.py (1 hunks)
- examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1 hunks)
- examples/topk/bench_topk_tilelang.py (1 hunks)
- examples/warp_specialize/bench_example_warp_specialize.py (1 hunks)
- log.txt (1 hunks)
- maint/scripts/ci_performance.py (1 hunks)
- tilelang/tools/bench.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-perfbench-bot.yml
🧬 Code graph analysis (28)
examples/gemm_splitk/bench_example_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/gemm/bench_example_gemm.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (6)
examples/deepseek_v32/sparse_mla_bwd.py (2)
sparse_mla_bwd (283-320), test_sparse_mla_bwd (334-384)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/deepseek_v32/topk_selector.py (1)
test_topk_selector (188-245)
examples/deepseek_v32/fp8_lighting_indexer.py (1)
test_fp8_lighting_indexer (260-304)
examples/deepseek_v32/sparse_mla_fwd.py (1)
test_sparse_mla_fwd (255-301)
examples/deepseek_v32/sparse_mla_fwd_pipelined.py (1)
test_sparse_mla_fwd_pipelined (404-456)
examples/elementwise/bench_example_elementwise.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/deepseek_mla/bench_example_mla_decode.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/dynamic_shape/bench_example_dynamic.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/warp_specialize/bench_example_warp_specialize.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/flash_attention/bench_example_flash_attention.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/gemm_fp8/bench_example_gemm_fp8.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/fusedmoe/bench_example_fusedmoe.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/topk/bench_topk_tilelang.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/cast/bench_example_cast.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/minference/bench_vs_sparse_attn.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/linear_attention/bench_linear_attn.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/analyze/bench_example_analyze.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
maint/scripts/ci_performance.py (2)
tilelang/env.py (1)
disable_cache (275-276)
maint/scripts/performance.py (1)
run (22-69)
examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/flash_decoding/bench_example_flash_decoding.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/convolution/bench_example_convolution.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/gemv/bench_example_gemv.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
examples/attention_sink/bench_example_attention_sink.py (1)
tilelang/tools/bench.py (2)
process_func (33-71), main (96-108)
🪛 Ruff (0.14.5)
tilelang/tools/bench.py
39-40: try-except-pass detected, consider logging the exception
(S110)
39-39: Do not catch blind exception: Exception
(BLE001)
51-51: Do not catch blind exception: Exception
(BLE001)
102-102: Use of exec detected
(S102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (21)
.github/workflows/pr-perfbench-bot.yml (3)
8-9: Verify that contents: write permission is necessary. The permissions were changed from read to write. Based on learnings, actions: write is needed for cache saving with restricted permissions blocks. However, contents: write grants broader access to modify repository contents. Verify whether contents: write is required for the blob/tree creation in the upload PNG step, or if there's an alternative approach that doesn't require write access to repository contents.
77-82: LGTM! Reading the markdown table output and exposing it via GitHub Actions outputs is a clean approach.

124-130: Update the comment body construction after fixing PNG upload. Once the PNG upload issue is resolved, ensure the image URL points to a valid, accessible location.
examples/cast/bench_example_cast.py (2)
6-13: LGTM - Benchmark wrapper correctly configured. The function properly delegates to process_func with appropriate parameters for the group-per-split-token cast example. The parameter choices (M=1024, N=1024, BG=2, blk_m=4, batch_sizes=[128, 896]) appear reasonable for benchmarking.

16-17: LGTM - Benchmark wrapper correctly configured. The function properly delegates to process_func with appropriate parameters for the per-token cast example.

examples/elementwise/bench_example_elementwise.py (1)

5-6: LGTM - Simple benchmark wrapper. The function correctly delegates to process_func. No parameters are passed, suggesting example_elementwise_add.main either takes no arguments or has appropriate defaults.

examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)

5-6: LGTM - Benchmark wrapper follows standard pattern. The function correctly delegates to process_func for the block sparse attention benchmark.

examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

6-7: LGTM - Benchmark wrapper correctly configured. The function properly delegates to process_func for the sparse tensorcore example.

examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (2)

1-10: Note: AI summary inconsistency. The AI summary mentions two bench functions (bench_example_tilelang_gemm_splitk and bench_example_tilelang_gemm_splitk_vectorize_atomicadd), but this file only contains bench_example_tilelang_gemm_streamk. The implementation itself is correct.

5-6: LGTM - Benchmark wrapper follows standard pattern. The function correctly delegates to process_func for the GEMM stream-K example.

examples/dynamic_shape/bench_example_dynamic.py (1)

5-6: LGTM - Benchmark wrapper with appropriate parameters. The function correctly delegates to process_func with M=N=K=1024, which are reasonable square matrix dimensions for benchmarking dynamic shape handling.

examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

6-7: LGTM - Benchmark wrapper correctly configured. The function properly delegates to process_func for the NSA forward example.

examples/minference/bench_vs_sparse_attn.py (1)
6-6: Empty argv is appropriate for benchmark invocation. The code correctly passes argv=[] to the target function. Looking at the target function signature in examples/minference/example_vertical_slash_sparse_attn.py, the main function accepts argv=None and uses argparse.parse_args(argv). Passing argv=[] ensures:
- All arguments use their default values (deterministic behavior)
- No interference from command-line arguments
- Proper benchmarking baseline with reproducible parameters
This is idiomatic usage for benchmark tools.
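To illustrate the point about deterministic defaults, a small standalone sketch (the argument names here are made up, not those of the real example script):

```python
import argparse


def main(argv=None):
    # argv=None falls back to sys.argv[1:]; an explicit empty list forces
    # every argument to its declared default, which is what the bench wants.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch", type=int, default=1)
    parser.add_argument("--seq_len", type=int, default=2048)
    return parser.parse_args(argv)


print(main([]))  # Namespace(batch=1, seq_len=2048): reproducible defaults
```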
examples/attention_sink/bench_example_attention_sink.py (1)
9-52: Bench wrappers are consistent with the new benchmarking harness. These bench_* functions cleanly forward to the respective example_*.main targets (with and without window_size=128) and integrate correctly with tilelang.tools.bench.main()'s auto-discovery pattern. I don't see functional issues here; this looks ready to plug into the shared bench infra.

examples/topk/bench_topk_tilelang.py (1)
5-10: Straightforward and aligned with the common bench pattern
bench_example_topk() correctly delegates to example_topk.main, and the main guard defers to tilelang.tools.bench.main() so this file will participate in the unified benchmarking flow alongside other examples.

examples/gemm_fp8/bench_example_gemm_fp8.py (1)
7-20: GEMM FP8 variants are wired correctly into the bench harness. Each bench_example_tilelang_gemm_fp8* wrapper calls the appropriate example module's main via tilelang.tools.bench.process_func, and the module-level __main__ guard hooks into bench.main(). This is consistent with the rest of the new bench scripts and looks good.

examples/gemm/bench_example_gemm.py (2)
1-5: LGTM! The imports are clean and follow the expected pattern for benchmark wrapper scripts.

8-22: LGTM! The benchmark wrapper functions correctly delegate to tilelang.tools.bench.process_func with appropriate parameters for each GEMM variant.

examples/flash_attention/bench_example_flash_attention.py (3)
1-14: LGTM! The imports are well-organized and cover all flash attention benchmark variants.

17-26: LGTM! The benchmark wrappers without explicit parameters correctly delegate to their respective example modules.
Also applies to: 85-90
29-104: Parameter naming is correct—resolves review concern. The verification confirms that the parameter naming inconsistency is intentional and correct. All benchmark functions use parameter names that precisely match their corresponding example modules:

- Backward functions correctly use uppercase (BATCH, H, N_CTX, D_HEAD) matching backward example signatures
- Forward functions correctly use lowercase (batch, heads, seq_len, dim) matching forward example signatures

No runtime errors will occur.
      - name: Upload PNG to GitHub and get URL
        id: upload_png
        uses: actions/github-script@v8
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const fs = require('fs');
            const content = fs.readFileSync('bench.png').toString('base64');
            // Create blob in the repo
            const blob = await github.rest.git.createBlob({
              owner: context.repo.owner,
              repo: context.repo.repo,
              content: content,
              encoding: "base64",
            });
            // Attach blob as a tree item
            const tree = await github.rest.git.createTree({
              owner: context.repo.owner,
              repo: context.repo.repo,
              tree: [{
                path: `bench_${context.runId}.png`,
                mode: '100644',
                type: 'blob',
                sha: blob.data.sha
              }]
            });
            // Raw file URL (works for embedding image)
            const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
            core.setOutput("url", url);
Fix the PNG upload and URL generation approach.
The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work because:
- Raw GitHub URLs require a commit SHA, branch name, or tag—not a tree SHA
- The created blob/tree exists in Git storage but is not reachable from any ref
- The URL https://raw.githubusercontent.com/.../[tree-sha]/... will return 404
Consider these alternatives:
Option 1: Upload as workflow artifact and use GitHub API to get URL
    - name: Upload benchmark chart
      uses: actions/upload-artifact@v4
      with:
        name: benchmark-chart
        path: bench.png

Then link to the artifact in the comment (artifacts are accessible via GitHub UI but not embeddable).
Option 2: Commit and push the image to a dedicated branch
    - name: Upload PNG to GitHub
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git checkout --orphan benchmark-images
        git rm -rf .
        cp bench.png benchmark-${{ github.run_id }}.png
        git add benchmark-${{ github.run_id }}.png
        git commit -m "Add benchmark image for run ${{ github.run_id }}"
        git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png
Option 3: Use external image hosting service or GitHub issue attachments API
🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml lines 83-111: the workflow currently
creates a blob and tree but never makes a commit or updates a ref, then builds a
raw.githubusercontent URL using a tree SHA (which will 404). Replace this with
one of the suggested approaches: either (A) upload bench.png as a workflow
artifact using actions/upload-artifact@v4 and reference the artifact in the
comment, or (B) commit the PNG to a dedicated branch (configure git user, create
or checkout a branch, add bench_<runId>.png, commit and push) and then construct
the raw.githubusercontent.com URL using the branch name, or (C) upload to an
external image host or use the GitHub Issue attachments API; implement the
chosen approach and remove the orphaned blob/tree creation and the invalid raw
URL generation.
    if globals().get("__name__") == "__main__":
        tilelang.tools.bench.main()
🛠️ Refactor suggestion | 🟠 Major
Use idiomatic Python for main guard.
The expression globals().get("__name__") == "__main__" is unnecessarily verbose and non-standard. The conventional Python idiom is if __name__ == "__main__":, which is more readable and widely recognized.
Apply this change:
-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

Note: This same pattern appears in all benchmark wrapper files in this PR and should be fixed consistently across:
- examples/elementwise/bench_example_elementwise.py (line 9)
- examples/seer_attention/bench_block_sparse_attn_tilelang.py (line 9)
- examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (line 10)
- examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (line 9)
- examples/dynamic_shape/bench_example_dynamic.py (line 9)
- examples/deepseek_nsa/bench_example_tilelang_nsa.py (line 14)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if __name__ == "__main__":
        tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/cast/bench_example_cast.py around lines 20-21, replace the
non-idiomatic main guard `globals().get("__name__") == "__main__"` with the
conventional Python idiom `if __name__ == "__main__":` to improve readability;
apply the same change consistently to the other benchmark wrapper files listed
in the comment (examples/elementwise/bench_example_elementwise.py line 9,
examples/seer_attention/bench_block_sparse_attn_tilelang.py line 9,
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py line 10,
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py line 9,
examples/dynamic_shape/bench_example_dynamic.py line 9,
examples/deepseek_nsa/bench_example_tilelang_nsa.py line 14).
    if globals().get("__name__") == "__main__":
        tilelang.tools.bench.main()
Use the standard Python idiom for the main guard.
The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.
Apply this diff to use the standard pattern:
-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if __name__ == "__main__":
        tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/flash_attention/bench_example_flash_attention.py around lines 107 to
108, replace the non-idiomatic main guard using globals().get("__name__") with
the standard Python idiom: use if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script behaves as expected
when run as a script.
    if globals().get("__name__") == "__main__":
        tilelang.tools.bench.main()
Use the standard Python idiom for the main guard.
The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.
Apply this diff to use the standard pattern:
-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if __name__ == "__main__":
        tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/gemm/bench_example_gemm.py around lines 25-26, the main guard uses a
non-idiomatic check (globals().get("__name__") == "__main__"); replace it with
the standard Python idiom if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script executes correctly
when run as a program and remains import-safe.
        m = re.match(r"\|\s*([^\|]+)\s*\|\s*([0-9\.]+)\s*\|", line)
        if m is not None:
            data[m.group(1)] = float(m.group(2))
    return data


output_v1 = subprocess.run(['./tl/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
output_v1 = subprocess.run(
    ['./tl/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout
output_v2 = subprocess.run(
    ['./tll/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout

output_v2 = subprocess.run(['./tll/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
data_v2 = parse_output(output_v2)
table = []
for key in data_v1.keys():
    speedup = data_v1[key] / data_v2[key]
    table.append([key, data_v1[key], data_v2[key], speedup])
table.sort(key=lambda x: x[-1])

headers = ["File", "Original Latency", "Current Latency", "Speedup"]

with open("bench.md", "w") as f:
    f.write(
        tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
    f.write("\n")

table = [[
    "original", data_v1['latency'], data_v1['best_tflops'], data_v1['ref_tflops'], data_v1['config']
], [
    "current", data_v2['latency'], data_v2['best_tflops'], data_v2['ref_tflops'], data_v2['config']
]]
df = pd.DataFrame(table, columns=headers)
df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
fig_width = max(0, len(df) * 0.35)
plt.figure(figsize=(fig_width, 8))
sns.set_theme(style="whitegrid", font_scale=0.9)
bar_colors = sns.color_palette("magma", len(df))
bars = plt.bar(range(len(df)), df["Speedup"], color=bar_colors, edgecolor="black")
top3_idx = df.nlargest(3, "Speedup").index
bot3_idx = df.nsmallest(3, "Speedup").index
label_idx = set(top3_idx.tolist() + bot3_idx.tolist())

headers = ["version", "Best Latency (s)", "Best TFlops", "Reference TFlops", "Best Config"]
for i, val in enumerate(df["Speedup"]):
    if i in label_idx:
        plt.text(
            i,
            val + 0.02,
            f"{val:.2f}x",
            ha="center",
            va="bottom",
            color="red",
            fontsize=8,
            fontweight="bold")

print(tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
plt.xticks(range(len(df)), df["File"], rotation=70, ha='right', fontsize=12)
plt.ylabel("Current Speedup vs Original", fontsize=14)
plt.title("Current Speedup vs Original", fontsize=14, fontweight="bold")
plt.ylim(0, max(df["Speedup"]) * 1.2)
sns.despine()
plt.tight_layout()
plt.savefig("bench.png", dpi=300)
Harden performance script against mismatched/empty benchmark results
The parsing/aggregation logic assumes “happy path” data and can break in real runs:
- Key mismatch risk (Lines 31–36): for key in data_v1.keys(): ... data_v2[key] will raise KeyError if bench_all() in the second environment doesn't emit a row for some benchmark present in the first (or vice versa). This is very plausible as benchmarks are added/removed over time.
- Empty-data risk (Lines 46–52, 72): if both runs fail to produce any valid records (or none overlap), table is empty; pd.DataFrame(table, ...) yields an empty DF and max(df["Speedup"]) will raise (max() of an empty sequence). The log you committed already shows several "failed in all repeats (no valid run)" cases, so this is not theoretical.
- Div-by-zero edge case: if a latency ever ends up reported as 0.0 for the "current" version, speedup = data_v1[key] / data_v2[key] will crash. Unlikely, but cheap to guard.
To make CI more robust, I recommend building the table only over the intersection of keys and short‑circuiting when there is no usable data. For example:
data_v1 = parse_output(output_v1)
data_v2 = parse_output(output_v2)
-table = []
-for key in data_v1.keys():
- speedup = data_v1[key] / data_v2[key]
- table.append([key, data_v1[key], data_v2[key], speedup])
-table.sort(key=lambda x: x[-1])
+table = []
+common_keys = sorted(set(data_v1.keys()) & set(data_v2.keys()))
+for key in common_keys:
+ v1 = data_v1[key]
+ v2 = data_v2[key]
+ # Skip pathological cases to avoid crashes
+ if v2 == 0:
+ continue
+ speedup = v1 / v2
+ table.append([key, v1, v2, speedup])
+
+if not table:
+ # No overlapping, usable benchmarks — avoid generating empty reports
+ with open("bench.md", "w") as f:
+ f.write("No overlapping benchmark results found between original and current runs.\n")
+ # Skip plot generation
+ raise SystemExit(0)
+
+table.sort(key=lambda x: x[-1])
@@
-headers = ["File", "Original Latency", "Current Latency", "Speedup"]
+headers = ["File", "Original Latency", "Current Latency", "Speedup"]
@@
-df = pd.DataFrame(table, columns=headers)
-df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
-fig_width = max(0, len(df) * 0.35)
+df = pd.DataFrame(table, columns=headers).sort_values(
+ "Speedup", ascending=False
+).reset_index(drop=True)
+fig_width = max(6, len(df) * 0.35)

This keeps the existing behavior for valid data, but prevents CI from failing purely due to missing/mismatched benchmarks or fully failed runs, and ensures the plot's figure width is always positive.
🤖 Prompt for AI Agents
In maint/scripts/ci_performance.py around lines 16-75, the script assumes
matching, non-empty benchmark outputs and may KeyError on missing keys,
ZeroDivisionError if current latency is 0.0, and will crash when table/df is
empty; fix by iterating only over the intersection of data_v1 and data_v2 keys
(or explicitly check existence before accessing), skip any entry where
data_v2[key] is zero (or treat as NaN/inf) to avoid division by zero, and after
building the table short-circuit: if table is empty write a minimal bench.md (or
a clear message) and skip plotting (or create a default empty DataFrame/plot
with fixed minimum fig_width) so subsequent pandas/plt operations (like max on
Speedup) cannot raise.
    try:
        with suppress_output():
            for _ in range(warmup):
                func(*args, **kwargs)
    except Exception:
        pass
Silent warmup failures may hide configuration issues.
Warmup exceptions are completely suppressed (lines 39-40), which could mask genuine configuration problems, import errors, or invalid parameters. A benchmark that fails during warmup will proceed to timing runs, potentially wasting time or producing misleading results.
Consider logging warmup failures:
     try:
         with suppress_output():
             for _ in range(warmup):
                 func(*args, **kwargs)
-    except Exception:
-        pass
+    except Exception as e:
+        warnings.warn(
+            f"warmup for {func.__module__} raised {type(e).__name__}: {e}",
+            RuntimeWarning,
+            stacklevel=2,
+        )

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.5)
39-40: try-except-pass detected, consider logging the exception
(S110)
39-39: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 35 to 40, the warmup loop currently
swallows all exceptions (except Exception: pass). Replace this with explicit
exception handling that captures the exception as a variable, logs the failure
with full traceback (e.g., logger.exception or print to stderr) including
context (which function and warmup iteration), and then aborts the benchmark
(re-raise the exception or return/exit early) so timing runs do not proceed on a
bad configuration.
    else:
        warnings.warn(
            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
            RuntimeWarning,
            stacklevel=2,
        )
No record appended when all runs fail.
When times is empty (all runs failed), a warning is issued but no record is appended to _RECORDS. This causes the benchmark to be silently omitted from the final table and chart, which might confuse users expecting to see all attempted benchmarks.
Consider appending a sentinel record:
     else:
         warnings.warn(
             f"benchmark for {func.__module__} failed in all repeats (no valid run)",
             RuntimeWarning,
             stacklevel=2,
         )
+        _RECORDS.append((f"{func.__module__}", float('inf')))

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    else:
        warnings.warn(
            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
            RuntimeWarning,
            stacklevel=2,
        )
        _RECORDS.append((f"{func.__module__}", float('inf')))
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 66 to 71, when times is empty the code
only emits a warning and does not append any record to _RECORDS, causing the
benchmark to be omitted from outputs; update the branch so that after issuing
the warning you append a sentinel record to _RECORDS containing the benchmark
identity (func.__module__, func.__name__), count=0, mean and std as float("nan")
(or None if existing records expect None), times as an empty list, success=False
(or an equivalent flag used elsewhere), and any relevant metadata (e.g. repeats
and label) so the final table/chart includes an entry indicating the benchmark
failed all runs.
    plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
    plt.bar(names, lats)
    plt.xlabel("Latency (ms)")
    plt.title("Benchmark Results")
Chart axis labels are swapped.
Line 85 sets xlabel to "Latency (ms)", but the X-axis displays function names. The latency values are on the Y-axis. This mislabeling will confuse readers of the generated charts.
Apply this fix:
     plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
     plt.bar(names, lats)
-    plt.xlabel("Latency (ms)")
+    plt.xlabel("Functions")
+    plt.ylabel("Latency (ms)")
     plt.title("Benchmark Results")

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 83 to 86, the axis labels are reversed:
the X-axis shows function names while the Y-axis shows latency, yet xlabel is
set to "Latency (ms)". Change the labels so the X-axis is labeled (e.g.)
"Function" (or "Name") and the Y-axis is labeled "Latency (ms)"; keep the bar
plot and sizing unchanged.
def main():
    # Entry point — automatically run all bench_* functions in caller file.
    test_file = inspect.getsourcefile(sys._getframe(1))
    out_dir = os.path.dirname(test_file)
    module = {}
    with open(test_file) as f:
        exec(f.read(), module)

    for name, func in module.items():
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)
Security risk: exec() evaluates arbitrary code from file.
Line 102 uses exec() to dynamically execute the caller's source file. This is a significant security risk if the benchmarking tool is ever run on untrusted or user-supplied files. Additionally, exec() bypasses normal import machinery, which can cause issues with relative imports, __file__, and module metadata.
Consider using proper module imports instead:
 def main():
-    # Entry point — automatically run all bench_* functions in caller file.
-    test_file = inspect.getsourcefile(sys._getframe(1))
-    out_dir = os.path.dirname(test_file)
-    module = {}
-    with open(test_file) as f:
-        exec(f.read(), module)
-
-    for name, func in module.items():
+    # Entry point — automatically run all bench_* functions in caller module.
+    caller_frame = sys._getframe(1)
+    caller_module = inspect.getmodule(caller_frame)
+    test_file = inspect.getsourcefile(caller_frame)
+    out_dir = os.path.dirname(test_file)
+
+    for name in dir(caller_module):
         if name.startswith("bench_") and callable(func):
+            func = getattr(caller_module, name)
             func()
     analyze_records(_RECORDS, out_dir)

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.5)
102-102: Use of exec detected
(S102)
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 96 to 108, replace the unsafe exec()
usage that reads and executes the caller source with a proper module import via
importlib: create a module spec from the test file path
(importlib.util.spec_from_file_location), create a module from that spec
(importlib.util.module_from_spec), execute the module via
spec.loader.exec_module(module) so the module has correct metadata (__file__,
package, etc.), optionally insert it into sys.modules to support relative
imports, then iterate module attributes to call bench_* functions and continue
to call analyze_records(_RECORDS, out_dir); this removes arbitrary-code-exec
risk and preserves normal import behavior.
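To make the importlib-based alternative concrete, here is a minimal sketch of the approach described in the prompt above. It is not code from the PR; _RECORDS and analyze_records are assumed to be the existing helpers in the reviewed module:

```python
import importlib.util
import inspect
import os
import sys


def main():
    # Load the caller's file through the normal import machinery instead of exec().
    caller_frame = sys._getframe(1)
    test_file = inspect.getsourcefile(caller_frame)
    out_dir = os.path.dirname(test_file)

    spec = importlib.util.spec_from_file_location("_bench_caller", test_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module   # lets relative imports inside the file resolve
    spec.loader.exec_module(module)   # correct __file__, __spec__, and metadata

    for name in dir(module):
        func = getattr(module, name)
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)  # assumed existing helpers
```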
for root, _, files in os.walk(examples_root):
    for file_name in files:
        if re.match(r"^bench_.*\.py$", file_name):
            full_path = os.path.join(root, file_name)
            if root not in added_roots:
                sys.path.insert(0, root)
                added_roots.add(root)
            mod = _load_module(full_path)
Modifying sys.path during iteration may cause import conflicts.
Lines 132-134 insert directories into sys.path while walking the examples tree. If two example directories contain modules with the same name, the first one will shadow the second, leading to incorrect benchmark execution.
Consider isolating each module's sys.path modification:
 for root, _, files in os.walk(examples_root):
     for file_name in files:
         if re.match(r"^bench_.*\.py$", file_name):
             full_path = os.path.join(root, file_name)
-            if root not in added_roots:
-                sys.path.insert(0, root)
-                added_roots.add(root)
+            # Temporarily add directory for this specific import
+            sys.path.insert(0, root)
             mod = _load_module(full_path)
+            sys.path.pop(0)
             for name in dir(mod):

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 128 to 135, the code mutates sys.path
while walking examples which can cause module shadowing if different directories
contain identically named modules; instead, load each bench_*.py file without
permanently altering sys.path by using importlib.util.spec_from_file_location
(or runpy.run_path) to create a module from the file path and execute it with a
unique module name, or if you must modify sys.path temporarily, push the
directory, import, then immediately pop it (ensure uniqueness of module names)
so the global sys.path is not left modified and modules from different example
roots cannot shadow one another.
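A rough sketch of the "push temporarily, pop immediately" variant described above, with the cleanup made robust via try/finally (hypothetical helper; _load_module is assumed to be the loader already present in the reviewed file):

```python
import os
import sys


def _load_module_isolated(full_path):
    # Expose the containing directory only for the duration of this one import,
    # so same-named modules in different example directories cannot shadow each
    # other through a permanently mutated sys.path.
    root = os.path.dirname(full_path)
    sys.path.insert(0, root)
    try:
        return _load_module(full_path)  # assumed loader from the reviewed code
    finally:
        sys.path.remove(root)
```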
/perf
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)
87-115: ⚠️ Fix the PNG upload approach (unresolved from previous review). This implementation was flagged as critical in the previous review and remains unfixed. The workflow creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. Raw URLs require a commit SHA, branch name, or tag—not a tree SHA—so this will return 404 and break the image embedding.
The following approaches were recommended previously:
Option 1 (recommended): Upload as workflow artifact
    - name: Upload benchmark chart
      uses: actions/upload-artifact@v4
      with:
        name: benchmark-chart
        path: bench.png

Option 2: Commit and push image to a dedicated branch
    - name: Upload PNG to GitHub
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git checkout --orphan benchmark-images
        git rm -rf .
        cp bench.png benchmark-${{ github.run_id }}.png
        git add benchmark-${{ github.run_id }}.png
        git commit -m "Add benchmark image for run ${{ github.run_id }}"
        git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Choose and implement one approach to resolve this blocker before merging.
🧹 Nitpick comments (1)
.github/workflows/pr-perfbench-bot.yml (1)
76-80: Benchmark execution depends on ci_performance.py outputs. The workflow assumes maint/scripts/ci_performance.py will produce bench.md and bench.png files. Ensure the script reliably generates these artifacts; consider adding error handling or validation if they're missing. Would you like me to verify that ci_performance.py produces the expected outputs?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (3)
.github/workflows/pr-perfbench-bot.yml (3)
8-9: Verify cache permissions for self-hosted runner. The workflow specifies only contents: write but uses actions/setup-python@v6 with cache: pip enabled. Per the learning from previous workflows, caching requires actions: write permission to save caches. Verify whether this impacts cache behavior on the self-hosted runner, and add actions: write if needed.
52-60: Venv setup for PR version looks solid. The explicit disabling of pip's user mode and clear isolation of the venv is well-handled.

62-74: Parallel venv setup for main branch is clean. The git state cleanup and separate venv creation are well-structured.
/perf
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)
89-117: ⚠️ Image URL generation will fail with 404 errors. The blob/tree approach creates orphaned Git objects that are not reachable from any ref. Raw GitHub URLs require a commit SHA, branch, or tag—not a tree SHA. The URL generated at line 116 will consistently return 404.
Consider one of these alternatives:
Option 1: Commit and push to a dedicated branch (recommended)
    - name: Upload PNG to GitHub
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git checkout --orphan benchmark-images 2>/dev/null || git checkout benchmark-images
        git rm -rf . 2>/dev/null || true
        cp bench.png benchmark-${{ github.run_id }}.png
        git add benchmark-${{ github.run_id }}.png
        git commit -m "Add benchmark image for run ${{ github.run_id }}"
        git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 2: Upload as a workflow artifact

    - name: Upload benchmark chart
      uses: actions/upload-artifact@v4
      with:
        name: benchmark-chart-${{ github.run_id }}
        path: bench.png

(Note: artifact URLs are less convenient for embedding, but are guaranteed to work.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (1)
.github/workflows/pr-perfbench-bot.yml (1)
56-61: Pip configuration cleanup is correct. The explicit cleanup of pip configuration (PIP_CONFIG_FILE, PYTHONUSERBASE) and the --no-user flags ensure a clean, reproducible environment for both the merged and original installs.
 permissions:
-  contents: read
+  contents: write
🧩 Analysis chain
Add actions: write permission for cache operations.
The current permissions block only specifies contents: write. When a permissions block is defined, any unlisted permission (including actions) defaults to none. This prevents setup-python's cache from being saved—it will only restore from cache but never write new entries.
Apply this diff:
 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.
I'll verify this review comment by checking the codebase and GitHub Actions documentation.
🏁 Script executed:
#!/bin/bash
# Check if the workflow file exists and examine permissions block and setup-python usage
cat .github/workflows/pr-perfbench-bot.yml 2>/dev/null || echo "File not found"

Length of output: 4705
🌐 Web query:
GitHub Actions setup-python cache requires actions write permission
💡 Result:
Short answer: actions/cache (used by setup-python and many wrapper actions) needs the workflow's GITHUB_TOKEN to have actions: write when it needs to create/delete caches. Add permissions to the workflow or provide a PAT (for fork PRs where GITHUB_TOKEN is limited).
What to do
- In your workflow or job, grant the permission:

      permissions:
        actions: write
        contents: read

- Example (top of workflow):

      permissions:
        actions: write
        contents: read

Notes
- For pull requests from forks the GITHUB_TOKEN is read-only for some permissions; to allow cache writes/deletes for fork PRs you must use a personal access token (repo scope) stored in a secret and pass it to the cache action (or disable deletion). [1][2][3]
Sources
- cache action / examples requiring actions: write. [1]
- REST / API note: deleting caches requires an auth token with repo scope or GitHub App actions: write. [2]
- Marketplace/README examples calling out actions: write for cache deletion. [3]
Add actions: write permission for cache operations.
The workflow uses actions/setup-python@v6 with cache: pip enabled. When a permissions block is defined in GitHub Actions, the actions/cache (used by setup-python) requires actions: write permission to save caches. Without this permission, caches will restore but not persist across runs, degrading workflow performance.
Apply this diff:
 permissions:
   contents: write
+  actions: write

🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml around lines 8 to 9 currently grants only
contents: write; add an actions: write permission to the permissions block so
the actions/cache used by actions/setup-python (cache: pip) can persist
caches—update the permissions to include actions: write alongside contents:
write (keep existing permissions and only add actions: write).
Actionable comments posted: 0
♻️ Duplicate comments (2)
.github/workflows/pr-perfbench-bot.yml (2)
8-9: Add actions: write permission for cache operations. The workflow uses actions/setup-python@v6 with cache: pip enabled (line 47). The actions/cache used by setup-python requires actions: write permission to save caches. Without this, caches restore but never persist across runs, degrading workflow performance. Apply this diff to fix:

     permissions:
       contents: write
    +  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.
89-117: Fix the PNG upload and URL generation approach. The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work—raw.githubusercontent.com URLs require a commit SHA, branch name, or tag, not a tree SHA. The URL will return 404.
Consider these alternatives:
Option 1: Upload as workflow artifact (simplest)
    - name: Upload benchmark chart
      uses: actions/upload-artifact@v4
      with:
        name: benchmark-chart-${{ github.run_id }}
        path: bench.png

Then reference the artifact in the comment with a note that it's available in the "Artifacts" section of the workflow run.

Option 2: Commit and push the image to a dedicated branch (allows embedding)

    - name: Upload PNG to GitHub
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git checkout --orphan benchmark-images
        git rm -rf .
        cp bench.png benchmark-${{ github.run_id }}.png
        git add benchmark-${{ github.run_id }}.png
        git commit -m "Add benchmark image for run ${{ github.run_id }}"
        git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use GitHub Issues API attachments

Upload via the issue attachments API endpoint (if available in your GitHub version). Replace lines 89–117 with your chosen approach.
🧹 Nitpick comments (2)
.github/workflows/pr-perfbench-bot.yml (2)
56-76: Minor inconsistency: Error suppression differs between venv setups. The merged version (lines 58–59) doesn't suppress errors when unsetting pip config keys, but the original version (lines 73–74) uses || true. If config keys don't exist, the first will fail while the second continues silently. For consistency and robustness, align both to suppress non-fatal errors. Apply this diff:

     export PIP_CONFIG_FILE=/dev/null
     export PYTHONUSERBASE=""
    -pip config unset global.user
    -pip config unset user.user
    +pip config unset global.user || true
    +pip config unset user.user || true
     pip install --no-user -r requirements-test.txt
78-88: Verify file generation before downstream steps. The workflow runs ci_performance.py (line 82) and assumes bench.md and bench.png are generated. Consider adding validation to ensure these files exist before attempting to read them. Apply this diff:

     - name: Run performance test
       id: perfbench
       run: |
         source tl/bin/activate
         python maint/scripts/ci_performance.py
    +    if [[ ! -f bench.md ]] || [[ ! -f bench.png ]]; then
    +      echo "Error: Performance test did not generate required output files"
    +      exit 1
    +    fi
     - name: Read markdown table

This ensures that if the performance script fails to generate outputs, the workflow fails fast with a clear message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-perfbench-bot.yml