
Conversation

@yyttt6
Contributor

@yyttt6 yyttt6 commented Nov 19, 2025

Summary by CodeRabbit

  • New Features
    • End-to-end benchmarking harness added with repeatable measurements and aggregated results.
    • Bench runs now produce a Markdown summary and a saved speedup plot; CI uploads and exposes these artifacts.
    • PR performance comments now include the formatted results table and embedded visualization.
    • Many new benchmark entry points added across examples (kernels, attention, quantization, sparse ops, inference).

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Adds a benchmarking utility (tilelang.tools.bench), ~30 new example bench wrapper scripts, and a CI performance aggregation/plotting script; updates the perfbench GitHub Actions workflow to publish a comment and an image; and includes a runtime log file from benchmark runs.

Changes

  • Benchmark core (`tilelang/tools/bench.py`): New module providing process_func(), main(), bench_all(), analyze_records(), a suppress_output context manager, a shared _RECORDS list, and __all__ = ["main", "process_func"] for running, collecting, and visualizing per-function benchmark latencies.
  • Example bench wrappers (multiple files: `examples/*/bench_*.py`, ≈30 files, e.g. `examples/attention_sink/bench_example_attention_sink.py`, `examples/flash_attention/bench_example_flash_attention.py`, `examples/gemm/bench_example_gemm.py`, `examples/topk/bench_topk_tilelang.py`): Many new small scripts. Each defines one or more bench_*() functions that call tilelang.tools.bench.process_func() on example module main() functions and includes a __main__ guard that calls tilelang.tools.bench.main(). A minimal sketch of this pattern appears after this list.
  • CI workflow (`.github/workflows/pr-perfbench-bot.yml`): Workflow permission change (contents: write), modified pip install steps to use no-user installs, and added steps to read bench.md, upload bench.png as a blob, derive a raw URL, and expose it as an output for PR comment construction.
  • CI performance script (`maint/scripts/ci_performance.py`): New/rewritten script that runs benchmarks, parses markdown table rows into a pandas DataFrame, computes per-file speedups, generates a seaborn/matplotlib bar chart (bench.png), writes bench.md, and prints/returns outputs for the workflow.
  • Runtime logs (`log.txt`): Added benchmark run logs showing repeated runtime failures (TileLang builder while-loop error), CUDA OOMs, and multiple benchmark tracebacks (diagnostic output only).
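
For reference, here is a minimal sketch of the wrapper pattern described above, written with the standard __name__ guard that reviewers recommend later in this thread. The module name example_gemm and the keyword arguments are illustrative only, not taken from the PR:

import tilelang.tools.bench
import example_gemm  # hypothetical example module exposing a main() entry point


def bench_example_gemm():
    # process_func() handles warmup, repeated timed runs, and result recording;
    # extra keyword arguments are forwarded to the example's main().
    tilelang.tools.bench.process_func(example_gemm.main, M=1024, N=1024, K=1024)


if __name__ == "__main__":
    tilelang.tools.bench.main()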

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Bench as tilelang.tools.bench
    participant Example as Example Module
    participant Store as _RECORDS / bench.png

    User->>Bench: main()
    Bench->>Bench: discover bench_* functions
    loop per bench_*
        Bench->>Example: process_func(target_func) (warmup, suppressed output)
        loop repeats
            Bench->>Example: call target_func (timed)
            Example-->>Bench: result or exception
            Bench->>Store: record latency or failure
        end
    end
    Bench->>Bench: analyze_records(out_dir)
    Bench->>Store: write bench.md and bench.png
    Bench-->>User: print results and saved image path
sequenceDiagram
    participant GHA as GitHub Actions
    participant Workflow as pr-perfbench-bot.yml
    participant CI as maint/scripts/ci_performance.py
    participant BenchTool as tilelang.tools.bench
    participant GHApi as GitHub API (blob/contents)
    participant PR as PR Comment

    GHA->>Workflow: run perf workflow
    Workflow->>CI: run ci_performance.py
    CI->>BenchTool: start benchmarks
    BenchTool-->>CI: bench.md + bench.png
    CI->>Workflow: set outputs (bench.md, bench.png path)
    Workflow->>GHApi: upload bench.png as blob -> get raw URL
    Workflow->>PR: post comment body with bench.md and embedded image URL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files needing extra attention:
    • tilelang/tools/bench.py: timing logic, stdout/stderr suppression, per-run exception handling, global _RECORDS lifecycle and interaction when called from multiple modules.
    • maint/scripts/ci_performance.py: Markdown parsing robustness, numeric conversions, speedup calculations, plotting edge cases and file I/O.
    • .github/workflows/pr-perfbench-bot.yml: permission change and blob upload/raw URL logic.
    • A representative sample of examples/*/bench_*.py to ensure no heavy work runs at import time.

Possibly related PRs

Suggested reviewers

  • LeiWang1999
  • tzj-fxz

Poem

🐇 Hop, hop — the benches start to run,
Tiny scripts beneath the sun,
Tables tally, charts take flight,
Latencies dance in plotted light.
A rabbit cheers — benchmarks done!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title '[Feature]: Add benchmark scripts for examples' directly and clearly summarizes the main change: adding benchmark scripts across multiple example directories.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (19)
examples/deepseek_mla/bench_example_mla_decode.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/gemv/bench_example_gemv.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/analyze/bench_example_analyze.py (1)

14-15: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/gemm_splitk/bench_example_gemm_splitk.py (1)

14-15: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1)

63-64: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
tilelang/tools/bench.py (1)

51-53: Bare exception catch loses error context.

Catching Exception without inspecting the exception type can mask important failure modes (e.g., KeyboardInterrupt inheritance issues, resource exhaustion). The traceback is printed but the exception type/message is not recorded in warnings.

Enhance error reporting:

         except Exception:
             fail_count += 1
-            traceback.print_exc(file=sys.stderr)
+            exc_info = sys.exc_info()
+            traceback.print_exception(*exc_info, file=sys.stderr)
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

2-2: Remove unused import.

The tilelang import on line 2 is not used anywhere in this file. Removing unused imports improves code clarity and may slightly reduce import time.

 import tilelang.tools.bench
-import tilelang
 import tilelang_example_sparse_tensorcore
examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

10-11: Minor naming inconsistency.

The function name bench_example_tilelang_nsa_fwd_decode includes "fwd_" prefix, but the imported module is example_tilelang_nsa_decode (without "fwd"). While this works correctly, the inconsistency might cause confusion. Consider renaming to bench_example_tilelang_nsa_decode for consistency.

-def bench_example_tilelang_nsa_fwd_decode():
+def bench_example_tilelang_nsa_decode():
     tilelang.tools.bench.process_func(example_tilelang_nsa_decode.main)
examples/minference/bench_vs_sparse_attn.py (1)

9-10: Simplify the main guard to use idiomatic Python.

The pattern globals().get("__name__") is unnecessarily complex. The standard idiom __name__ directly accesses the built-in module name without indirection.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/linear_attention/bench_linear_attn.py (1)

14-15: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/flash_decoding/bench_example_flash_decoding.py (1)

15-16: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/fusedmoe/bench_example_fusedmoe.py (1)

17-18: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/convolution/bench_example_convolution.py (1)

14-15: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/warp_specialize/bench_example_warp_specialize.py (1)

28-29: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)

54-55: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)

34-35: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
log.txt (1)

1-476: Avoid committing full benchmark runtime logs to the repo

This file is a raw, environment-specific run log (warnings, stack traces, CUDA OOMs, etc.). Keeping it version-controlled will create noisy diffs and quickly become stale. It’s better treated as a CI artifact or local debug output rather than source.

I suggest either:

  • Removing log.txt from the repo and adding it to .gitignore, or
  • Replacing it with a short, curated example log snippet in documentation if you need to illustrate expected failures.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49f3539 and 58e7ffe.

📒 Files selected for processing (31)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
  • examples/analyze/bench_example_analyze.py (1 hunks)
  • examples/attention_sink/bench_example_attention_sink.py (1 hunks)
  • examples/blocksparse_attention/bench_example_blocksparse_attention.py (1 hunks)
  • examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1 hunks)
  • examples/cast/bench_example_cast.py (1 hunks)
  • examples/convolution/bench_example_convolution.py (1 hunks)
  • examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1 hunks)
  • examples/deepseek_mla/bench_example_mla_decode.py (1 hunks)
  • examples/deepseek_nsa/bench_example_tilelang_nsa.py (1 hunks)
  • examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1 hunks)
  • examples/dequantize_gemm/bench_example_dequantize_gemm.py (1 hunks)
  • examples/dynamic_shape/bench_example_dynamic.py (1 hunks)
  • examples/elementwise/bench_example_elementwise.py (1 hunks)
  • examples/flash_attention/bench_example_flash_attention.py (1 hunks)
  • examples/flash_decoding/bench_example_flash_decoding.py (1 hunks)
  • examples/fusedmoe/bench_example_fusedmoe.py (1 hunks)
  • examples/gemm/bench_example_gemm.py (1 hunks)
  • examples/gemm_fp8/bench_example_gemm_fp8.py (1 hunks)
  • examples/gemm_splitk/bench_example_gemm_splitk.py (1 hunks)
  • examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1 hunks)
  • examples/gemv/bench_example_gemv.py (1 hunks)
  • examples/linear_attention/bench_linear_attn.py (1 hunks)
  • examples/minference/bench_vs_sparse_attn.py (1 hunks)
  • examples/seer_attention/bench_block_sparse_attn_tilelang.py (1 hunks)
  • examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1 hunks)
  • examples/topk/bench_topk_tilelang.py (1 hunks)
  • examples/warp_specialize/bench_example_warp_specialize.py (1 hunks)
  • log.txt (1 hunks)
  • maint/scripts/ci_performance.py (1 hunks)
  • tilelang/tools/bench.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🧬 Code graph analysis (28)
examples/gemm_splitk/bench_example_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm/bench_example_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (6)
examples/deepseek_v32/sparse_mla_bwd.py (2)
  • sparse_mla_bwd (283-320)
  • test_sparse_mla_bwd (334-384)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_v32/topk_selector.py (1)
  • test_topk_selector (188-245)
examples/deepseek_v32/fp8_lighting_indexer.py (1)
  • test_fp8_lighting_indexer (260-304)
examples/deepseek_v32/sparse_mla_fwd.py (1)
  • test_sparse_mla_fwd (255-301)
examples/deepseek_v32/sparse_mla_fwd_pipelined.py (1)
  • test_sparse_mla_fwd_pipelined (404-456)
examples/elementwise/bench_example_elementwise.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_mla/bench_example_mla_decode.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/dynamic_shape/bench_example_dynamic.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/warp_specialize/bench_example_warp_specialize.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/flash_attention/bench_example_flash_attention.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm_fp8/bench_example_gemm_fp8.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/fusedmoe/bench_example_fusedmoe.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/topk/bench_topk_tilelang.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/cast/bench_example_cast.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/minference/bench_vs_sparse_attn.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/linear_attention/bench_linear_attn.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/analyze/bench_example_analyze.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
maint/scripts/ci_performance.py (2)
tilelang/env.py (1)
  • disable_cache (275-276)
maint/scripts/performance.py (1)
  • run (22-69)
examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/flash_decoding/bench_example_flash_decoding.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/convolution/bench_example_convolution.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemv/bench_example_gemv.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/attention_sink/bench_example_attention_sink.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
🪛 Ruff (0.14.5)
tilelang/tools/bench.py

39-40: try-except-pass detected, consider logging the exception

(S110)


39-39: Do not catch blind exception: Exception

(BLE001)


51-51: Do not catch blind exception: Exception

(BLE001)


102-102: Use of exec detected

(S102)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (21)
.github/workflows/pr-perfbench-bot.yml (3)

8-9: Verify that contents: write permission is necessary.

The permissions were changed from read to write. Based on learnings, actions: write is needed for cache saving with restricted permissions blocks. However, contents: write grants broader access to modify repository contents.

Verify whether contents: write is required for the blob/tree creation in the upload PNG step, or if there's an alternative approach that doesn't require write access to repository contents.


77-82: LGTM!

Reading the markdown table output and exposing it via GitHub Actions outputs is a clean approach.


124-130: Update the comment body construction after fixing PNG upload.

Once the PNG upload issue is resolved, ensure the image URL points to a valid, accessible location.

examples/cast/bench_example_cast.py (2)

6-13: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func with appropriate parameters for the group-per-split-token cast example. The parameter choices (M=1024, N=1024, BG=2, blk_m=4, batch_sizes=[128, 896]) appear reasonable for benchmarking.


16-17: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func with appropriate parameters for the per-token cast example.

examples/elementwise/bench_example_elementwise.py (1)

5-6: LGTM - Simple benchmark wrapper.

The function correctly delegates to process_func. No parameters are passed, suggesting example_elementwise_add.main either takes no arguments or has appropriate defaults.

examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)

5-6: LGTM - Benchmark wrapper follows standard pattern.

The function correctly delegates to process_func for the block sparse attention benchmark.

examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

6-7: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func for the sparse tensorcore example.

examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (2)

1-10: Note: AI summary inconsistency.

The AI summary mentions two bench functions (bench_example_tilelang_gemm_splitk and bench_example_tilelang_gemm_splitk_vectorize_atomicadd), but this file only contains bench_example_tilelang_gemm_streamk. The implementation itself is correct.


5-6: LGTM - Benchmark wrapper follows standard pattern.

The function correctly delegates to process_func for the GEMM stream-K example.

examples/dynamic_shape/bench_example_dynamic.py (1)

5-6: LGTM - Benchmark wrapper with appropriate parameters.

The function correctly delegates to process_func with M=N=K=1024, which are reasonable square matrix dimensions for benchmarking dynamic shape handling.

examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

6-7: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func for the NSA forward example.

examples/minference/bench_vs_sparse_attn.py (1)

6-6: Empty argv is appropriate for benchmark invocation

The code correctly passes argv=[] to the target function. Looking at the target function signature in examples/minference/example_vertical_slash_sparse_attn.py, the main function accepts argv=None and uses argparse.parse_args(argv).

Passing argv=[] ensures:

  • All arguments use their default values (deterministic behavior)
  • No interference from command-line arguments
  • Proper benchmarking baseline with reproducible parameters

This is idiomatic usage for benchmark tools.
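
A minimal illustration of why argv=[] forces defaults with argparse (the --batch flag below is a placeholder, not the example's real argument set):

import argparse


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch", type=int, default=1)  # placeholder argument
    # argv=[] parses an empty argument list, so every option keeps its default;
    # argv=None would fall back to sys.argv[1:], picking up the caller's CLI flags.
    args = parser.parse_args(argv)
    return args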

examples/attention_sink/bench_example_attention_sink.py (1)

9-52: Bench wrappers are consistent with the new benchmarking harness

These bench_* functions cleanly forward to the respective example_*.main targets (with and without window_size=128) and integrate correctly with tilelang.tools.bench.main()’s auto-discovery pattern. I don’t see functional issues here; this looks ready to plug into the shared bench infra.

examples/topk/bench_topk_tilelang.py (1)

5-10: Straightforward and aligned with the common bench pattern

bench_example_topk() correctly delegates to example_topk.main, and the main guard defers to tilelang.tools.bench.main() so this file will participate in the unified benchmarking flow alongside other examples.

examples/gemm_fp8/bench_example_gemm_fp8.py (1)

7-20: GEMM FP8 variants are wired correctly into the bench harness

Each bench_example_tilelang_gemm_fp8* wrapper calls the appropriate example module’s main via tilelang.tools.bench.process_func, and the module-level __main__ guard hooks into bench.main(). This is consistent with the rest of the new bench scripts and looks good.

examples/gemm/bench_example_gemm.py (2)

1-5: LGTM!

The imports are clean and follow the expected pattern for benchmark wrapper scripts.


8-22: LGTM!

The benchmark wrapper functions correctly delegate to tilelang.tools.bench.process_func with appropriate parameters for each GEMM variant.

examples/flash_attention/bench_example_flash_attention.py (3)

1-14: LGTM!

The imports are well-organized and cover all flash attention benchmark variants.


17-26: LGTM!

The benchmark wrappers without explicit parameters correctly delegate to their respective example modules.

Also applies to: 85-90


29-104: Parameter naming is correct—resolves review concern.

The verification confirms that the parameter naming inconsistency is intentional and correct. All benchmark functions use parameter names that precisely match their corresponding example modules:

  • Backward functions correctly use uppercase (BATCH, H, N_CTX, D_HEAD) matching backward example signatures
  • Forward functions correctly use lowercase (batch, heads, seq_len, dim) matching forward example signatures

No runtime errors will occur.

Comment on lines +83 to +111
- name: Upload PNG to GitHub and get URL
  id: upload_png
  uses: actions/github-script@v8
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const fs = require('fs');
      const content = fs.readFileSync('bench.png').toString('base64');
      // Create blob in the repo
      const blob = await github.rest.git.createBlob({
        owner: context.repo.owner,
        repo: context.repo.repo,
        content: content,
        encoding: "base64",
      });
      // Attach blob as a tree item
      const tree = await github.rest.git.createTree({
        owner: context.repo.owner,
        repo: context.repo.repo,
        tree: [{
          path: `bench_${context.runId}.png`,
          mode: '100644',
          type: 'blob',
          sha: blob.data.sha
        }]
      });
      // Raw file URL (works for embedding image)
      const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
      core.setOutput("url", url);
Contributor

⚠️ Potential issue | 🔴 Critical

Fix the PNG upload and URL generation approach.

The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work because:

  • Raw GitHub URLs require a commit SHA, branch name, or tag—not a tree SHA
  • The created blob/tree exists in Git storage but is not reachable from any ref
  • The URL https://raw.githubusercontent.com/.../[tree-sha]/... will return 404

Consider these alternatives:

Option 1: Upload as workflow artifact and use GitHub API to get URL

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart
    path: bench.png

Then link to the artifact in the comment (artifacts are accessible via GitHub UI but not embeddable).

Option 2: Commit and push the image to a dedicated branch

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use external image hosting service or GitHub issue attachments API

🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml lines 83-111: the workflow currently
creates a blob and tree but never makes a commit or updates a ref, then builds a
raw.githubusercontent URL using a tree SHA (which will 404). Replace this with
one of the suggested approaches: either (A) upload bench.png as a workflow
artifact using actions/upload-artifact@v4 and reference the artifact in the
comment, or (B) commit the PNG to a dedicated branch (configure git user, create
or checkout a branch, add bench_<runId>.png, commit and push) and then construct
the raw.githubusercontent.com URL using the branch name, or (C) upload to an
external image host or use the GitHub Issue attachments API; implement the
chosen approach and remove the orphaned blob/tree creation and the invalid raw
URL generation.

Comment on lines +20 to +21
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

🛠️ Refactor suggestion | 🟠 Major

Use idiomatic Python for main guard.

The expression globals().get("__name__") == "__main__" is unnecessarily verbose and non-standard. The conventional Python idiom is if __name__ == "__main__":, which is more readable and widely recognized.

Apply this change:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

Note: This same pattern appears in all benchmark wrapper files in this PR and should be fixed consistently across:

  • examples/elementwise/bench_example_elementwise.py (line 9)
  • examples/seer_attention/bench_block_sparse_attn_tilelang.py (line 9)
  • examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (line 10)
  • examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (line 9)
  • examples/dynamic_shape/bench_example_dynamic.py (line 9)
  • examples/deepseek_nsa/bench_example_tilelang_nsa.py (line 14)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/cast/bench_example_cast.py around lines 20-21, replace the
non-idiomatic main guard `globals().get("__name__") == "__main__"` with the
conventional Python idiom `if __name__ == "__main__":` to improve readability;
apply the same change consistently to the other benchmark wrapper files listed
in the comment (examples/elementwise/bench_example_elementwise.py line 9,
examples/seer_attention/bench_block_sparse_attn_tilelang.py line 9,
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py line 10,
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py line 9,
examples/dynamic_shape/bench_example_dynamic.py line 9,
examples/deepseek_nsa/bench_example_tilelang_nsa.py line 14).

Comment on lines +107 to +108
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

⚠️ Potential issue | 🟡 Minor

Use the standard Python idiom for the main guard.

The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.

Apply this diff to use the standard pattern:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/flash_attention/bench_example_flash_attention.py around lines 107 to
108, replace the non-idiomatic main guard using globals().get("__name__") with
the standard Python idiom: use if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script behaves as expected
when run as a script.

Comment on lines +25 to +26
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

⚠️ Potential issue | 🟡 Minor

Use the standard Python idiom for the main guard.

The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.

Apply this diff to use the standard pattern:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/gemm/bench_example_gemm.py around lines 25-26, the main guard uses a
non-idiomatic check (globals().get("__name__") == "__main__"); replace it with
the standard Python idiom if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script executes correctly
when run as a program and remains import-safe.

Comment on lines +16 to +75
        m = re.match(r"\|\s*([^\|]+)\s*\|\s*([0-9\.]+)\s*\|", line)
        if m is not None:
            data[m.group(1)] = float(m.group(2))
    return data


output_v1 = subprocess.run(['./tl/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
output_v1 = subprocess.run(
    ['./tl/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout
output_v2 = subprocess.run(
    ['./tll/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout

output_v2 = subprocess.run(['./tll/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
data_v2 = parse_output(output_v2)
table = []
for key in data_v1.keys():
    speedup = data_v1[key] / data_v2[key]
    table.append([key, data_v1[key], data_v2[key], speedup])
table.sort(key=lambda x: x[-1])

headers = ["File", "Original Latency", "Current Latency", "Speedup"]

with open("bench.md", "w") as f:
    f.write(
        tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
    f.write("\n")

table = [[
    "original", data_v1['latency'], data_v1['best_tflops'], data_v1['ref_tflops'], data_v1['config']
], [
    "current", data_v2['latency'], data_v2['best_tflops'], data_v2['ref_tflops'], data_v2['config']
]]
df = pd.DataFrame(table, columns=headers)
df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
fig_width = max(0, len(df) * 0.35)
plt.figure(figsize=(fig_width, 8))
sns.set_theme(style="whitegrid", font_scale=0.9)
bar_colors = sns.color_palette("magma", len(df))
bars = plt.bar(range(len(df)), df["Speedup"], color=bar_colors, edgecolor="black")
top3_idx = df.nlargest(3, "Speedup").index
bot3_idx = df.nsmallest(3, "Speedup").index
label_idx = set(top3_idx.tolist() + bot3_idx.tolist())

headers = ["version", "Best Latency (s)", "Best TFlops", "Reference TFlops", "Best Config"]
for i, val in enumerate(df["Speedup"]):
    if i in label_idx:
        plt.text(
            i,
            val + 0.02,
            f"{val:.2f}x",
            ha="center",
            va="bottom",
            color="red",
            fontsize=8,
            fontweight="bold")

print(tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
plt.xticks(range(len(df)), df["File"], rotation=70, ha='right', fontsize=12)
plt.ylabel("Current Speedup vs Original", fontsize=14)
plt.title("Current Speedup vs Original", fontsize=14, fontweight="bold")
plt.ylim(0, max(df["Speedup"]) * 1.2)
sns.despine()
plt.tight_layout()
plt.savefig("bench.png", dpi=300)
Contributor

⚠️ Potential issue | 🟠 Major

Harden performance script against mismatched/empty benchmark results

The parsing/aggregation logic assumes “happy path” data and can break in real runs:

  • Key mismatch risk (Lines 31–36): for key in data_v1.keys(): ... data_v2[key] will raise KeyError if bench_all() in the second environment doesn’t emit a row for some benchmark present in the first (or vice versa). This is very plausible as benchmarks are added/removed over time.
  • Empty-data risk (Lines 46–52, 72): if both runs fail to produce any valid records (or none overlap), table is empty; pd.DataFrame(table, ...) yields an empty DF and max(df["Speedup"]) will raise (max() of empty sequence). The log you committed already shows several “failed in all repeats (no valid run)” cases, so this is not theoretical.
  • Div-by-zero edge case: If a latency ever ends up reported as 0.0 for the “current” version, speedup = data_v1[key] / data_v2[key] will crash. Unlikely, but cheap to guard.

To make CI more robust, I recommend building the table only over the intersection of keys and short‑circuiting when there is no usable data. For example:

 data_v1 = parse_output(output_v1)
 data_v2 = parse_output(output_v2)
-table = []
-for key in data_v1.keys():
-    speedup = data_v1[key] / data_v2[key]
-    table.append([key, data_v1[key], data_v2[key], speedup])
-table.sort(key=lambda x: x[-1])
+table = []
+common_keys = sorted(set(data_v1.keys()) & set(data_v2.keys()))
+for key in common_keys:
+    v1 = data_v1[key]
+    v2 = data_v2[key]
+    # Skip pathological cases to avoid crashes
+    if v2 == 0:
+        continue
+    speedup = v1 / v2
+    table.append([key, v1, v2, speedup])
+
+if not table:
+    # No overlapping, usable benchmarks — avoid generating empty reports
+    with open("bench.md", "w") as f:
+        f.write("No overlapping benchmark results found between original and current runs.\n")
+    # Skip plot generation
+    raise SystemExit(0)
+
+table.sort(key=lambda x: x[-1])
@@
-headers = ["File", "Original Latency", "Current Latency", "Speedup"]
+headers = ["File", "Original Latency", "Current Latency", "Speedup"]
@@
-df = pd.DataFrame(table, columns=headers)
-df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
-fig_width = max(0, len(df) * 0.35)
+df = pd.DataFrame(table, columns=headers).sort_values(
+    "Speedup", ascending=False
+).reset_index(drop=True)
+fig_width = max(6, len(df) * 0.35)

This keeps the existing behavior for valid data, but prevents CI from failing purely due to missing/mismatched benchmarks or fully failed runs, and ensures the plot’s figure width is always positive.

🤖 Prompt for AI Agents
In maint/scripts/ci_performance.py around lines 16-75, the script assumes
matching, non-empty benchmark outputs and may KeyError on missing keys,
ZeroDivisionError if current latency is 0.0, and will crash when table/df is
empty; fix by iterating only over the intersection of data_v1 and data_v2 keys
(or explicitly check existence before accessing), skip any entry where
data_v2[key] is zero (or treat as NaN/inf) to avoid division by zero, and after
building the table short-circuit: if table is empty write a minimal bench.md (or
a clear message) and skip plotting (or create a default empty DataFrame/plot
with fixed minimum fig_width) so subsequent pandas/plt operations (like max on
Speedup) cannot raise.

Comment on lines +35 to +40
    try:
        with suppress_output():
            for _ in range(warmup):
                func(*args, **kwargs)
    except Exception:
        pass
Contributor

⚠️ Potential issue | 🟠 Major

Silent warmup failures may hide configuration issues.

Warmup exceptions are completely suppressed (lines 39-40), which could mask genuine configuration problems, import errors, or invalid parameters. A benchmark that fails during warmup will proceed to timing runs, potentially wasting time or producing misleading results.

Consider logging warmup failures:

     try:
         with suppress_output():
             for _ in range(warmup):
                 func(*args, **kwargs)
-    except Exception:
-        pass
+    except Exception as e:
+        warnings.warn(
+            f"warmup for {func.__module__} raised {type(e).__name__}: {e}",
+            RuntimeWarning,
+            stacklevel=2,
+        )

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

39-40: try-except-pass detected, consider logging the exception

(S110)


39-39: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 35 to 40, the warmup loop currently
swallows all exceptions (except Exception: pass). Replace this with explicit
exception handling that captures the exception as a variable, logs the failure
with full traceback (e.g., logger.exception or print to stderr) including
context (which function and warmup iteration), and then aborts the benchmark
(re-raise the exception or return/exit early) so timing runs do not proceed on a
bad configuration.

Comment on lines +66 to +71
    else:
        warnings.warn(
            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
            RuntimeWarning,
            stacklevel=2,
        )
Contributor

⚠️ Potential issue | 🟡 Minor

No record appended when all runs fail.

When times is empty (all runs failed), a warning is issued but no record is appended to _RECORDS. This causes the benchmark to be silently omitted from the final table and chart, which might confuse users expecting to see all attempted benchmarks.

Consider appending a sentinel record:

     else:
         warnings.warn(
             f"benchmark for {func.__module__} failed in all repeats (no valid run)",
             RuntimeWarning,
             stacklevel=2,
         )
+        _RECORDS.append((f"{func.__module__}", float('inf')))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    else:
-        warnings.warn(
-            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
-            RuntimeWarning,
-            stacklevel=2,
-        )
+    else:
+        warnings.warn(
+            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
+            RuntimeWarning,
+            stacklevel=2,
+        )
+        _RECORDS.append((f"{func.__module__}", float('inf')))
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 66 to 71, when times is empty the code
only emits a warning and does not append any record to _RECORDS, causing the
benchmark to be omitted from outputs; update the branch so that after issuing
the warning you append a sentinel record to _RECORDS containing the benchmark
identity (func.__module__, func.__name__), count=0, mean and std as float("nan")
(or None if existing records expect None), times as an empty list, success=False
(or an equivalent flag used elsewhere), and any relevant metadata (e.g. repeats
and label) so the final table/chart includes an entry indicating the benchmark
failed all runs.

Comment on lines +83 to +86
    plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
    plt.bar(names, lats)
    plt.xlabel("Latency (ms)")
    plt.title("Benchmark Results")
Contributor

⚠️ Potential issue | 🟡 Minor

Chart axis labels are swapped.

Line 85 sets xlabel to "Latency (ms)", but the X-axis displays function names. The latency values are on the Y-axis. This mislabeling will confuse readers of the generated charts.

Apply this fix:

     plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
     plt.bar(names, lats)
-    plt.xlabel("Latency (ms)")
+    plt.xlabel("Functions")
+    plt.ylabel("Latency (ms)")
     plt.title("Benchmark Results")
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 83 to 86, the axis labels are reversed:
the X-axis shows function names while the Y-axis shows latency, yet xlabel is
set to "Latency (ms)". Change the labels so the X-axis is labeled (e.g.)
"Function" (or "Name") and the Y-axis is labeled "Latency (ms)"; keep the bar
plot and sizing unchanged.

Comment on lines +96 to +108
def main():
    # Entry point — automatically run all bench_* functions in caller file.
    test_file = inspect.getsourcefile(sys._getframe(1))
    out_dir = os.path.dirname(test_file)
    module = {}
    with open(test_file) as f:
        exec(f.read(), module)

    for name, func in module.items():
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)
Contributor

⚠️ Potential issue | 🟠 Major

Security risk: exec() evaluates arbitrary code from file.

Line 102 uses exec() to dynamically execute the caller's source file. This is a significant security risk if the benchmarking tool is ever run on untrusted or user-supplied files. Additionally, exec() bypasses normal import machinery, which can cause issues with relative imports, __file__, and module metadata.

Consider using proper module imports instead:

 def main():
-    # Entry point — automatically run all bench_* functions in caller file.
-    test_file = inspect.getsourcefile(sys._getframe(1))
-    out_dir = os.path.dirname(test_file)
-    module = {}
-    with open(test_file) as f:
-        exec(f.read(), module)
-
-    for name, func in module.items():
+    # Entry point — automatically run all bench_* functions in caller module.
+    caller_frame = sys._getframe(1)
+    caller_module = inspect.getmodule(caller_frame)
+    test_file = inspect.getsourcefile(caller_frame)
+    out_dir = os.path.dirname(test_file)
+
+    for name in dir(caller_module):
         if name.startswith("bench_") and callable(func):
+            func = getattr(caller_module, name)
             func()
 
     analyze_records(_RECORDS, out_dir)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

102-102: Use of exec detected

(S102)

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 96 to 108, replace the unsafe exec()
usage that reads and executes the caller source with a proper module import via
importlib: create a module spec from the test file path
(importlib.util.spec_from_file_location), create a module from that spec
(importlib.util.module_from_spec), execute the module via
spec.loader.exec_module(module) so the module has correct metadata (__file__,
package, etc.), optionally insert it into sys.modules to support relative
imports, then iterate module attributes to call bench_* functions and continue
to call analyze_records(_RECORDS, out_dir); this removes arbitrary-code-exec
risk and preserves normal import behavior.
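
For reference, a minimal sketch of the importlib-based approach the prompt describes. The module name "bench_caller" and the overall structure are illustrative, not the PR's actual implementation; analyze_records and _RECORDS refer to the module's existing helpers:

import importlib.util
import inspect
import os
import sys


def main():
    # Entry point: run all bench_* functions defined in the calling script.
    caller_file = inspect.getsourcefile(sys._getframe(1))
    out_dir = os.path.dirname(caller_file)

    # Load the caller through the normal import machinery instead of exec(),
    # so __file__, __name__, and relative imports behave as usual.
    spec = importlib.util.spec_from_file_location("bench_caller", caller_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)

    for name in dir(module):
        func = getattr(module, name)
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)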

Comment on lines +128 to +135
    for root, _, files in os.walk(examples_root):
        for file_name in files:
            if re.match(r"^bench_.*\.py$", file_name):
                full_path = os.path.join(root, file_name)
                if root not in added_roots:
                    sys.path.insert(0, root)
                    added_roots.add(root)
                mod = _load_module(full_path)
Contributor

⚠️ Potential issue | 🟠 Major

Modifying sys.path during iteration may cause import conflicts.

Lines 132-134 insert directories into sys.path while walking the examples tree. If two example directories contain modules with the same name, the first one will shadow the second, leading to incorrect benchmark execution.

Consider isolating each module's sys.path modification:

     for root, _, files in os.walk(examples_root):
         for file_name in files:
             if re.match(r"^bench_.*\.py$", file_name):
                 full_path = os.path.join(root, file_name)
-                if root not in added_roots:
-                    sys.path.insert(0, root)
-                    added_roots.add(root)
+                # Temporarily add directory for this specific import
+                sys.path.insert(0, root)
                 mod = _load_module(full_path)
+                sys.path.pop(0)
                 for name in dir(mod):
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 128 to 135, the code mutates sys.path
while walking examples which can cause module shadowing if different directories
contain identically named modules; instead, load each bench_*.py file without
permanently altering sys.path by using importlib.util.spec_from_file_location
(or runpy.run_path) to create a module from the file path and execute it with a
unique module name, or if you must modify sys.path temporarily, push the
directory, import, then immediately pop it (ensure uniqueness of module names)
so the global sys.path is not left modified and modules from different example
roots cannot shadow one another.
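
A minimal sketch of the "push temporarily, then pop" variant mentioned in the prompt above. It assumes the surrounding loop and the existing _load_module helper from the excerpt; indentation matches the quoted code, not the full file:

                # Expose this example's directory only for this one import,
                # so sibling example directories cannot shadow each other's modules.
                sys.path.insert(0, root)
                try:
                    mod = _load_module(full_path)
                finally:
                    sys.path.pop(0)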

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

87-115: Fix the PNG upload approach (unresolved from previous review).

This implementation was flagged as critical in the previous review and remains unfixed. The workflow creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. Raw URLs require a commit SHA, branch name, or tag—not a tree SHA—so this will return 404 and break the image embedding.

The following approaches were recommended previously:

Option 1 (recommended): Upload as workflow artifact

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart
    path: bench.png

Option 2: Commit and push image to a dedicated branch

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Choose and implement one approach to resolve this blocker before merging.

🧹 Nitpick comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

76-80: Benchmark execution depends on ci_performance.py outputs.

The workflow assumes maint/scripts/ci_performance.py will produce bench.md and bench.png files. Ensure the script reliably generates these artifacts; consider adding error handling or validation if they're missing.

Would you like me to verify that ci_performance.py produces the expected outputs?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58e7ffe and 8df8ec5.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (3)
.github/workflows/pr-perfbench-bot.yml (3)

8-9: Verify cache permissions for self-hosted runner.

The workflow specifies only contents: write but uses actions/setup-python@v6 with cache: pip enabled. Per the learning from previous workflows, caching requires actions: write permission to save caches. Verify whether this impacts cache behavior on the self-hosted runner, and add actions: write if needed.


52-60: Venv setup for PR version looks solid.

The explicit disabling of pip's user mode and clear isolation of the venv is well-handled.


62-74: Parallel venv setup for main branch is clean.

The git state cleanup and separate venv creation are well-structured.

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

89-117: ⚠️ Image URL generation will fail with 404 errors.

The blob/tree approach creates orphaned Git objects that are not reachable from any ref. Raw GitHub URLs require a commit SHA, branch, or tag—not a tree SHA. The URL generated at line 116 will consistently return 404.

Consider one of these alternatives:

Option 1: Commit and push to a dedicated branch (recommended)

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images 2>/dev/null || git checkout benchmark-images
    git rm -rf . 2>/dev/null || true
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 2: Upload as a workflow artifact

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart-${{ github.run_id }}
    path: bench.png

(Note: artifact URLs are less convenient for embedding, but are guaranteed to work.)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8df8ec5 and b5cb2b0.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

56-61: Pip configuration cleanup is correct.

The explicit cleanup of pip configuration (PIP_CONFIG_FILE, PYTHONUSERBASE) and the --no-user flags ensure a clean, reproducible environment for both the merged and original installs.

Comment on lines 8 to +9

 permissions:
-  contents: read
+  contents: write

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Add actions: write permission for cache operations.

The current permissions block only specifies contents: write. When a permissions block is defined, any unlisted permission (including actions) defaults to none. This prevents setup-python's cache from being saved—it will only restore from cache but never write new entries.

Apply this diff:

 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.


I'll verify this review comment by checking the codebase and GitHub Actions documentation.


🏁 Script executed:

#!/bin/bash
# Check if the workflow file exists and examine permissions block and setup-python usage
cat .github/workflows/pr-perfbench-bot.yml 2>/dev/null || echo "File not found"

Length of output: 4705


🌐 Web query:

GitHub Actions setup-python cache requires actions write permission

💡 Result:

Short answer: actions/cache (used by setup-python and many wrapper actions) needs the workflow's GITHUB_TOKEN to have actions: write when it needs to create/delete caches. Add permissions to the workflow or provide a PAT (for fork PRs where GITHUB_TOKEN is limited).

What to do

  • In your workflow or job, grant the permission:

    permissions:
      actions: write
      contents: read

  • Example (top of workflow):

    permissions:
      actions: write
      contents: read

Notes

  • For pull requests from forks the GITHUB_TOKEN is read-only for some permissions; to allow cache writes/deletes for fork PRs you must use a personal access token (repo scope) stored in a secret and pass it to the cache action (or disable deletion). [1][2][3]

Sources

  • cache action / examples requiring actions: write. [1]
  • REST / API note: deleting caches requires an auth token with repo scope or GitHub App actions: write. [2]
  • Marketplace/README examples calling out actions: write for cache deletion. [3]

Add actions: write permission for cache operations.

The workflow uses actions/setup-python@v6 with cache: pip enabled. When a permissions block is defined in GitHub Actions, the actions/cache action used internally by setup-python requires actions: write permission to save caches. Without this permission, caches will restore but not persist across runs, degrading workflow performance.

Apply this diff:

 permissions:
   contents: write
+  actions: write
🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml around lines 8 to 9 currently grants only
contents: write; add an actions: write permission to the permissions block so
the actions/cache used by actions/setup-python (cache: pip) can persist
caches—update the permissions to include actions: write alongside contents:
write (keep existing permissions and only add actions: write).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
.github/workflows/pr-perfbench-bot.yml (2)

8-9: Add actions: write permission for cache operations.

The workflow uses actions/setup-python@v6 with cache: pip enabled (line 47). The actions/cache used by setup-python requires actions: write permission to save caches. Without this, caches restore but never persist across runs, degrading workflow performance.

Apply this diff to fix:

 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.


89-117: Fix the PNG upload and URL generation approach.

The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work—raw.githubusercontent.com URLs require a commit SHA, branch name, or tag, not a tree SHA. The URL will return 404.

Consider these alternatives:

Option 1: Upload as workflow artifact (simplest)

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart-${{ github.run_id }}
    path: bench.png

Then reference the artifact in the comment with a note that it's available in the "Artifacts" section of the workflow run.
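For example, the comment body could link to the run page that hosts the artifact (illustrative; this uses the standard Actions run URL):

Benchmark chart: see the benchmark-chart-${{ github.run_id }} artifact at https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}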

Option 2: Commit and push the image to a dedicated branch (allows embedding)

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use GitHub Issues API attachments
Upload via the issue attachments API endpoint (if available in your GitHub version).

Replace lines 89–117 with your chosen approach.

🧹 Nitpick comments (2)
.github/workflows/pr-perfbench-bot.yml (2)

56-76: Minor inconsistency: Error suppression differs between venv setups.

The merged version (lines 58–59) doesn't suppress errors when unsetting pip config keys, but the original version (lines 73–74) uses || true. If config keys don't exist, the first will fail while the second continues silently. For consistency and robustness, align both to suppress non-fatal errors.

Apply this diff:

           export PIP_CONFIG_FILE=/dev/null
           export PYTHONUSERBASE=""
-          pip config unset global.user
-          pip config unset user.user
+          pip config unset global.user || true
+          pip config unset user.user || true
           pip install --no-user -r requirements-test.txt

78-88: Verify file generation before downstream steps.

The workflow runs ci_performance.py (line 82) and assumes bench.md and bench.png are generated. Consider adding validation to ensure these files exist before attempting to read them.

Apply this diff:

       - name: Run performance test
         id: perfbench
         run: |
           source tl/bin/activate
           python maint/scripts/ci_performance.py
+          if [[ ! -f bench.md ]] || [[ ! -f bench.png ]]; then
+            echo "Error: Performance test did not generate required output files"
+            exit 1
+          fi
       - name: Read markdown table

This ensures that if the performance script fails to generate outputs, the workflow fails fast with a clear message.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5cb2b0 and 487e652.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
