
Conversation

@yyttt6
Contributor

@yyttt6 yyttt6 commented Nov 19, 2025

Summary by CodeRabbit

  • New Features
    • End-to-end benchmarking harness added with repeatable measurements and aggregated results.
    • Bench runs now produce a Markdown summary and a saved speedup plot; CI uploads and exposes these artifacts.
    • PR performance comments now include the formatted results table and embedded visualization.
    • Many new benchmark entry points added across examples (kernels, attention, quantization, sparse ops, inference).

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Adds a benchmarking utility (tilelang.tools.bench), ~30 new example bench wrapper scripts, and a CI performance aggregation/plotting script; updates the perfbench GitHub Actions workflow to publish a comment and an image; and includes a runtime log file from benchmark runs.

Changes

  • Benchmark core (`tilelang/tools/bench.py`): New module providing process_func(), main(), bench_all(), analyze_records(), a suppress_output context manager, a shared _RECORDS list, and __all__ = ["main", "process_func"] for running, collecting, and visualizing per-function benchmark latencies.
  • Example bench wrappers (multiple files: `examples/*/bench_*.py`, ≈30 files, e.g. `examples/attention_sink/bench_example_attention_sink.py`, `examples/flash_attention/bench_example_flash_attention.py`, `examples/gemm/bench_example_gemm.py`, `examples/topk/bench_topk_tilelang.py`): Many new small scripts. Each defines one or more bench_*() functions that call tilelang.tools.bench.process_func() on example module main() functions and includes a __main__ guard that calls tilelang.tools.bench.main(). A minimal sketch of this pattern appears after this list.
  • CI workflow (`.github/workflows/pr-perfbench-bot.yml`): Workflow permission change (contents: write), modified pip install steps to use no-user installs, and added steps to read bench.md, upload bench.png as a blob, derive a raw URL, and expose it as an output for PR comment construction.
  • CI performance script (`maint/scripts/ci_performance.py`): New/rewritten script that runs benchmarks, parses markdown table rows into a pandas DataFrame, computes per-file speedups, generates a seaborn/matplotlib bar chart (bench.png), writes bench.md, and prints/returns outputs for the workflow.
  • Runtime logs (`log.txt`): Added benchmark run logs showing repeated runtime failures (TileLang builder while-loop error), CUDA OOMs, and multiple benchmark tracebacks (diagnostic output only).
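
For reference, here is a minimal sketch of the wrapper pattern described above, written with the standard __name__ guard that reviewers recommend later in this thread. The module name example_gemm and the keyword arguments are illustrative only, not taken from the PR:

import tilelang.tools.bench
import example_gemm  # hypothetical example module exposing a main() entry point


def bench_example_gemm():
    # process_func() handles warmup, repeated timed runs, and result recording;
    # extra keyword arguments are forwarded to the example's main().
    tilelang.tools.bench.process_func(example_gemm.main, M=1024, N=1024, K=1024)


if __name__ == "__main__":
    tilelang.tools.bench.main()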

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Bench as tilelang.tools.bench
    participant Example as Example Module
    participant Store as _RECORDS / bench.png

    User->>Bench: main()
    Bench->>Bench: discover bench_* functions
    loop per bench_*
        Bench->>Example: process_func(target_func) (warmup, suppressed output)
        loop repeats
            Bench->>Example: call target_func (timed)
            Example-->>Bench: result or exception
            Bench->>Store: record latency or failure
        end
    end
    Bench->>Bench: analyze_records(out_dir)
    Bench->>Store: write bench.md and bench.png
    Bench-->>User: print results and saved image path
sequenceDiagram
    participant GHA as GitHub Actions
    participant Workflow as pr-perfbench-bot.yml
    participant CI as maint/scripts/ci_performance.py
    participant BenchTool as tilelang.tools.bench
    participant GHApi as GitHub API (blob/contents)
    participant PR as PR Comment

    GHA->>Workflow: run perf workflow
    Workflow->>CI: run ci_performance.py
    CI->>BenchTool: start benchmarks
    BenchTool-->>CI: bench.md + bench.png
    CI->>Workflow: set outputs (bench.md, bench.png path)
    Workflow->>GHApi: upload bench.png as blob -> get raw URL
    Workflow->>PR: post comment body with bench.md and embedded image URL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files needing extra attention:
    • tilelang/tools/bench.py: timing logic, stdout/stderr suppression, per-run exception handling, global _RECORDS lifecycle and interaction when called from multiple modules.
    • maint/scripts/ci_performance.py: Markdown parsing robustness, numeric conversions, speedup calculations, plotting edge cases and file I/O.
    • .github/workflows/pr-perfbench-bot.yml: permission change and blob upload/raw URL logic.
    • A representative sample of examples/*/bench_*.py to ensure no heavy work runs at import time.

Possibly related PRs

Suggested reviewers

  • LeiWang1999
  • tzj-fxz

Poem

🐇 Hop, hop — the benches start to run,
Tiny scripts beneath the sun,
Tables tally, charts take flight,
Latencies dance in plotted light.
A rabbit cheers — benchmarks done!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title '[Feature]: Add benchmark scripts for examples' directly and clearly summarizes the main change: adding benchmark scripts across multiple example directories.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (19)
examples/deepseek_mla/bench_example_mla_decode.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/gemv/bench_example_gemv.py (1)

9-10: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/analyze/bench_example_analyze.py (1)

14-15: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/gemm_splitk/bench_example_gemm_splitk.py (1)

14-15: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1)

63-64: Simplify the name check.

The pattern if globals().get("__name__") == "__main__": is unconventional. Use the standard idiom instead.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
tilelang/tools/bench.py (1)

51-53: Bare exception catch loses error context.

Catching Exception without inspecting the exception type can mask important failure modes (e.g., KeyboardInterrupt inheritance issues, resource exhaustion). The traceback is printed but the exception type/message is not recorded in warnings.

Enhance error reporting:

         except Exception:
             fail_count += 1
-            traceback.print_exc(file=sys.stderr)
+            exc_info = sys.exc_info()
+            traceback.print_exception(*exc_info, file=sys.stderr)
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

2-2: Remove unused import.

The tilelang import on line 2 is not used anywhere in this file. Removing unused imports improves code clarity and may slightly reduce import time.

 import tilelang.tools.bench
-import tilelang
 import tilelang_example_sparse_tensorcore
examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

10-11: Minor naming inconsistency.

The function name bench_example_tilelang_nsa_fwd_decode includes "fwd_" prefix, but the imported module is example_tilelang_nsa_decode (without "fwd"). While this works correctly, the inconsistency might cause confusion. Consider renaming to bench_example_tilelang_nsa_decode for consistency.

-def bench_example_tilelang_nsa_fwd_decode():
+def bench_example_tilelang_nsa_decode():
     tilelang.tools.bench.process_func(example_tilelang_nsa_decode.main)
examples/minference/bench_vs_sparse_attn.py (1)

9-10: Simplify the main guard to use idiomatic Python.

The pattern globals().get("__name__") is unnecessarily complex. The standard idiom __name__ directly accesses the built-in module name without indirection.

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/linear_attention/bench_linear_attn.py (1)

14-15: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/flash_decoding/bench_example_flash_decoding.py (1)

15-16: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/fusedmoe/bench_example_fusedmoe.py (1)

17-18: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/convolution/bench_example_convolution.py (1)

14-15: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/warp_specialize/bench_example_warp_specialize.py (1)

28-29: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)

54-55: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)

34-35: Simplify the main guard to use idiomatic Python.

Use the standard __name__ idiom instead of accessing through globals().

Apply this diff:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
log.txt (1)

1-476: Avoid committing full benchmark runtime logs to the repo

This file is a raw, environment-specific run log (warnings, stack traces, CUDA OOMs, etc.). Keeping it version-controlled will create noisy diffs and quickly become stale. It’s better treated as a CI artifact or local debug output rather than source.

I suggest either:

  • Removing log.txt from the repo and adding it to .gitignore, or
  • Replacing it with a short, curated example log snippet in documentation if you need to illustrate expected failures.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49f3539 and 58e7ffe.

📒 Files selected for processing (31)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
  • examples/analyze/bench_example_analyze.py (1 hunks)
  • examples/attention_sink/bench_example_attention_sink.py (1 hunks)
  • examples/blocksparse_attention/bench_example_blocksparse_attention.py (1 hunks)
  • examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1 hunks)
  • examples/cast/bench_example_cast.py (1 hunks)
  • examples/convolution/bench_example_convolution.py (1 hunks)
  • examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1 hunks)
  • examples/deepseek_mla/bench_example_mla_decode.py (1 hunks)
  • examples/deepseek_nsa/bench_example_tilelang_nsa.py (1 hunks)
  • examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (1 hunks)
  • examples/dequantize_gemm/bench_example_dequantize_gemm.py (1 hunks)
  • examples/dynamic_shape/bench_example_dynamic.py (1 hunks)
  • examples/elementwise/bench_example_elementwise.py (1 hunks)
  • examples/flash_attention/bench_example_flash_attention.py (1 hunks)
  • examples/flash_decoding/bench_example_flash_decoding.py (1 hunks)
  • examples/fusedmoe/bench_example_fusedmoe.py (1 hunks)
  • examples/gemm/bench_example_gemm.py (1 hunks)
  • examples/gemm_fp8/bench_example_gemm_fp8.py (1 hunks)
  • examples/gemm_splitk/bench_example_gemm_splitk.py (1 hunks)
  • examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1 hunks)
  • examples/gemv/bench_example_gemv.py (1 hunks)
  • examples/linear_attention/bench_linear_attn.py (1 hunks)
  • examples/minference/bench_vs_sparse_attn.py (1 hunks)
  • examples/seer_attention/bench_block_sparse_attn_tilelang.py (1 hunks)
  • examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1 hunks)
  • examples/topk/bench_topk_tilelang.py (1 hunks)
  • examples/warp_specialize/bench_example_warp_specialize.py (1 hunks)
  • log.txt (1 hunks)
  • maint/scripts/ci_performance.py (1 hunks)
  • tilelang/tools/bench.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🧬 Code graph analysis (28)
examples/gemm_splitk/bench_example_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm/bench_example_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py (6)
examples/deepseek_v32/sparse_mla_bwd.py (2)
  • sparse_mla_bwd (283-320)
  • test_sparse_mla_bwd (334-384)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_v32/topk_selector.py (1)
  • test_topk_selector (188-245)
examples/deepseek_v32/fp8_lighting_indexer.py (1)
  • test_fp8_lighting_indexer (260-304)
examples/deepseek_v32/sparse_mla_fwd.py (1)
  • test_sparse_mla_fwd (255-301)
examples/deepseek_v32/sparse_mla_fwd_pipelined.py (1)
  • test_sparse_mla_fwd_pipelined (404-456)
examples/elementwise/bench_example_elementwise.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_mla/bench_example_mla_decode.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/dynamic_shape/bench_example_dynamic.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/warp_specialize/bench_example_warp_specialize.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/flash_attention/bench_example_flash_attention.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemm_fp8/bench_example_gemm_fp8.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/fusedmoe/bench_example_fusedmoe.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/topk/bench_topk_tilelang.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/blocksparse_attention/bench_example_blocksparse_attention.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/cast/bench_example_cast.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/minference/bench_vs_sparse_attn.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/linear_attention/bench_linear_attn.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/analyze/bench_example_analyze.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
maint/scripts/ci_performance.py (2)
tilelang/env.py (1)
  • disable_cache (275-276)
maint/scripts/performance.py (1)
  • run (22-69)
examples/blocksparse_gemm/bench_example_blocksparse_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/dequantize_gemm/bench_example_dequantize_gemm.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/flash_decoding/bench_example_flash_decoding.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/convolution/bench_example_convolution.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/gemv/bench_example_gemv.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
examples/attention_sink/bench_example_attention_sink.py (1)
tilelang/tools/bench.py (2)
  • process_func (33-71)
  • main (96-108)
🪛 Ruff (0.14.5)
tilelang/tools/bench.py

39-40: try-except-pass detected, consider logging the exception

(S110)


39-39: Do not catch blind exception: Exception

(BLE001)


51-51: Do not catch blind exception: Exception

(BLE001)


102-102: Use of exec detected

(S102)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (21)
.github/workflows/pr-perfbench-bot.yml (3)

8-9: Verify that contents: write permission is necessary.

The permissions were changed from read to write. Based on learnings, actions: write is needed for cache saving with restricted permissions blocks. However, contents: write grants broader access to modify repository contents.

Verify whether contents: write is required for the blob/tree creation in the upload PNG step, or if there's an alternative approach that doesn't require write access to repository contents.


77-82: LGTM!

Reading the markdown table output and exposing it via GitHub Actions outputs is a clean approach.


124-130: Update the comment body construction after fixing PNG upload.

Once the PNG upload issue is resolved, ensure the image URL points to a valid, accessible location.

examples/cast/bench_example_cast.py (2)

6-13: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func with appropriate parameters for the group-per-split-token cast example. The parameter choices (M=1024, N=1024, BG=2, blk_m=4, batch_sizes=[128, 896]) appear reasonable for benchmarking.


16-17: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func with appropriate parameters for the per-token cast example.

examples/elementwise/bench_example_elementwise.py (1)

5-6: LGTM - Simple benchmark wrapper.

The function correctly delegates to process_func. No parameters are passed, suggesting example_elementwise_add.main either takes no arguments or has appropriate defaults.

examples/seer_attention/bench_block_sparse_attn_tilelang.py (1)

5-6: LGTM - Benchmark wrapper follows standard pattern.

The function correctly delegates to process_func for the block sparse attention benchmark.

examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (1)

6-7: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func for the sparse tensorcore example.

examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (2)

1-10: Note: AI summary inconsistency.

The AI summary mentions two bench functions (bench_example_tilelang_gemm_splitk and bench_example_tilelang_gemm_splitk_vectorize_atomicadd), but this file only contains bench_example_tilelang_gemm_streamk. The implementation itself is correct.


5-6: LGTM - Benchmark wrapper follows standard pattern.

The function correctly delegates to process_func for the GEMM stream-K example.

examples/dynamic_shape/bench_example_dynamic.py (1)

5-6: LGTM - Benchmark wrapper with appropriate parameters.

The function correctly delegates to process_func with M=N=K=1024, which are reasonable square matrix dimensions for benchmarking dynamic shape handling.

examples/deepseek_nsa/bench_example_tilelang_nsa.py (1)

6-7: LGTM - Benchmark wrapper correctly configured.

The function properly delegates to process_func for the NSA forward example.

examples/minference/bench_vs_sparse_attn.py (1)

6-6: Empty argv is appropriate for benchmark invocation

The code correctly passes argv=[] to the target function. Looking at the target function signature in examples/minference/example_vertical_slash_sparse_attn.py, the main function accepts argv=None and uses argparse.parse_args(argv).

Passing argv=[] ensures:

  • All arguments use their default values (deterministic behavior)
  • No interference from command-line arguments
  • Proper benchmarking baseline with reproducible parameters

This is idiomatic usage for benchmark tools.
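
A minimal illustration of why argv=[] forces defaults with argparse (the --batch flag below is a placeholder, not the example's real argument set):

import argparse


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch", type=int, default=1)  # placeholder argument
    # argv=[] parses an empty argument list, so every option keeps its default;
    # argv=None would fall back to sys.argv[1:], picking up the caller's CLI flags.
    args = parser.parse_args(argv)
    return args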

examples/attention_sink/bench_example_attention_sink.py (1)

9-52: Bench wrappers are consistent with the new benchmarking harness

These bench_* functions cleanly forward to the respective example_*.main targets (with and without window_size=128) and integrate correctly with tilelang.tools.bench.main()’s auto-discovery pattern. I don’t see functional issues here; this looks ready to plug into the shared bench infra.

examples/topk/bench_topk_tilelang.py (1)

5-10: Straightforward and aligned with the common bench pattern

bench_example_topk() correctly delegates to example_topk.main, and the main guard defers to tilelang.tools.bench.main() so this file will participate in the unified benchmarking flow alongside other examples.

examples/gemm_fp8/bench_example_gemm_fp8.py (1)

7-20: GEMM FP8 variants are wired correctly into the bench harness

Each bench_example_tilelang_gemm_fp8* wrapper calls the appropriate example module’s main via tilelang.tools.bench.process_func, and the module-level __main__ guard hooks into bench.main(). This is consistent with the rest of the new bench scripts and looks good.

examples/gemm/bench_example_gemm.py (2)

1-5: LGTM!

The imports are clean and follow the expected pattern for benchmark wrapper scripts.


8-22: LGTM!

The benchmark wrapper functions correctly delegate to tilelang.tools.bench.process_func with appropriate parameters for each GEMM variant.

examples/flash_attention/bench_example_flash_attention.py (3)

1-14: LGTM!

The imports are well-organized and cover all flash attention benchmark variants.


17-26: LGTM!

The benchmark wrappers without explicit parameters correctly delegate to their respective example modules.

Also applies to: 85-90


29-104: Parameter naming is correct—resolves review concern.

The verification confirms that the parameter naming inconsistency is intentional and correct. All benchmark functions use parameter names that precisely match their corresponding example modules:

  • Backward functions correctly use uppercase (BATCH, H, N_CTX, D_HEAD) matching backward example signatures
  • Forward functions correctly use lowercase (batch, heads, seq_len, dim) matching forward example signatures

No runtime errors will occur.

Comment on lines +83 to +111
- name: Upload PNG to GitHub and get URL
  id: upload_png
  uses: actions/github-script@v8
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    script: |
      const fs = require('fs');
      const content = fs.readFileSync('bench.png').toString('base64');
      // Create blob in the repo
      const blob = await github.rest.git.createBlob({
        owner: context.repo.owner,
        repo: context.repo.repo,
        content: content,
        encoding: "base64",
      });
      // Attach blob as a tree item
      const tree = await github.rest.git.createTree({
        owner: context.repo.owner,
        repo: context.repo.repo,
        tree: [{
          path: `bench_${context.runId}.png`,
          mode: '100644',
          type: 'blob',
          sha: blob.data.sha
        }]
      });
      // Raw file URL (works for embedding image)
      const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
      core.setOutput("url", url);
Contributor

⚠️ Potential issue | 🔴 Critical

Fix the PNG upload and URL generation approach.

The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work because:

  • Raw GitHub URLs require a commit SHA, branch name, or tag—not a tree SHA
  • The created blob/tree exists in Git storage but is not reachable from any ref
  • The URL https://raw.githubusercontent.com/.../[tree-sha]/... will return 404

Consider these alternatives:

Option 1: Upload as workflow artifact and use GitHub API to get URL

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart
    path: bench.png

Then link to the artifact in the comment (artifacts are accessible via GitHub UI but not embeddable).

Option 2: Commit and push the image to a dedicated branch

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use external image hosting service or GitHub issue attachments API

🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml lines 83-111: the workflow currently
creates a blob and tree but never makes a commit or updates a ref, then builds a
raw.githubusercontent URL using a tree SHA (which will 404). Replace this with
one of the suggested approaches: either (A) upload bench.png as a workflow
artifact using actions/upload-artifact@v4 and reference the artifact in the
comment, or (B) commit the PNG to a dedicated branch (configure git user, create
or checkout a branch, add bench_<runId>.png, commit and push) and then construct
the raw.githubusercontent.com URL using the branch name, or (C) upload to an
external image host or use the GitHub Issue attachments API; implement the
chosen approach and remove the orphaned blob/tree creation and the invalid raw
URL generation.

Comment on lines +20 to +21
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

🛠️ Refactor suggestion | 🟠 Major

Use idiomatic Python for main guard.

The expression globals().get("__name__") == "__main__" is unnecessarily verbose and non-standard. The conventional Python idiom is if __name__ == "__main__":, which is more readable and widely recognized.

Apply this change:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()

Note: This same pattern appears in all benchmark wrapper files in this PR and should be fixed consistently across:

  • examples/elementwise/bench_example_elementwise.py (line 9)
  • examples/seer_attention/bench_block_sparse_attn_tilelang.py (line 9)
  • examples/sparse_tensorcore/bench_example_sparse_tensorcore.py (line 10)
  • examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py (line 9)
  • examples/dynamic_shape/bench_example_dynamic.py (line 9)
  • examples/deepseek_nsa/bench_example_tilelang_nsa.py (line 14)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/cast/bench_example_cast.py around lines 20-21, replace the
non-idiomatic main guard `globals().get("__name__") == "__main__"` with the
conventional Python idiom `if __name__ == "__main__":` to improve readability;
apply the same change consistently to the other benchmark wrapper files listed
in the comment (examples/elementwise/bench_example_elementwise.py line 9,
examples/seer_attention/bench_block_sparse_attn_tilelang.py line 9,
examples/sparse_tensorcore/bench_example_sparse_tensorcore.py line 10,
examples/gemm_streamk/bench_example_tilelang_gemm_splitk.py line 9,
examples/dynamic_shape/bench_example_dynamic.py line 9,
examples/deepseek_nsa/bench_example_tilelang_nsa.py line 14).

Comment on lines +107 to +108
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

⚠️ Potential issue | 🟡 Minor

Use the standard Python idiom for the main guard.

The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.

Apply this diff to use the standard pattern:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/flash_attention/bench_example_flash_attention.py around lines 107 to
108, replace the non-idiomatic main guard using globals().get("__name__") with
the standard Python idiom: use if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script behaves as expected
when run as a script.

Comment on lines +25 to +26
if globals().get("__name__") == "__main__":
    tilelang.tools.bench.main()
Contributor

⚠️ Potential issue | 🟡 Minor

Use the standard Python idiom for the main guard.

The pattern if globals().get("__name__") == "__main__": is non-idiomatic. The standard Python idiom is clearer and more widely recognized.

Apply this diff to use the standard pattern:

-if globals().get("__name__") == "__main__":
+if __name__ == "__main__":
     tilelang.tools.bench.main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if globals().get("__name__") == "__main__":
-    tilelang.tools.bench.main()
+if __name__ == "__main__":
+    tilelang.tools.bench.main()
🤖 Prompt for AI Agents
In examples/gemm/bench_example_gemm.py around lines 25-26, the main guard uses a
non-idiomatic check (globals().get("__name__") == "__main__"); replace it with
the standard Python idiom if __name__ == "__main__": and call
tilelang.tools.bench.main() inside that block so the script executes correctly
when run as a program and remains import-safe.

Comment on lines +16 to +75
        m = re.match(r"\|\s*([^\|]+)\s*\|\s*([0-9\.]+)\s*\|", line)
        if m is not None:
            data[m.group(1)] = float(m.group(2))
    return data


output_v1 = subprocess.run(['./tl/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
output_v1 = subprocess.run(
    ['./tl/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout
output_v2 = subprocess.run(
    ['./tll/bin/python', '-c', 'import tilelang.tools.bench as b; b.bench_all()'],
    capture_output=True,
    text=True).stdout

output_v2 = subprocess.run(['./tll/bin/python', './maint/scripts/performance.py'],
                           capture_output=True,
                           text=True,
                           env=env).stdout
data_v1 = parse_output(output_v1)
data_v2 = parse_output(output_v2)
table = []
for key in data_v1.keys():
    speedup = data_v1[key] / data_v2[key]
    table.append([key, data_v1[key], data_v2[key], speedup])
table.sort(key=lambda x: x[-1])

headers = ["File", "Original Latency", "Current Latency", "Speedup"]

with open("bench.md", "w") as f:
    f.write(
        tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
    f.write("\n")

table = [[
    "original", data_v1['latency'], data_v1['best_tflops'], data_v1['ref_tflops'], data_v1['config']
], [
    "current", data_v2['latency'], data_v2['best_tflops'], data_v2['ref_tflops'], data_v2['config']
]]
df = pd.DataFrame(table, columns=headers)
df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
fig_width = max(0, len(df) * 0.35)
plt.figure(figsize=(fig_width, 8))
sns.set_theme(style="whitegrid", font_scale=0.9)
bar_colors = sns.color_palette("magma", len(df))
bars = plt.bar(range(len(df)), df["Speedup"], color=bar_colors, edgecolor="black")
top3_idx = df.nlargest(3, "Speedup").index
bot3_idx = df.nsmallest(3, "Speedup").index
label_idx = set(top3_idx.tolist() + bot3_idx.tolist())

headers = ["version", "Best Latency (s)", "Best TFlops", "Reference TFlops", "Best Config"]
for i, val in enumerate(df["Speedup"]):
    if i in label_idx:
        plt.text(
            i,
            val + 0.02,
            f"{val:.2f}x",
            ha="center",
            va="bottom",
            color="red",
            fontsize=8,
            fontweight="bold")

print(tabulate(table, headers=headers, tablefmt="github", stralign="left", numalign="decimal"))
plt.xticks(range(len(df)), df["File"], rotation=70, ha='right', fontsize=12)
plt.ylabel("Current Speedup vs Original", fontsize=14)
plt.title("Current Speedup vs Original", fontsize=14, fontweight="bold")
plt.ylim(0, max(df["Speedup"]) * 1.2)
sns.despine()
plt.tight_layout()
plt.savefig("bench.png", dpi=300)
Contributor

⚠️ Potential issue | 🟠 Major

Harden performance script against mismatched/empty benchmark results

The parsing/aggregation logic assumes “happy path” data and can break in real runs:

  • Key mismatch risk (Lines 31–36): for key in data_v1.keys(): ... data_v2[key] will raise KeyError if bench_all() in the second environment doesn’t emit a row for some benchmark present in the first (or vice versa). This is very plausible as benchmarks are added/removed over time.
  • Empty-data risk (Lines 46–52, 72): if both runs fail to produce any valid records (or none overlap), table is empty; pd.DataFrame(table, ...) yields an empty DF and max(df["Speedup"]) will raise (max() of empty sequence). The log you committed already shows several “failed in all repeats (no valid run)” cases, so this is not theoretical.
  • Div-by-zero edge case: If a latency ever ends up reported as 0.0 for the “current” version, speedup = data_v1[key] / data_v2[key] will crash. Unlikely, but cheap to guard.

To make CI more robust, I recommend building the table only over the intersection of keys and short‑circuiting when there is no usable data. For example:

 data_v1 = parse_output(output_v1)
 data_v2 = parse_output(output_v2)
-table = []
-for key in data_v1.keys():
-    speedup = data_v1[key] / data_v2[key]
-    table.append([key, data_v1[key], data_v2[key], speedup])
-table.sort(key=lambda x: x[-1])
+table = []
+common_keys = sorted(set(data_v1.keys()) & set(data_v2.keys()))
+for key in common_keys:
+    v1 = data_v1[key]
+    v2 = data_v2[key]
+    # Skip pathological cases to avoid crashes
+    if v2 == 0:
+        continue
+    speedup = v1 / v2
+    table.append([key, v1, v2, speedup])
+
+if not table:
+    # No overlapping, usable benchmarks — avoid generating empty reports
+    with open("bench.md", "w") as f:
+        f.write("No overlapping benchmark results found between original and current runs.\n")
+    # Skip plot generation
+    raise SystemExit(0)
+
+table.sort(key=lambda x: x[-1])
@@
-headers = ["File", "Original Latency", "Current Latency", "Speedup"]
+headers = ["File", "Original Latency", "Current Latency", "Speedup"]
@@
-df = pd.DataFrame(table, columns=headers)
-df = df.sort_values("Speedup", ascending=False).reset_index(drop=True)
-fig_width = max(0, len(df) * 0.35)
+df = pd.DataFrame(table, columns=headers).sort_values(
+    "Speedup", ascending=False
+).reset_index(drop=True)
+fig_width = max(6, len(df) * 0.35)

This keeps the existing behavior for valid data, but prevents CI from failing purely due to missing/mismatched benchmarks or fully failed runs, and ensures the plot’s figure width is always positive.

🤖 Prompt for AI Agents
In maint/scripts/ci_performance.py around lines 16-75, the script assumes
matching, non-empty benchmark outputs and may KeyError on missing keys,
ZeroDivisionError if current latency is 0.0, and will crash when table/df is
empty; fix by iterating only over the intersection of data_v1 and data_v2 keys
(or explicitly check existence before accessing), skip any entry where
data_v2[key] is zero (or treat as NaN/inf) to avoid division by zero, and after
building the table short-circuit: if table is empty write a minimal bench.md (or
a clear message) and skip plotting (or create a default empty DataFrame/plot
with fixed minimum fig_width) so subsequent pandas/plt operations (like max on
Speedup) cannot raise.

Comment on lines +35 to +40
    try:
        with suppress_output():
            for _ in range(warmup):
                func(*args, **kwargs)
    except Exception:
        pass
Contributor

⚠️ Potential issue | 🟠 Major

Silent warmup failures may hide configuration issues.

Warmup exceptions are completely suppressed (lines 39-40), which could mask genuine configuration problems, import errors, or invalid parameters. A benchmark that fails during warmup will proceed to timing runs, potentially wasting time or producing misleading results.

Consider logging warmup failures:

     try:
         with suppress_output():
             for _ in range(warmup):
                 func(*args, **kwargs)
-    except Exception:
-        pass
+    except Exception as e:
+        warnings.warn(
+            f"warmup for {func.__module__} raised {type(e).__name__}: {e}",
+            RuntimeWarning,
+            stacklevel=2,
+        )

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

39-40: try-except-pass detected, consider logging the exception

(S110)


39-39: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 35 to 40, the warmup loop currently
swallows all exceptions (except Exception: pass). Replace this with explicit
exception handling that captures the exception as a variable, logs the failure
with full traceback (e.g., logger.exception or print to stderr) including
context (which function and warmup iteration), and then aborts the benchmark
(re-raise the exception or return/exit early) so timing runs do not proceed on a
bad configuration.

Comment on lines +66 to +71
    else:
        warnings.warn(
            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
            RuntimeWarning,
            stacklevel=2,
        )
Contributor

⚠️ Potential issue | 🟡 Minor

No record appended when all runs fail.

When times is empty (all runs failed), a warning is issued but no record is appended to _RECORDS. This causes the benchmark to be silently omitted from the final table and chart, which might confuse users expecting to see all attempted benchmarks.

Consider appending a sentinel record:

     else:
         warnings.warn(
             f"benchmark for {func.__module__} failed in all repeats (no valid run)",
             RuntimeWarning,
             stacklevel=2,
         )
+        _RECORDS.append((f"{func.__module__}", float('inf')))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    else:
-        warnings.warn(
-            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
-            RuntimeWarning,
-            stacklevel=2,
-        )
+    else:
+        warnings.warn(
+            f"benchmark for {func.__module__} failed in all repeats (no valid run)",
+            RuntimeWarning,
+            stacklevel=2,
+        )
+        _RECORDS.append((f"{func.__module__}", float('inf')))
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 66 to 71, when times is empty the code
only emits a warning and does not append any record to _RECORDS, causing the
benchmark to be omitted from outputs; update the branch so that after issuing
the warning you append a sentinel record to _RECORDS containing the benchmark
identity (func.__module__, func.__name__), count=0, mean and std as float("nan")
(or None if existing records expect None), times as an empty list, success=False
(or an equivalent flag used elsewhere), and any relevant metadata (e.g. repeats
and label) so the final table/chart includes an entry indicating the benchmark
failed all runs.

Comment on lines +83 to +86
    plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
    plt.bar(names, lats)
    plt.xlabel("Latency (ms)")
    plt.title("Benchmark Results")
Contributor

⚠️ Potential issue | 🟡 Minor

Chart axis labels are swapped.

Line 85 sets xlabel to "Latency (ms)", but the X-axis displays function names. The latency values are on the Y-axis. This mislabeling will confuse readers of the generated charts.

Apply this fix:

     plt.figure(figsize=(max(len(names) * 2.2, 6), 6))
     plt.bar(names, lats)
-    plt.xlabel("Latency (ms)")
+    plt.xlabel("Functions")
+    plt.ylabel("Latency (ms)")
     plt.title("Benchmark Results")
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 83 to 86, the axis labels are reversed:
the X-axis shows function names while the Y-axis shows latency, yet xlabel is
set to "Latency (ms)". Change the labels so the X-axis is labeled (e.g.)
"Function" (or "Name") and the Y-axis is labeled "Latency (ms)"; keep the bar
plot and sizing unchanged.

Comment on lines +96 to +108
def main():
    # Entry point — automatically run all bench_* functions in caller file.
    test_file = inspect.getsourcefile(sys._getframe(1))
    out_dir = os.path.dirname(test_file)
    module = {}
    with open(test_file) as f:
        exec(f.read(), module)

    for name, func in module.items():
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)
Contributor

⚠️ Potential issue | 🟠 Major

Security risk: exec() evaluates arbitrary code from file.

Line 102 uses exec() to dynamically execute the caller's source file. This is a significant security risk if the benchmarking tool is ever run on untrusted or user-supplied files. Additionally, exec() bypasses normal import machinery, which can cause issues with relative imports, __file__, and module metadata.

Consider using proper module imports instead:

 def main():
-    # Entry point — automatically run all bench_* functions in caller file.
-    test_file = inspect.getsourcefile(sys._getframe(1))
-    out_dir = os.path.dirname(test_file)
-    module = {}
-    with open(test_file) as f:
-        exec(f.read(), module)
-
-    for name, func in module.items():
+    # Entry point — automatically run all bench_* functions in caller module.
+    caller_frame = sys._getframe(1)
+    caller_module = inspect.getmodule(caller_frame)
+    test_file = inspect.getsourcefile(caller_frame)
+    out_dir = os.path.dirname(test_file)
+
+    for name in dir(caller_module):
         if name.startswith("bench_") and callable(func):
+            func = getattr(caller_module, name)
             func()
 
     analyze_records(_RECORDS, out_dir)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

102-102: Use of exec detected

(S102)

🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 96 to 108, replace the unsafe exec()
usage that reads and executes the caller source with a proper module import via
importlib: create a module spec from the test file path
(importlib.util.spec_from_file_location), create a module from that spec
(importlib.util.module_from_spec), execute the module via
spec.loader.exec_module(module) so the module has correct metadata (__file__,
package, etc.), optionally insert it into sys.modules to support relative
imports, then iterate module attributes to call bench_* functions and continue
to call analyze_records(_RECORDS, out_dir); this removes arbitrary-code-exec
risk and preserves normal import behavior.
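
For reference, a minimal sketch of the importlib-based approach the prompt describes. The module name "bench_caller" and the overall structure are illustrative, not the PR's actual implementation; analyze_records and _RECORDS refer to the module's existing helpers:

import importlib.util
import inspect
import os
import sys


def main():
    # Entry point: run all bench_* functions defined in the calling script.
    caller_file = inspect.getsourcefile(sys._getframe(1))
    out_dir = os.path.dirname(caller_file)

    # Load the caller through the normal import machinery instead of exec(),
    # so __file__, __name__, and relative imports behave as usual.
    spec = importlib.util.spec_from_file_location("bench_caller", caller_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)

    for name in dir(module):
        func = getattr(module, name)
        if name.startswith("bench_") and callable(func):
            func()

    analyze_records(_RECORDS, out_dir)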

Comment on lines +128 to +135
    for root, _, files in os.walk(examples_root):
        for file_name in files:
            if re.match(r"^bench_.*\.py$", file_name):
                full_path = os.path.join(root, file_name)
                if root not in added_roots:
                    sys.path.insert(0, root)
                    added_roots.add(root)
                mod = _load_module(full_path)
Contributor

⚠️ Potential issue | 🟠 Major

Modifying sys.path during iteration may cause import conflicts.

Lines 132-134 insert directories into sys.path while walking the examples tree. If two example directories contain modules with the same name, the first one will shadow the second, leading to incorrect benchmark execution.

Consider isolating each module's sys.path modification:

     for root, _, files in os.walk(examples_root):
         for file_name in files:
             if re.match(r"^bench_.*\.py$", file_name):
                 full_path = os.path.join(root, file_name)
-                if root not in added_roots:
-                    sys.path.insert(0, root)
-                    added_roots.add(root)
+                # Temporarily add directory for this specific import
+                sys.path.insert(0, root)
                 mod = _load_module(full_path)
+                sys.path.pop(0)
                 for name in dir(mod):
🤖 Prompt for AI Agents
In tilelang/tools/bench.py around lines 128 to 135, the code mutates sys.path
while walking examples which can cause module shadowing if different directories
contain identically named modules; instead, load each bench_*.py file without
permanently altering sys.path by using importlib.util.spec_from_file_location
(or runpy.run_path) to create a module from the file path and execute it with a
unique module name, or if you must modify sys.path temporarily, push the
directory, import, then immediately pop it (ensure uniqueness of module names)
so the global sys.path is not left modified and modules from different example
roots cannot shadow one another.
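
A minimal sketch of the "push temporarily, then pop" variant mentioned in the prompt above. It assumes the surrounding loop and the existing _load_module helper from the excerpt; indentation matches the quoted code, not the full file:

                # Expose this example's directory only for this one import,
                # so sibling example directories cannot shadow each other's modules.
                sys.path.insert(0, root)
                try:
                    mod = _load_module(full_path)
                finally:
                    sys.path.pop(0)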

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

87-115: Fix the PNG upload approach (unresolved from previous review).

This implementation was flagged as critical in the previous review and remains unfixed. The workflow creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. Raw URLs require a commit SHA, branch name, or tag—not a tree SHA—so this will return 404 and break the image embedding.

The following approaches were recommended previously:

Option 1 (recommended): Upload as workflow artifact

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart
    path: bench.png

Option 2: Commit and push image to a dedicated branch

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Choose and implement one approach to resolve this blocker before merging.

🧹 Nitpick comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

76-80: Benchmark execution depends on ci_performance.py outputs.

The workflow assumes maint/scripts/ci_performance.py will produce bench.md and bench.png files. Ensure the script reliably generates these artifacts; consider adding error handling or validation if they're missing.

Would you like me to verify that ci_performance.py produces the expected outputs?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58e7ffe and 8df8ec5.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (3)
.github/workflows/pr-perfbench-bot.yml (3)

8-9: Verify cache permissions for self-hosted runner.

The workflow specifies only contents: write but uses actions/setup-python@v6 with cache: pip enabled. Per the learning from previous workflows, caching requires actions: write permission to save caches. Verify whether this impacts cache behavior on the self-hosted runner, and add actions: write if needed.


52-60: Venv setup for PR version looks solid.

The explicit disabling of pip's user mode and clear isolation of the venv is well-handled.


62-74: Parallel venv setup for main branch is clean.

The git state cleanup and separate venv creation are well-structured.

@yyttt6
Contributor Author

yyttt6 commented Nov 19, 2025

/perf

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

89-117: ⚠️ Image URL generation will fail with 404 errors.

The blob/tree approach creates orphaned Git objects that are not reachable from any ref. Raw GitHub URLs require a commit SHA, branch, or tag—not a tree SHA. The URL generated at line 116 will consistently return 404.

Consider one of these alternatives:

Option 1: Commit and push to a dedicated branch (recommended)

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images 2>/dev/null || git checkout benchmark-images
    git rm -rf . 2>/dev/null || true
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 2: Upload as a workflow artifact

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart-${{ github.run_id }}
    path: bench.png

(Note: artifact URLs are less convenient for embedding, but are guaranteed to work.)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8df8ec5 and b5cb2b0.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
🔇 Additional comments (1)
.github/workflows/pr-perfbench-bot.yml (1)

56-61: Pip configuration cleanup is correct.

The explicit cleanup of pip configuration (PIP_CONFIG_FILE, PYTHONUSERBASE) and the --no-user flags ensure a clean, reproducible environment for both the merged and original installs.

Comment on lines 8 to +9

 permissions:
-  contents: read
+  contents: write

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Add actions: write permission for cache operations.

The current permissions block only specifies contents: write. When a permissions block is defined, any unlisted permission (including actions) defaults to none. This prevents setup-python's cache from being saved—it will only restore from cache but never write new entries.

Apply this diff:

 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.


I'll verify this review comment by checking the codebase and GitHub Actions documentation.


🏁 Script executed:

#!/bin/bash
# Check if the workflow file exists and examine permissions block and setup-python usage
cat .github/workflows/pr-perfbench-bot.yml 2>/dev/null || echo "File not found"

Length of output: 4705


🌐 Web query:

GitHub Actions setup-python cache requires actions write permission

💡 Result:

Short answer: actions/cache (used by setup-python and many wrapper actions) needs the workflow's GITHUB_TOKEN to have actions: write when it needs to create/delete caches. Add permissions to the workflow or provide a PAT (for fork PRs where GITHUB_TOKEN is limited).

What to do

  • In your workflow or job, grant the permission:

    permissions:
      actions: write
      contents: read

  • Example (top of workflow):

    permissions:
      actions: write
      contents: read

Notes

  • For pull requests from forks the GITHUB_TOKEN is read-only for some permissions; to allow cache writes/deletes for fork PRs you must use a personal access token (repo scope) stored in a secret and pass it to the cache action (or disable deletion). [1][2][3]

Sources

  • cache action / examples requiring actions: write. [1]
  • REST / API note: deleting caches requires an auth token with repo scope or GitHub App actions: write. [2]
  • Marketplace/README examples calling out actions: write for cache deletion. [3]

Add actions: write permission for cache operations.

The workflow uses actions/setup-python@v6 with cache: pip enabled. When a permissions block is defined in GitHub Actions, the actions/cache action used internally by setup-python requires actions: write permission to save caches. Without this permission, caches will restore but not persist across runs, degrading workflow performance.

Apply this diff:

 permissions:
   contents: write
+  actions: write
🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml around lines 8 to 9 currently grants only
contents: write; add an actions: write permission to the permissions block so
the actions/cache used by actions/setup-python (cache: pip) can persist
caches—update the permissions to include actions: write alongside contents:
write (keep existing permissions and only add actions: write).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
.github/workflows/pr-perfbench-bot.yml (2)

8-9: Add actions: write permission for cache operations.

The workflow uses actions/setup-python@v6 with cache: pip enabled (line 47). The actions/cache used by setup-python requires actions: write permission to save caches. Without this, caches restore but never persist across runs, degrading workflow performance.

Apply this diff to fix:

 permissions:
   contents: write
+  actions: write

Based on learnings: In GitHub Actions workflows with a permissions block, setup-python's cache feature requires actions: write to save caches; without it, caches restore but never persist.


89-117: Fix the PNG upload and URL generation approach.

The current implementation creates an orphaned blob and tree without a commit, then attempts to use the tree SHA in a raw GitHub URL. This will not work—raw.githubusercontent.com URLs require a commit SHA, branch name, or tag, not a tree SHA. The URL will return 404.

Consider these alternatives:

Option 1: Upload as workflow artifact (simplest)

- name: Upload benchmark chart
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-chart-${{ github.run_id }}
    path: bench.png

Then reference the artifact in the comment with a note that it's available in the "Artifacts" section of the workflow run.
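For example, the comment body could link to the run page that hosts the artifact (illustrative; this uses the standard Actions run URL):

Benchmark chart: see the benchmark-chart-${{ github.run_id }} artifact at https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}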

Option 2: Commit and push the image to a dedicated branch (allows embedding)

- name: Upload PNG to GitHub
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git checkout --orphan benchmark-images
    git rm -rf .
    cp bench.png benchmark-${{ github.run_id }}.png
    git add benchmark-${{ github.run_id }}.png
    git commit -m "Add benchmark image for run ${{ github.run_id }}"
    git push origin benchmark-images --force

Then use: https://raw.githubusercontent.com/${{ github.repository }}/benchmark-images/benchmark-${{ github.run_id }}.png

Option 3: Use GitHub Issues API attachments
Upload via the issue attachments API endpoint (if available in your GitHub version).

Replace lines 89–117 with your chosen approach.

🧹 Nitpick comments (2)
.github/workflows/pr-perfbench-bot.yml (2)

56-76: Minor inconsistency: Error suppression differs between venv setups.

The merged version (lines 58–59) doesn't suppress errors when unsetting pip config keys, but the original version (lines 73–74) uses || true. If config keys don't exist, the first will fail while the second continues silently. For consistency and robustness, align both to suppress non-fatal errors.

Apply this diff:

           export PIP_CONFIG_FILE=/dev/null
           export PYTHONUSERBASE=""
-          pip config unset global.user
-          pip config unset user.user
+          pip config unset global.user || true
+          pip config unset user.user || true
           pip install --no-user -r requirements-test.txt

78-88: Verify file generation before downstream steps.

The workflow runs ci_performance.py (line 82) and assumes bench.md and bench.png are generated. Consider adding validation to ensure these files exist before attempting to read them.

Apply this diff:

       - name: Run performance test
         id: perfbench
         run: |
           source tl/bin/activate
           python maint/scripts/ci_performance.py
+          if [[ ! -f bench.md ]] || [[ ! -f bench.png ]]; then
+            echo "Error: Performance test did not generate required output files"
+            exit 1
+          fi
       - name: Read markdown table

This ensures that if the performance script fails to generate outputs, the workflow fails fast with a clear message.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5cb2b0 and 487e652.

📒 Files selected for processing (1)
  • .github/workflows/pr-perfbench-bot.yml (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/pr-perfbench-bot.yml
