chore: add local performance benchmarks by benvinegar · Pull Request #385 · modem-dev/hunk

benvinegar · 2026-05-30T00:59:48Z

Summary

- add local-only performance benchmark scripts for loading, patch parsing, layout/render planning, highlighting, large-stream rendering/scrolling, memory profiling, and optional competitor comparisons
- add a shared benchmark runner that aggregates samples and can write JSON into benchmarks/results/
- document local benchmark workflows without adding PR or main CI gating

Validation

bun run format:check
bun run typecheck
bun run lint
bun run bench -- --samples 1 --script changeset-parse.ts --out /tmp/.../local.json

This PR description was generated by Pi using OpenAI GPT-5

greptile-apps · 2026-05-30T01:03:15Z

Greptile Summary

This PR adds a suite of local-only performance benchmark scripts covering working-tree loading, patch parsing, layout/render planning, large-stream rendering, memory profiling, and optional competitor comparisons, along with a shared runner (benchmarks/run.ts) that aggregates samples and writes JSON output. It also refactors the existing large-stream-fixture to replace notesPerFile with explicit changedStartLine/changedEndLine parameters, removing agent-annotation scaffolding that is no longer needed.

New scripts (working-tree-load, changeset-parse, render-layout, memory, competitors) each print METRIC name=value lines; run.ts orchestrates them, repeating each script --samples times, computing median/p75/p95, and optionally writing a versioned JSON artifact.
render-layout.ts re-invokes buildSplitRows inside the reviewPlanMs block, so *_review_plan_ms over-counts by including split-row construction cost that *_split_rows_ms already measures separately.
competitors.ts creates patchFixture before the try block, so a failure in createChangedRepo leaves that temp directory on disk.

Confidence Score: 4/5

Safe to merge; all changes are local benchmark scripts with no production code path affected.

The review_plan_ms metric in render-layout.ts includes buildSplitRows cost a second time, so any comparison against split_rows_ms will draw incorrect conclusions about where planning time is spent — particularly on the large_single_file scenario where buildSplitRows dominates. The rest of the benchmark infrastructure looks correct.

benchmarks/render-layout.ts — the reviewPlanMs measurement block calls buildSplitRows again, inflating the reported metric.

Important Files Changed

| Filename | Overview |
| --- | --- |
| benchmarks/render-layout.ts | New layout/planning benchmark; the review_plan_ms measurement double-counts buildSplitRows cost, making the metric inaccurate. |
| benchmarks/competitors.ts | New competitor comparison benchmark; the patchFixture temp dir can leak if createChangedRepo throws before the try block is entered. |
| benchmarks/run.ts | New benchmark orchestrator; runs scripts, aggregates samples, and writes optional JSON output; logic looks correct. |
| benchmarks/lib/benchmark-result.ts | New types and helpers for aggregating/classifying benchmark metrics; percentile logic is correct. |
| benchmarks/lib/fixtures.ts | New shared fixture helpers for synthetic patches, source files, temp repos, and untracked file scenarios. |
| benchmarks/large-stream-fixture.ts | Replaces notesPerFile with changedStartLine/changedEndLine parameters; range check and stats are consistent with defaults. |
| benchmarks/working-tree-load.ts | New working-tree load benchmark across five repo shapes; fixture cleanup is properly handled with try/finally. |
| benchmarks/changeset-parse.ts | New parse benchmark covering three diff shapes; measures each stage independently and cleanly. |
| benchmarks/memory.ts | New RSS/heap profiler across load, planning, first-frame, and navigation stages; renderer is properly destroyed in finally. |
| benchmarks/large-stream.ts | Simplified to remove note-related code paths and reduced SCROLL_TICKS from 18 to 4. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["bun run bench -- [args]"] --> parseArgs["parseArgs(Bun.argv)"]
    parseArgs --> opts["RunOptions"]
    opts --> loop["For each script x N samples"]
    loop --> runScript["Bun.spawn bun run benchmarks/script.ts"]
    runScript --> stdout["capture stdout"]
    stdout --> parseMetrics["parseMetrics() METRIC key=value lines"]
    parseMetrics --> accumulate["samplesByMetric Map"]
    accumulate --> loop
    accumulate --> aggregate["aggregateMetric() median/p75/p95/min/max"]
    aggregate --> print["Print aggregated summary"]
    aggregate --> outFile{"--out path?"}
    outFile -->|yes| writeJSON["writeFileSync JSON BenchmarkRunResult v1"]
    outFile -->|no| done["Done"]
    writeJSON --> done
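
The parse-and-aggregate steps in the flowchart can be sketched roughly as follows. This is an illustration only, not the code in benchmarks/run.ts: the function bodies, the `MetricSummary` shape, and the nearest-rank percentile method are all assumptions, since the review does not show how the runner computes its percentiles.

```typescript
// Hypothetical sketch; the real runner's types and percentile method may differ.
type MetricSummary = {
  samples: number;
  min: number;
  median: number;
  p75: number;
  p95: number;
  max: number;
};

// Collect "METRIC name=value" lines from one script run's stdout.
function parseMetrics(stdout: string): Map<string, number> {
  const metrics = new Map<string, number>();
  for (const line of stdout.split("\n")) {
    const match = /^METRIC\s+([^=\s]+)=(\S+)$/.exec(line.trim());
    if (!match) continue;
    const name = match[1];
    const value = Number(match[2]);
    // Skip lines whose value is not a finite number.
    if (name !== undefined && Number.isFinite(value)) metrics.set(name, value);
  }
  return metrics;
}

// Nearest-rank percentile over an ascending-sorted array (an assumed method).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index]!;
}

// Summarize all samples collected for one metric across repeated runs.
function aggregateMetric(samples: number[]): MetricSummary {
  const sorted = [...samples].sort((a, b) => a - b);
  return {
    samples: sorted.length,
    min: percentile(sorted, 0),
    median: percentile(sorted, 50),
    p75: percentile(sorted, 75),
    p95: percentile(sorted, 95),
    max: percentile(sorted, 100),
  };
}
```

With only a handful of samples, p75 and p95 in a nearest-rank scheme collapse toward the maximum, so a run with `--samples 1` reports the same value for every statistic.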

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
benchmarks/render-layout.ts:40-50
**`review_plan_ms` includes `buildSplitRows` cost, inflating the metric**

`reviewPlanMs` re-invokes `buildSplitRows` for every file before calling `buildReviewRenderPlan`, so the reported `*_review_plan_ms` metric actually measures `buildSplitRows + buildReviewRenderPlan` combined — not just the review plan step. Comparing it to `*_split_rows_ms` will always over-estimate planning cost; on the `large_single_file` scenario with 16,000 changed lines, `buildSplitRows` dominates and will swamp any `buildReviewRenderPlan` signal. The `splitRowsMs` measurement already owns `buildSplitRows`, so the split rows should be computed once and reused in the plan step.

### Issue 2 of 2
benchmarks/competitors.ts:52-54
**`patchFixture` leaks if `createChangedRepo` throws**

`patchFixture` is created before the `try` block, but `createChangedRepo` (which runs `git init`, commits, etc.) can throw if git is unavailable or disk is full. When that happens the `finally` block is never entered and `patchFixture`'s temp directory is never deleted. Wrapping all three top-level setup calls in the same try/finally, or creating both fixtures inside the try block, would prevent the leak.

Reviews (1): Last reviewed commit: "chore: add local performance benchmarks"

greptile-apps · 2026-05-30T01:03:19Z

+  const reviewPlanMs = measureMs(() => {
+    for (const file of files) {
+      const rows = buildSplitRows(file, null, theme);
+      plannedRows += buildReviewRenderPlan({
+        fileId: file.id,
+        rows,
+        showHunkHeaders: true,
+        visibleAgentNotes: [],
+      }).length;
+    }
+  });


review_plan_ms includes buildSplitRows cost, inflating the metric

reviewPlanMs re-invokes buildSplitRows for every file before calling buildReviewRenderPlan, so the reported *_review_plan_ms metric actually measures buildSplitRows + buildReviewRenderPlan combined — not just the review plan step. Comparing it to *_split_rows_ms will always over-estimate planning cost; on the large_single_file scenario with 16,000 changed lines, buildSplitRows dominates and will swamp any buildReviewRenderPlan signal. The splitRowsMs measurement already owns buildSplitRows, so the split rows should be computed once and reused in the plan step.
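
One way to apply the suggestion is to keep the rows produced during the split-rows measurement and reuse them in the plan measurement. The sketch below is a minimal illustration under stated assumptions: `buildSplitRows`, `buildReviewRenderPlan`, and `measureMs` are simplified stand-ins with invented signatures, not hunk's actual helpers, and `splitRowCalls` exists only to make the single invocation per file observable.

```typescript
// Stand-in types and helpers; the signatures are assumptions for illustration.
type Row = { text: string };
type DiffFile = { id: string; lines: string[] };

let splitRowCalls = 0;

function buildSplitRows(file: DiffFile): Row[] {
  splitRowCalls += 1;
  return file.lines.map((text) => ({ text }));
}

function buildReviewRenderPlan(input: { fileId: string; rows: Row[] }): Row[] {
  return input.rows.filter((row) => row.text.length > 0);
}

function measureMs(fn: () => void): number {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

function measureLayout(files: DiffFile[]) {
  // Stage 1: time split-row construction and keep the rows for later stages.
  const rowsByFile = new Map<string, Row[]>();
  let splitRows = 0;
  const splitRowsMs = measureMs(() => {
    for (const file of files) {
      const rows = buildSplitRows(file);
      rowsByFile.set(file.id, rows);
      splitRows += rows.length;
    }
  });

  // Stage 2: time only the plan step, reusing the rows built in stage 1.
  let plannedRows = 0;
  const reviewPlanMs = measureMs(() => {
    for (const file of files) {
      const rows = rowsByFile.get(file.id) ?? [];
      plannedRows += buildReviewRenderPlan({ fileId: file.id, rows }).length;
    }
  });

  return { splitRowsMs, reviewPlanMs, splitRows, plannedRows };
}
```

Holding every file's rows in memory between stages is a trade-off: it isolates the plan timing, but it may change heap behavior on very large fixtures compared with building rows file by file.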


greptile-apps · 2026-05-30T01:03:20Z

+const patch = createSyntheticPatch({ fileCount: 96, lines: 180, changedLines: 36 });
+const patchFixture = createTemporaryDirectory("hunk-competitor-patch-");
+const repoFixture = createChangedRepo({ fileCount: 96, lines: 180, changedLines: 36 });


patchFixture leaks if createChangedRepo throws

patchFixture is created before the try block, but createChangedRepo (which runs git init, commits, etc.) can throw if git is unavailable or disk is full. When that happens the finally block is never entered and patchFixture's temp directory is never deleted. Wrapping all three top-level setup calls in the same try/finally, or creating both fixtures inside the try block, would prevent the leak.
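
The second option, creating the fixtures inside the try block, might look like the sketch below. It is a hedged illustration rather than the fix that landed: `withFixtures` is a hypothetical helper, and `createTemporaryDirectory` here is a stand-in built on Node's `mkdtempSync`, which may not match the helper in benchmarks/lib/fixtures.ts.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for the fixture helper named in the review (assumed behavior).
function createTemporaryDirectory(prefix: string): string {
  return mkdtempSync(join(tmpdir(), prefix));
}

// Hypothetical helper: run every setup step inside the try block so that
// directories created before a failing step are still removed.
function withFixtures<T>(setup: Array<() => string>, run: (dirs: string[]) => T): T {
  const created: string[] = [];
  try {
    for (const create of setup) created.push(create());
    return run(created);
  } finally {
    for (const dir of created) rmSync(dir, { recursive: true, force: true });
  }
}
```

Because each directory is recorded as soon as it exists, a throw from a later setup step (for example, git being unavailable) still reaches the `finally` block with the earlier paths in hand.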


chore: add local performance benchmarks

f2a94e9

greptile-apps Bot reviewed May 30, 2026


benvinegar merged commit c83c462 into main May 30, 2026
8 checks passed

benvinegar deleted the chore/local-performance-benchmarks branch May 30, 2026 01:05

