chore: add local performance benchmarks#385
Conversation
Greptile SummaryThis PR adds a suite of local-only performance benchmark scripts covering working-tree loading, patch parsing, layout/render planning, large-stream rendering, memory profiling, and optional competitor comparisons, along with a shared runner (
Confidence Score: 4/5Safe to merge; all changes are local benchmark scripts with no production code path affected. The review_plan_ms metric in render-layout.ts includes buildSplitRows cost a second time, so any comparison against split_rows_ms will draw incorrect conclusions about where planning time is spent — particularly on the large_single_file scenario where buildSplitRows dominates. The rest of the benchmark infrastructure looks correct. benchmarks/render-layout.ts — the reviewPlanMs measurement block calls buildSplitRows again, inflating the reported metric. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
CLI["bun run bench -- [args]"] --> parseArgs["parseArgs(Bun.argv)"]
parseArgs --> opts["RunOptions"]
opts --> loop["For each script x N samples"]
loop --> runScript["Bun.spawn bun run benchmarks/script.ts"]
runScript --> stdout["capture stdout"]
stdout --> parseMetrics["parseMetrics() METRIC key=value lines"]
parseMetrics --> accumulate["samplesByMetric Map"]
accumulate --> loop
accumulate --> aggregate["aggregateMetric() median/p75/p95/min/max"]
aggregate --> print["Print aggregated summary"]
aggregate --> outFile{"--out path?"}
outFile -->|yes| writeJSON["writeFileSync JSON BenchmarkRunResult v1"]
outFile -->|no| done["Done"]
writeJSON --> done
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
benchmarks/render-layout.ts:40-50
**`review_plan_ms` includes `buildSplitRows` cost, inflating the metric**
`reviewPlanMs` re-invokes `buildSplitRows` for every file before calling `buildReviewRenderPlan`, so the reported `*_review_plan_ms` metric actually measures `buildSplitRows + buildReviewRenderPlan` combined — not just the review plan step. Comparing it to `*_split_rows_ms` will always over-estimate planning cost; on the `large_single_file` scenario with 16,000 changed lines, `buildSplitRows` dominates and will swamp any `buildReviewRenderPlan` signal. The `splitRowsMs` measurement already owns `buildSplitRows`, so the split rows should be computed once and reused in the plan step.
### Issue 2 of 2
benchmarks/competitors.ts:52-54
**`patchFixture` leaks if `createChangedRepo` throws**
`patchFixture` is created before the `try` block, but `createChangedRepo` (which runs `git init`, commits, etc.) can throw if git is unavailable or disk is full. When that happens the `finally` block is never entered and `patchFixture`'s temp directory is never deleted. Wrapping all three top-level setup calls in the same try/finally, or creating both fixtures inside the try block, would prevent the leak.
Reviews (1): Last reviewed commit: "chore: add local performance benchmarks" | Re-trigger Greptile |
| const reviewPlanMs = measureMs(() => { | ||
| for (const file of files) { | ||
| const rows = buildSplitRows(file, null, theme); | ||
| plannedRows += buildReviewRenderPlan({ | ||
| fileId: file.id, | ||
| rows, | ||
| showHunkHeaders: true, | ||
| visibleAgentNotes: [], | ||
| }).length; | ||
| } | ||
| }); |
There was a problem hiding this comment.
review_plan_ms includes buildSplitRows cost, inflating the metric
reviewPlanMs re-invokes buildSplitRows for every file before calling buildReviewRenderPlan, so the reported *_review_plan_ms metric actually measures buildSplitRows + buildReviewRenderPlan combined — not just the review plan step. Comparing it to *_split_rows_ms will always over-estimate planning cost; on the large_single_file scenario with 16,000 changed lines, buildSplitRows dominates and will swamp any buildReviewRenderPlan signal. The splitRowsMs measurement already owns buildSplitRows, so the split rows should be computed once and reused in the plan step.
Prompt To Fix With AI
This is a comment left during a code review.
Path: benchmarks/render-layout.ts
Line: 40-50
Comment:
**`review_plan_ms` includes `buildSplitRows` cost, inflating the metric**
`reviewPlanMs` re-invokes `buildSplitRows` for every file before calling `buildReviewRenderPlan`, so the reported `*_review_plan_ms` metric actually measures `buildSplitRows + buildReviewRenderPlan` combined — not just the review plan step. Comparing it to `*_split_rows_ms` will always over-estimate planning cost; on the `large_single_file` scenario with 16,000 changed lines, `buildSplitRows` dominates and will swamp any `buildReviewRenderPlan` signal. The `splitRowsMs` measurement already owns `buildSplitRows`, so the split rows should be computed once and reused in the plan step.
How can I resolve this? If you propose a fix, please make it concise.| const patch = createSyntheticPatch({ fileCount: 96, lines: 180, changedLines: 36 }); | ||
| const patchFixture = createTemporaryDirectory("hunk-competitor-patch-"); | ||
| const repoFixture = createChangedRepo({ fileCount: 96, lines: 180, changedLines: 36 }); |
There was a problem hiding this comment.
patchFixture leaks if createChangedRepo throws
patchFixture is created before the try block, but createChangedRepo (which runs git init, commits, etc.) can throw if git is unavailable or disk is full. When that happens the finally block is never entered and patchFixture's temp directory is never deleted. Wrapping all three top-level setup calls in the same try/finally, or creating both fixtures inside the try block, would prevent the leak.
Prompt To Fix With AI
This is a comment left during a code review.
Path: benchmarks/competitors.ts
Line: 52-54
Comment:
**`patchFixture` leaks if `createChangedRepo` throws**
`patchFixture` is created before the `try` block, but `createChangedRepo` (which runs `git init`, commits, etc.) can throw if git is unavailable or disk is full. When that happens the `finally` block is never entered and `patchFixture`'s temp directory is never deleted. Wrapping all three top-level setup calls in the same try/finally, or creating both fixtures inside the try block, would prevent the leak.
How can I resolve this? If you propose a fix, please make it concise.
Summary
benchmarks/results/Validation
bun run format:checkbun run typecheckbun run lintbun run bench -- --samples 1 --script changeset-parse.ts --out /tmp/.../local.jsonThis PR description was generated by Pi using OpenAI GPT-5