Conversation
|
be2aa98 to
50e812a
Compare
d419fb0 to
8989917
Compare
b2e867c to
6894c79
Compare
There was a problem hiding this comment.
Pull request overview
Adds a dedicated Slang v2 benchmark suite (parser + IR building) to the existing perf/iai-callgrind infrastructure, including CLI and CI workflow wiring so results can be published to a separate Bencher dashboard.
Changes:
- Introduce
tests_v2benchmark helpers for v2 parsing and IR building. - Add a new
slang_v2iai-callgrind bench binary and integrate v2 parsing into the existing comparison bench. - Wire up
infra perf cargo slang-v2plus a new GitHub Actions workflow and Bencher project.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/solidity/testing/perf/cargo/src/tests_v2/parser.rs | Adds v2 parser benchmark helper (setup/run/test + counting). |
| crates/solidity/testing/perf/cargo/src/tests_v2/ir_builder.rs | Adds v2 IR building benchmark helper and contract counting. |
| crates/solidity/testing/perf/cargo/src/tests_v2/mod.rs | Exposes v2 benchmark modules. |
| crates/solidity/testing/perf/cargo/src/tests/mod.rs | Removes old slang_v2_parser module export. |
| crates/solidity/testing/perf/cargo/src/lib.rs | Exposes tests_v2 and updates unit test to validate v2 parser + IR builder. |
| crates/solidity/testing/perf/cargo/benches/slang_v2/main.rs | New dedicated v2 benchmark suite (parser + IR builder) grouped per project. |
| crates/solidity/testing/perf/cargo/benches/comparison/main.rs | Switches v2 comparison benchmark to use tests_v2::parser. |
| crates/solidity/testing/perf/cargo/benches/slang/main.rs | Updates dependency “used” list to include v2 semantic crate. |
| crates/solidity/testing/perf/cargo/Cargo.toml | Adds slang_solidity_v2_semantic dep and registers slang_v2 bench target. |
| crates/infra/cli/src/commands/perf/cargo/mod.rs | Adds slang-v2 cargo perf subcommand + default Bencher project. |
| .github/workflows/benchmark_cargo_slang_v2.yml | Adds CI workflow for running/publishing v2 benchmarks. |
| Cargo.lock | Locks new dependency usage for perf crate. |
Comments suppressed due to low confidence (2)
crates/solidity/testing/perf/cargo/src/tests_v2/parser.rs:14
rundiscards theVec<SourceUnit>returned bytest. Since the benchmark callsblack_box(parser::run(...))andrunreturns(), the optimizer can potentially eliminate the parsing work (or large parts of it), producing misleading benchmark results. Return the parsed source units fromrun(orblack_boxthe result insiderun) so the work is kept observable to the optimizer.
crates/solidity/testing/perf/cargo/src/tests_v2/parser.rs:22- Iteration over
project.sources.values()usesHashMapiteration order, which is intentionally non-deterministic (per-process randomized). That can add noise to benchmark results and make regressions harder to compare across runs. Consider iterating sources in a stable order (e.g., sort by path/key first) before parsing.
6894c79 to
5b618a5
Compare
8989917 to
926b0e1
Compare
08cf155 to
8cf4a26
Compare
926b0e1 to
e91e351
Compare
8cf4a26 to
9711a36
Compare
4a0a95d to
81d33ce
Compare
| pub fn [< $prj _parser >](project: &SolidityProject) { | ||
| black_box(tests_v2::parser::run(project)) | ||
| } |
There was a problem hiding this comment.
These parsers don't use the compilation unit yet, I'll address that in another PR.
There was a problem hiding this comment.
I wonder if the compilation unit should be a separate test(s)?
It will be resolving imports, doing version validation, and many other things that are typically outside the realm of just "parsing".
There was a problem hiding this comment.
If we're using the compilation unit end to end, I'd say yes. However, two things to note:
- There's already a just parsing benchmark, the comparison one
- These V2 tests still need a the V1 compilation unit to resolve imports, I think we'll need to benchmark using V2's import resolution.
We discussed it with @ggiraldez shortly, the current approach of the V2 compilation unit doesn't separate phases, so we need to either make some testing API that would allow to separate phases for benchmarking, or find a way to separate the phases after mesurement (in theory this is doable, unsure of the cost)
There was a problem hiding this comment.
I was wondering if we should start with benchmarking user operations? i.e. tracking specific usage patterns we care about, compared to internal APIs/stages.
For example, tracking adding a single file, resolving imports/adding all files, building the AST for the whole thing, querying the type system, etc...
This way, we get to track meaningful improvements/regressions from the user POV, while also minimizing noise from internal changes if they don't affect the user.
In any case, iai_callgrind can emit a flame graph (or diff two of them) to measure/isolate specific internal calls/APIs over time.
WDYT?
There was a problem hiding this comment.
we should start with benchmarking user operations
absolutely! I'm hoping to have some time to work on that in the next few weeks
iai_callgrind can emit a flame graph (or diff two of them) to measure/isolate specific internal calls/APIs over time.
I haven't tried the diff of flamegraph, I'll have a look.
iai_callgrind's flamegraph is broken uninformative, since they're rendered as a staircase of functions. gungraun/gungraun#523, so I'm unsure of the usefulness of them.
However, I'd like to isolate these differences in an automated and tracked approach, rather than manually looking at the diffs. Maybe bencher's callgrind adapter could split these measurements up?
| pub fn [< $prj _parser >](project: &SolidityProject) { | ||
| black_box(tests_v2::parser::run(project)) | ||
| } |
There was a problem hiding this comment.
I wonder if the compilation unit should be a separate test(s)?
It will be resolving imports, doing version validation, and many other things that are typically outside the realm of just "parsing".
81d33ce to
b2e844e
Compare
|
| Branch | teofr/add-v2-slang-benchmarks |
| Testbed | ci |
🚨 3 Alerts
| Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|---|---|---|---|---|
| antlr_build_ast_duration_one_step_leverage_f | Duration Measure (units) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 382.01 units(+14.38%)Baseline: 333.97 units | 367.37 units (103.98%) |
| slang_init_bindings_graph_duration_cooldogs | Duration Measure (units) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 740.78 units(+10.87%)Baseline: 668.17 units | 734.98 units (100.79%) |
| slang_init_bindings_graph_duration_uniswap | Duration Measure (units) | 📈 plot 🚷 threshold 🚨 alert (🔔) | 776.79 units(+10.42%)Baseline: 703.51 units | 773.86 units (100.38%) |
Click to view all benchmark results
| Benchmark | Duration | Benchmark Result Measure (units) (Result Δ%) | Upper Boundary Measure (units) (Limit %) |
|---|---|---|---|
| antlr_build_ast_duration_cooldogs | 📈 view plot 🚷 view threshold | 737.85 units(+1.39%)Baseline: 727.71 units | 800.48 units (92.18%) |
| antlr_build_ast_duration_create_x | 📈 view plot 🚷 view threshold | 201.97 units(+0.34%)Baseline: 201.29 units | 221.42 units (91.22%) |
| antlr_build_ast_duration_merkle_proof | 📈 view plot 🚷 view threshold | 53.24 units(-2.26%)Baseline: 54.47 units | 59.92 units (88.86%) |
| antlr_build_ast_duration_mooniswap | 📈 view plot 🚷 view threshold | 478.94 units(-0.30%)Baseline: 480.36 units | 528.39 units (90.64%) |
| antlr_build_ast_duration_multicall3 | 📈 view plot 🚷 view threshold | 49.13 units(-2.44%)Baseline: 50.36 units | 55.39 units (88.69%) |
| antlr_build_ast_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 382.01 units(+14.38%)Baseline: 333.97 units | 367.37 units (103.98%) |
| antlr_build_ast_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 74.49 units(-0.93%)Baseline: 75.19 units | 82.71 units (90.06%) |
| antlr_build_ast_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 319.84 units(-3.41%)Baseline: 331.13 units | 364.24 units (87.81%) |
| antlr_build_ast_duration_uniswap | 📈 view plot 🚷 view threshold | 697.14 units(-3.32%)Baseline: 721.05 units | 793.16 units (87.89%) |
| antlr_build_ast_duration_weighted_pool | 📈 view plot 🚷 view threshold | 719.63 units(-4.43%)Baseline: 752.99 units | 828.29 units (86.88%) |
| slang_init_bindings_graph_duration_cooldogs | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 740.78 units(+10.87%)Baseline: 668.17 units | 734.98 units (100.79%) |
| slang_init_bindings_graph_duration_create_x | 📈 view plot 🚷 view threshold | 139.89 units(+6.90%)Baseline: 130.86 units | 143.95 units (97.18%) |
| slang_init_bindings_graph_duration_merkle_proof | 📈 view plot 🚷 view threshold | 43.64 units(+4.40%)Baseline: 41.80 units | 45.98 units (94.91%) |
| slang_init_bindings_graph_duration_mooniswap | 📈 view plot 🚷 view threshold | 384.17 units(+8.79%)Baseline: 353.14 units | 388.46 units (98.90%) |
| slang_init_bindings_graph_duration_multicall3 | 📈 view plot 🚷 view threshold | 46.96 units(+1.45%)Baseline: 46.29 units | 50.92 units (92.22%) |
| slang_init_bindings_graph_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold | 414.19 units(+7.05%)Baseline: 386.92 units | 425.61 units (97.32%) |
| slang_init_bindings_graph_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 399.83 units(+7.59%)Baseline: 371.63 units | 408.79 units (97.81%) |
| slang_init_bindings_graph_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 446.89 units(+2.51%)Baseline: 435.96 units | 479.56 units (93.19%) |
| slang_init_bindings_graph_duration_uniswap | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 776.79 units(+10.42%)Baseline: 703.51 units | 773.86 units (100.38%) |
| slang_init_bindings_graph_duration_weighted_pool | 📈 view plot 🚷 view threshold | 703.37 units(+8.22%)Baseline: 649.95 units | 714.95 units (98.38%) |
| slang_parse_all_files_duration_cooldogs | 📈 view plot 🚷 view threshold | 103.96 units(-1.67%)Baseline: 105.72 units | 116.30 units (89.39%) |
| slang_parse_all_files_duration_create_x | 📈 view plot 🚷 view threshold | 46.14 units(-1.51%)Baseline: 46.85 units | 51.53 units (89.54%) |
| slang_parse_all_files_duration_merkle_proof | 📈 view plot 🚷 view threshold | 11.38 units(-1.40%)Baseline: 11.54 units | 12.70 units (89.63%) |
| slang_parse_all_files_duration_mooniswap | 📈 view plot 🚷 view threshold | 114.08 units(-0.34%)Baseline: 114.47 units | 125.91 units (90.60%) |
| slang_parse_all_files_duration_multicall3 | 📈 view plot 🚷 view threshold | 13.41 units(-7.35%)Baseline: 14.47 units | 15.92 units (84.22%) |
| slang_parse_all_files_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold | 112.41 units(+0.45%)Baseline: 111.90 units | 123.09 units (91.32%) |
| slang_parse_all_files_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 78.92 units(-2.84%)Baseline: 81.23 units | 89.35 units (88.32%) |
| slang_parse_all_files_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 114.22 units(-3.10%)Baseline: 117.87 units | 129.66 units (88.09%) |
| slang_parse_all_files_duration_uniswap | 📈 view plot 🚷 view threshold | 207.20 units(-0.78%)Baseline: 208.83 units | 229.71 units (90.20%) |
| slang_parse_all_files_duration_weighted_pool | 📈 view plot 🚷 view threshold | 203.68 units(-1.08%)Baseline: 205.90 units | 226.49 units (89.93%) |
| slang_resolve_all_bindings_duration_cooldogs | 📈 view plot 🚷 view threshold | 229.62 units(+0.89%)Baseline: 227.60 units | 250.36 units (91.71%) |
| slang_resolve_all_bindings_duration_create_x | 📈 view plot 🚷 view threshold | 14.84 units(+4.86%)Baseline: 14.15 units | 15.57 units (95.33%) |
| slang_resolve_all_bindings_duration_merkle_proof | 📈 view plot 🚷 view threshold | 2.87 units(+1.34%)Baseline: 2.83 units | 3.12 units (92.13%) |
| slang_resolve_all_bindings_duration_mooniswap | 📈 view plot 🚷 view threshold | 2,804.30 units(+5.07%)Baseline: 2,669.08 units | 2,935.99 units (95.51%) |
| slang_resolve_all_bindings_duration_multicall3 | 📈 view plot 🚷 view threshold | 3.72 units(-6.01%)Baseline: 3.96 units | 4.35 units (85.45%) |
| slang_resolve_all_bindings_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold | 173.47 units(+4.47%)Baseline: 166.05 units | 182.66 units (94.97%) |
| slang_resolve_all_bindings_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 20.20 units(+6.77%)Baseline: 18.92 units | 20.81 units (97.07%) |
| slang_resolve_all_bindings_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 84.59 units(-0.25%)Baseline: 84.80 units | 93.28 units (90.68%) |
| slang_resolve_all_bindings_duration_uniswap | 📈 view plot 🚷 view threshold | 1,245.67 units(+6.43%)Baseline: 1,170.46 units | 1,287.51 units (96.75%) |
| slang_resolve_all_bindings_duration_weighted_pool | 📈 view plot 🚷 view threshold | 10,021.98 units(+5.85%)Baseline: 9,468.51 units | 10,415.36 units (96.22%) |
| slang_resolve_main_file_duration_cooldogs | 📈 view plot 🚷 view threshold | 221.90 units(+1.21%)Baseline: 219.26 units | 241.18 units (92.01%) |
| slang_resolve_main_file_duration_create_x | 📈 view plot 🚷 view threshold | 14.77 units(+4.81%)Baseline: 14.09 units | 15.50 units (95.29%) |
| slang_resolve_main_file_duration_merkle_proof | 📈 view plot 🚷 view threshold | 2.80 units(+0.77%)Baseline: 2.78 units | 3.06 units (91.61%) |
| slang_resolve_main_file_duration_mooniswap | 📈 view plot 🚷 view threshold | 2,169.60 units(+4.76%)Baseline: 2,071.09 units | 2,278.20 units (95.23%) |
| slang_resolve_main_file_duration_multicall3 | 📈 view plot 🚷 view threshold | 3.65 units(-6.51%)Baseline: 3.90 units | 4.29 units (84.99%) |
| slang_resolve_main_file_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold | 173.39 units(+4.46%)Baseline: 165.99 units | 182.58 units (94.96%) |
| slang_resolve_main_file_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 20.11 units(+6.69%)Baseline: 18.85 units | 20.73 units (96.99%) |
| slang_resolve_main_file_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 36.26 units(+0.17%)Baseline: 36.20 units | 39.82 units (91.06%) |
| slang_resolve_main_file_duration_uniswap | 📈 view plot 🚷 view threshold | 1,150.47 units(+6.43%)Baseline: 1,080.98 units | 1,189.07 units (96.75%) |
| slang_resolve_main_file_duration_weighted_pool | 📈 view plot 🚷 view threshold | 6,984.36 units(+6.14%)Baseline: 6,580.46 units | 7,238.50 units (96.49%) |
| slang_total_cooldogs | 📈 view plot 🚷 view threshold | 1,074.38 units(+7.28%)Baseline: 1,001.50 units | 1,101.65 units (97.52%) |
| slang_total_create_x | 📈 view plot 🚷 view threshold | 200.88 units(+4.70%)Baseline: 191.87 units | 211.05 units (95.18%) |
| slang_total_merkle_proof | 📈 view plot 🚷 view threshold | 57.90 units(+3.06%)Baseline: 56.18 units | 61.80 units (93.69%) |
| slang_total_mooniswap | 📈 view plot 🚷 view threshold | 3,302.56 units(+5.29%)Baseline: 3,136.70 units | 3,450.37 units (95.72%) |
| slang_total_multicall3 | 📈 view plot 🚷 view threshold | 64.10 units(-0.97%)Baseline: 64.73 units | 71.20 units (90.03%) |
| slang_total_one_step_leverage_f | 📈 view plot 🚷 view threshold | 700.07 units(+5.29%)Baseline: 664.88 units | 731.37 units (95.72%) |
| slang_total_pointer_libraries | 📈 view plot 🚷 view threshold | 498.96 units(+5.76%)Baseline: 471.78 units | 518.96 units (96.15%) |
| slang_total_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 645.71 units(+1.11%)Baseline: 638.64 units | 702.51 units (91.92%) |
| slang_total_uniswap | 📈 view plot 🚷 view threshold | 2,229.67 units(+7.05%)Baseline: 2,082.81 units | 2,291.09 units (97.32%) |
| slang_total_weighted_pool | 📈 view plot 🚷 view threshold | 10,929.04 units(+5.86%)Baseline: 10,324.36 units | 11,356.80 units (96.23%) |
| solc_build_ast_duration_cooldogs | 📈 view plot 🚷 view threshold | 5,888.70 units(-3.17%)Baseline: 6,081.79 units | 6,689.97 units (88.02%) |
| solc_build_ast_duration_create_x | 📈 view plot 🚷 view threshold | 198.80 units(-2.85%)Baseline: 204.64 units | 225.11 units (88.31%) |
| solc_build_ast_duration_merkle_proof | 📈 view plot 🚷 view threshold | 91.71 units(+3.56%)Baseline: 88.55 units | 97.41 units (94.15%) |
| solc_build_ast_duration_mooniswap | 📈 view plot 🚷 view threshold | 267.16 units(+1.06%)Baseline: 264.37 units | 290.81 units (91.87%) |
| solc_build_ast_duration_multicall3 | 📈 view plot 🚷 view threshold | 95.48 units(+7.41%)Baseline: 88.89 units | 97.78 units (97.65%) |
| solc_build_ast_duration_one_step_leverage_f | 📈 view plot 🚷 view threshold | 424.85 units(-3.80%)Baseline: 441.64 units | 485.81 units (87.45%) |
| solc_build_ast_duration_pointer_libraries | 📈 view plot 🚷 view threshold | 288.03 units(-3.14%)Baseline: 297.36 units | 327.10 units (88.06%) |
| solc_build_ast_duration_ui_pool_data_provider_v3 | 📈 view plot 🚷 view threshold | 444.50 units(-5.17%)Baseline: 468.72 units | 515.59 units (86.21%) |
| solc_build_ast_duration_uniswap | 📈 view plot 🚷 view threshold | 879.60 units(-1.80%)Baseline: 895.74 units | 985.32 units (89.27%) |
| solc_build_ast_duration_weighted_pool | 📈 view plot 🚷 view threshold | 445.09 units(+0.61%)Baseline: 442.41 units | 486.65 units (91.46%) |
2a6fe65 to
1f5adf6
Compare
7d2f976 to
c1adb8b
Compare
| use slang_solidity::cst::{NodeKind, TerminalKindExtensions}; | ||
|
|
||
| use crate::tests::bindings_build::BuiltBindingGraph; | ||
| use super::bindings_build::BuiltBindingGraph; |
There was a problem hiding this comment.
nit: This change is inconsistent with the rest of the namespace changes. Any particular reason?
There was a problem hiding this comment.
No reason, just a typo.
Good catch!
| with: | ||
| persist-credentials: false | ||
|
|
||
| - name: "Restore Cache" |
There was a problem hiding this comment.
Just FYI, in case this got merged after #1600, we would need to replace the first two steps here with the following:
- name: "Checkout Repository"
uses: "actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd" # v6.0.2
with:
persist-credentials: false
fetch-depth: 0 # Full history is needed by 'git/restore-mtime' to find each file's last commit time.
- name: "Restore file mtimes from git"
uses: "./.github/actions/git/restore-mtime"
- name: "Restore Dependencies Cache"
uses: "./.github/actions/cache/dependencies/restore"
- name: "Restore Cargo Target Cache"
uses: "./.github/actions/cache/cargo-target/restore"There was a problem hiding this comment.
I think I won the race ;)
| pub fn [< $prj _parser >](project: &SolidityProject) { | ||
| black_box(tests_v2::parser::run(project)) | ||
| } |
There was a problem hiding this comment.
I was wondering if we should start with benchmarking user operations? i.e. tracking specific usage patterns we care about, compared to internal APIs/stages.
For example, tracking adding a single file, resolving imports/adding all files, building the AST for the whole thing, querying the type system, etc...
This way, we get to track meaningful improvements/regressions from the user POV, while also minimizing noise from internal changes if they don't affect the user.
In any case, iai_callgrind can emit a flame graph (or diff two of them) to measure/isolate specific internal calls/APIs over time.
WDYT?
| pub mod binder_v2_cleanup; | ||
| pub mod binder_v2_run; |
There was a problem hiding this comment.
should these two move to slang_v2/binder_run and slang_v2/binder_cleanup?
There was a problem hiding this comment.
Or actually, IIUC, this is still using the V1 parser. I wonder if we should just (in a later PR) delete them and add completely new ones for the migrated v2 parser+binder.
There was a problem hiding this comment.
Yeah, those are still Slang V1, so I think yes deleting them is the way to go.
c1adb8b to
ce185df
Compare
Added V2 benchmarks:
Notes:
CompilationUnit#1577 )weighted_poolas a benchmarked contract since it's not 0.8.x compatible, usingui_pool_data_provider_v3insteadTesting