Conversation
```diff
@@ -12,6 +12,12 @@ use super::source::Source;
 #[repr(transparent)]
```
nit: I suggest using named fields, to replace the tuple access syntax `id.0`/`id.1` with a more readable `id.value`:

```rust
pub struct NodeId {
    value: usize,
}
```
I'm not sure about this. We shouldn't need to access the internal usize value for NodeId as it should be completely opaque. AFAIK it's common practice to use the tuple notation for creating new ID types from integers. WDYT?
We shouldn't (in external code), yes. But internally, I think .0/.1 syntax is very confusing.
This specific field is not used much outside this module, so I will leave it up to you if you prefer the current version.
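For context, here is a minimal sketch contrasting the two styles under discussion (simplified: the `#[repr(transparent)]` tuple struct matches the PR; `NamedNodeId`, the derives, and the `new` constructors are illustrative assumptions):

```rust
/// Tuple-struct newtype, as in the PR: opaque externally, but internal
/// access reads as `id.0`.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

impl NodeId {
    pub fn new(value: usize) -> Self {
        NodeId(value)
    }
}

/// Named-field alternative suggested in the review: internal access reads
/// as `id.value`, at the cost of slightly more ceremony when constructing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NamedNodeId {
    value: usize,
}

impl NamedNodeId {
    pub fn new(value: usize) -> Self {
        NamedNodeId { value }
    }
}
```

Both are equally opaque to external code; the trade-off is purely about readability of internal field access.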
## Summary
Fixes duplicate CI runs on PRs and improves cache efficiency. Note that
because this PR changes the caching behaviour, the cache key will be
temporarily broken until the next save on a `main` push.
Every PR push was triggering CI **twice** (`push` + `pull_request`
events). The duplicate runs also created redundant cache entries,
putting the repo right at the 10 GB cache limit with only 15 entries.
### Changes
1. **Fix duplicate CI runs** — Restrict `push` trigger to `main` only
(used for cache refresh). PRs are fully covered by `pull_request`.
2. **Fix merge queue trigger** — The merge queue was working by
accident: it pushes to `gh-readonly-queue/main/pr-*` branches, which
happened to match the unfiltered `push: {}`. With `push` now restricted
to `main`, this would break. Added the proper `merge_group` event
trigger, which is [GitHub's recommended mechanism for merge queue CI
validation](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions).
**Without this fix, restricting the push trigger would silently break
the merge queue.**
3. **Only save cache on main push** — PR and merge queue runs restore
from main's cache via `restore-keys` fallback, so their saved entries
were redundant duplicates. Now only the dedicated main cache refresh run
saves.
4. **Cache cargo registry and pnpm store** — Add `~/.cargo/registry/`,
`~/.cargo/git/`, and `~/.pnpm-store/` to cached paths, speeding up
`cargo fetch` (557 crates) and `pnpm install` on cache hits.
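The four changes above can be sketched in workflow terms roughly like this (a hypothetical fragment, not the repo's actual workflow file; the step layout and key scheme are assumptions, while `actions/cache/restore` and `actions/cache/save` are the upstream sub-actions for split restore/save):

```yaml
on:
  push:
    branches: [main]   # main-only: used for the cache refresh run
  pull_request: {}     # covers every PR push
  merge_group: {}      # merge queue validation (GitHub's recommended trigger)

# ... inside the job:
steps:
  - uses: actions/cache/restore@v4
    with:
      path: |
        ~/.cargo/registry/
        ~/.cargo/git/
        ~/.pnpm-store/
      key: cache-${{ github.ref_name }}-${{ hashFiles('**/Cargo.lock') }}
      restore-keys: |
        cache-main-   # PR and merge queue runs fall back to main's cache

  # ... build and test ...

  - uses: actions/cache/save@v4
    # only the main refresh run saves; PR/merge-queue saves were redundant
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    with:
      path: |
        ~/.cargo/registry/
        ~/.cargo/git/
        ~/.pnpm-store/
      key: cache-${{ github.ref_name }}-${{ hashFiles('**/Cargo.lock') }}
```

Splitting restore and save (rather than using the combined `actions/cache` action) is what makes the save step conditional on the event.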
### Cache analysis (before this PR)
**16 active cache entries, ~10.1 GB total** (at the 10 GB default
limit). Every entry is ~645 MB.
#### Categorized by source
| Category | Count | Size | Purpose | Useful? |
|----------|-------|------|---------|---------|
| **main** | 2 | 1.3 GB | Cache refresh on main (2 Cargo.lock versions) | Yes |
| **PR merge refs** (`1556/merge`, etc.) | 6 | 3.9 GB | From `pull_request` events | **No — redundant, restores from main** |
| **Push branch refs** (`ci/persist-credentials-false`, etc.) | 5 | 3.3 GB | From `push` events — **duplicates** of PR entries | **No — waste** |
| **Merge queue** (`gh-readonly-queue/main/pr-*`) | 3 | 2.0 GB | Merge queue validation runs | **No — never reused** |
#### Duplicate pairs from push + pull_request
| PR | PR event cache | Push event cache (duplicate) |
|----|----------------|------------------------------|
| #1556 | `cache-1556/merge-...` | `cache-ci/persist-credentials-false-...` |
| #1547 | `cache-1547/merge-...` | `cache-OmarTawfik/remove-validation-breaking-changes-...` |
| #1555 | `cache-1555/merge-...` | `cache-ggiraldez/v2-controlled-node-ids-...` |
| #1553 | `cache-1553/merge-...` (×2) | `cache-ggiraldez/v2-semantic-ir-...` (×2) |
#### Impact after this PR
| Scenario | Entries | Total size | Savings |
|----------|---------|------------|---------|
| **Status quo** | 16 | 10.1 GB | — |
| **After removing push duplicates** | 11 | 7.4 GB | -3.3 GB (33%) |
| **After also skipping PR + merge queue saves** | 2 | 1.3 GB | **-8.8 GB (87%)** |
| **After adding cargo registry + pnpm store** (~300 MB extra on main) | 2 | ~1.9 GB | Plenty of headroom |
### Future consideration
Caching `target/` (Cargo build artifacts) would be the biggest CI
speedup — avoids full rebuild of 60+ crates every run. This would
require cache entries of 2-5 GB each and potentially increasing the
cache limit. Left as a follow-up.
Force-pushed from 5d707ee to 235d62b.
Moving back to draft because this is related to performance improvements and not a priority right now.
Force-pushed from 235d62b to e4bd785.
Bencher posted continuous benchmarking reports for branch `ggiraldez/v2-controlled-node-ids` on testbed `ci`; the full reports exceeded the platform's length limit and were truncated. Successive runs showed 🚨 23 alerts, then 🚨 30 alerts.

Force-pushed from e4bd785 to be62329.

After the force-push, the Bencher report showed 🚨 7 alerts.

```diff
-use slang_solidity_v2_ir::ir::{self, SourceUnit, SourceUnitMember};
+use slang_solidity_v2_ir::ir::{self, NodeIdGenerator, SourceUnit, SourceUnitMember};

 use crate::dataset::SolidityProject;
```
Looking at the new perf alerts on this PR:
I see 6 alerts (regressions), 5 in slang v1 and 1 in slang v2. Not sure how that is related to the changes here. I wonder if this is a bug in our alerts/perf setup? cc @teofr
```rust
pub struct NodeId(usize);

impl From<usize> for NodeId {
    fn from(value: usize) -> Self {
        NodeId(value)
    }
}
```
Since all crates will be dealing with `NodeId`, WDYT of moving it to the common crate instead?
I'm not sure about this. The `NodeId`s are necessary for identifying IR nodes, and dependency-wise we could say the same about IR node types, say `ContractDefinition`: all other crates down the dependency chain will deal with the type in some way.
It is true that `NodeId` will become part of the public-facing API, but we can always re-export the type from `slang_solidity_v2`. I don't see a strong reason to make the move, but I'm willing to be convinced otherwise.
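A re-export along the lines mentioned above could look like this minimal sketch (the module layout stands in for the crate boundary and the `pub(crate)` field is an assumption, not the PR's actual code):

```rust
// Internal IR module standing in for the slang_solidity_v2_ir crate.
mod ir {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct NodeId(pub(crate) usize);
}

// Public facade: downstream users name `NodeId` via the top-level crate,
// without depending on the internal IR crate's module path.
pub use ir::NodeId;
```

This keeps the type opaque externally while letting the public crate control where it appears in the API surface.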
```
#[derive(Debug)]
pub struct {{ parent_type }}Struct {
    pub(crate) id: NodeId,
    // ...
}
```
nit: rename to `node_id` to match the rest of the API?
Same for other fields/getters in this file.
I had Claude make this change (there are a lot of existing references through the getters) and I don't like the result, so I'm gonna push back a bit.
The `NodeId` is the ID of the IR nodes. Adding the `node_` prefix everywhere makes the code more verbose and redundant; I see a lot of `ir_node.node_id()` in the binder, for example. Using an explicit `node_id()` on types which are not nodes makes perfect sense though, and I think we're already doing that. We're also doing it for AST nodes, and I'm now thinking about changing the getter to `id()`. I know clippy has a warning when you use the name of the type as a prefix in fields. This is not quite that case, but semantically it's similar.
Happy to discuss further of course! I'll save the commit somewhere because it took a lot of tokens 😬
```rust
    builder.build_source_unit(source_unit)
}

struct CstToIrBuilder<'a, S: Source> {
    // ...
}
```
Not blocking of course:
I wonder why `Builder` is a trait? Do we expect multiple implementations of it in the future?
I don't think there's any valid reason and I don't think we will see other implementations of it. I'll change this.
```rust
use crate::ir::nodes as output;

/// A strictly monotonically increasing `NodeId` generator.
pub struct NodeIdGenerator {
    // ...
}
```
If the expectation is to use a single generator for an entire `CompilationUnit`, I wonder why allow it to accept a custom initial ID?
`NodeIdGenerator::new()` is never called. Should we remove it?
I'm thinking it may be useful for building the IR trees in parallel, related to your other comment. In that case we may want to have multiple generators starting at different sequence numbers. For now, though, it makes no sense, so I'll remove it.
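The generator under discussion can be sketched minimally as follows (the method names `starting_at`/`next_id` and the internal counter are assumptions; only the `NodeIdGenerator` name and its doc comment come from the PR):

```rust
/// A strictly monotonically increasing ID generator: each call hands out
/// the next sequence number, so IDs within one generator never repeat.
pub struct NodeIdGenerator {
    next: usize,
}

impl NodeIdGenerator {
    /// Start at an arbitrary sequence number, e.g. to give each file's
    /// builder a disjoint ID range when building IR trees in parallel.
    pub fn starting_at(first: usize) -> Self {
        NodeIdGenerator { next: first }
    }

    /// Return the current sequence number and advance the counter.
    pub fn next_id(&mut self) -> usize {
        let id = self.next;
        self.next += 1;
        id
    }
}
```

Dropping the custom start (as agreed above) would reduce this to a single zero-initialized constructor.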
```diff
 let source_unit_cst = Parser::parse(contents, self.language_version)?;
-let source_unit = ir::build(&source_unit_cst, &contents);
+let source_unit = ir::build(&source_unit_cst, &contents, &mut self.id_generator);
```
When this API allows multithreading, imports will be read/parsed/added in different orders between runs, which might cause the IDs to go out of sync. Thoughts?
I wonder if we should move creating the IR trees to `.build()`, to sort all files deterministically first (by `FileId`?).
That's a good point. Using the same generator from multiple threads will not be possible either, so we would need to know beforehand how to partition the ID space to hand over to the builder of each file. I'll move building the IR trees into the `build()` method.
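The ID-space partitioning idea mentioned above could look like this sketch (entirely hypothetical: `IDS_PER_FILE` and `id_range_for_file` are illustrative names, and the PR instead moved IR building into `build()` for determinism):

```rust
use std::ops::Range;

/// Illustrative fixed-size ID budget per file (assumption, not from the PR).
const IDS_PER_FILE: usize = 1 << 20;

/// Give each file's builder a disjoint, deterministic NodeId range so that
/// parallel IR building cannot produce colliding or order-dependent IDs.
fn id_range_for_file(file_index: usize) -> Range<usize> {
    let start = file_index * IDS_PER_FILE;
    start..start + IDS_PER_FILE
}
```

A fixed per-file budget also makes the owning file recoverable from any ID by integer division, which ties into the source-file identification goal in the PR description.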
Force-pushed from be62329 to 6a1b8ec.
This means that we need to keep the `NodeId` as part of the IR node structures instead of creating it from the memory address of the `Rc<>` allocation.
Force-pushed from 6a1b8ec to 08d3c82.
This will allow easier parallelization of parsing (and possibly IR building) in the future.
This PR replaces pointer-based `NodeId` generation with deterministic IDs, provided by a new `NodeIdGenerator`. That means each node now stores its ID. Having stable and predictable IDs may help with debugging and, more importantly, would allow us to identify the source file of IR nodes by assigning different ID ranges to each file in a `CompilationUnit`. Having stable IDs is also a requirement for solx integration.