[v2] Generate controlled IDs for IR nodes #1555

Open
ggiraldez wants to merge 4 commits into main from ggiraldez/v2-controlled-node-ids
@ggiraldez
Contributor

@ggiraldez ggiraldez commented Mar 17, 2026

This PR replaces pointer-based NodeId generation with deterministic IDs, provided by a new NodeIdGenerator. That means each node now stores its id.

Having stable and predictable IDs may help with debugging and, more importantly, would let us identify the source file of an IR node by assigning a different ID range to each file in a CompilationUnit. Having stable IDs is also a requirement for solx integration.
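
A minimal sketch of the idea described above, assuming a plain sequential counter. The method name `next_id` and the fields of `ContractDefinitionStruct` are hypothetical and only illustrate a node that stores its own ID; the actual generator in this PR may differ.

```rust
/// Opaque identifier for an IR node, wrapping a `usize`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct NodeId(usize);

/// Hands out strictly increasing IDs, starting at zero.
/// `next_id` is a hypothetical method name.
#[derive(Default)]
pub struct NodeIdGenerator {
    next: usize,
}

impl NodeIdGenerator {
    pub fn next_id(&mut self) -> NodeId {
        let id = NodeId(self.next);
        self.next += 1;
        id
    }
}

/// Hypothetical IR node that stores its ID as a field.
#[derive(Debug)]
pub struct ContractDefinitionStruct {
    id: NodeId,
    name: String,
}

impl ContractDefinitionStruct {
    pub fn id(&self) -> NodeId {
        self.id
    }
}

fn main() {
    let mut generator = NodeIdGenerator::default();
    let a = ContractDefinitionStruct { id: generator.next_id(), name: "A".into() };
    let b = ContractDefinitionStruct { id: generator.next_id(), name: "B".into() };
    // IDs depend only on creation order, so they are identical on every run.
    assert!(a.id() < b.id());
    println!("{} -> {:?}, {} -> {:?}", a.name, a.id(), b.name, b.id());
}
```

Because the ID is assigned at construction time, two runs over the same input in the same order produce the same IDs, which is what makes them usable as stable references.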

@ggiraldez ggiraldez requested review from OmarTawfik and teofr March 17, 2026 21:11
@ggiraldez ggiraldez requested review from a team as code owners March 17, 2026 21:11
@changeset-bot

changeset-bot bot commented Mar 17, 2026

⚠️ No Changeset found

Latest commit: 0655b84

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@nebasuke nebasuke mentioned this pull request Mar 18, 2026
@@ -12,6 +12,12 @@ use super::source::Source;
#[repr(transparent)]
Contributor

nit: I suggest using named fields, to replace the tuple access syntax id.0/id.1 with a more readable one id.value:

pub struct NodeId {
  value: usize
}

Contributor Author

I'm not sure about this. We shouldn't need to access the internal usize value for NodeId as it should be completely opaque. AFAIK it's common practice to use the tuple notation for creating new ID types from integers. WDYT?

Contributor

We shouldn't (in external code), yes. But internally, I think .0/.1 syntax is very confusing.
This specific field is not used much outside this module, so I will leave it up to you if you prefer the current version.
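
For reference, a small sketch of the two layouts discussed in this thread. `NamedNodeId` is a hypothetical name used only to show both forms side by side; in either case the field stays private, so external code cannot reach the inner `usize`.

```rust
/// Tuple-struct newtype, as in the PR: the inner value is reached with `.0`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(usize);

impl From<usize> for NodeId {
    fn from(value: usize) -> Self {
        Self(value)
    }
}

/// Named-field alternative from the review suggestion (hypothetical type name).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NamedNodeId {
    value: usize,
}

impl From<usize> for NamedNodeId {
    fn from(value: usize) -> Self {
        Self { value }
    }
}

fn main() {
    let tuple_style = NodeId::from(7);
    let named_style = NamedNodeId::from(7);
    // Inside the defining module both fields are accessible; only the syntax differs.
    assert_eq!(tuple_style.0, named_style.value);
}
```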

Comment thread crates/solidity-v2/outputs/cargo/ir/src/ir/nodes.rs.jinja2
@ggiraldez ggiraldez changed the title Generate controlled IDs for IR nodes [v2] Generate controlled IDs for IR nodes Mar 18, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2026
## Summary

Fixes duplicate CI runs on PRs and improves cache efficiency. Note that
due to me changing the caching behaviour it will temporarily break the
cache key until the next save on `main` push.



Every PR push was triggering CI **twice** (`push` + `pull_request`
events). The duplicate runs also created redundant cache entries,
putting the repo right at the 10 GB cache limit with only 15 entries.

### Changes

1. **Fix duplicate CI runs** — Restrict `push` trigger to `main` only
(used for cache refresh). PRs are fully covered by `pull_request`.

2. **Fix merge queue trigger** — The merge queue was working by
accident: it pushes to `gh-readonly-queue/main/pr-*` branches, which
happened to match the unfiltered `push: {}`. With `push` now restricted
to `main`, this would break. Added the proper `merge_group` event
trigger, which is [GitHub's recommended mechanism for merge queue CI
validation](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions).
**Without this fix, restricting the push trigger would silently break
the merge queue.**

3. **Only save cache on main push** — PR and merge queue runs restore
from main's cache via `restore-keys` fallback, so their saved entries
were redundant duplicates. Now only the dedicated main cache refresh run
saves.

4. **Cache cargo registry and pnpm store** — Add `~/.cargo/registry/`,
`~/.cargo/git/`, and `~/.pnpm-store/` to cached paths, speeding up
`cargo fetch` (557 crates) and `pnpm install` on cache hits.

### Cache analysis (before this PR)

**16 active cache entries, ~10.1 GB total** (at the 10 GB default
limit). Every entry is ~645 MB.

#### Categorized by source

| Category | Count | Size | Purpose | Useful? |
|----------|-------|------|---------|---------|
| **main** | 2 | 1.3 GB | Cache refresh on main (2 Cargo.lock versions) | Yes |
| **PR merge refs** (`1556/merge`, etc.) | 6 | 3.9 GB | From `pull_request` events | **No: redundant, restores from main** |
| **Push branch refs** (`ci/persist-credentials-false`, etc.) | 5 | 3.3 GB | From `push` events, **duplicates** of PR entries | **No: waste** |
| **Merge queue** (`gh-readonly-queue/main/pr-*`) | 3 | 2.0 GB | Merge queue validation runs | **No: never reused** |

#### Duplicate pairs from push + pull_request

| PR | PR event cache | Push event cache (duplicate) |
|----|---------------|------------------------------|
| #1556 | `cache-1556/merge-...` | `cache-ci/persist-credentials-false-...` |
| #1547 | `cache-1547/merge-...` | `cache-OmarTawfik/remove-validation-breaking-changes-...` |
| #1555 | `cache-1555/merge-...` | `cache-ggiraldez/v2-controlled-node-ids-...` |
| #1553 | `cache-1553/merge-...` (×2) | `cache-ggiraldez/v2-semantic-ir-...` (×2) |

#### Impact after this PR

| Scenario | Entries | Total size | Savings |
|----------|---------|------------|---------|
| **Status quo** | 16 | 10.1 GB | — |
| **After removing push duplicates** | 11 | 7.4 GB | -3.3 GB (33%) |
| **After also skipping PR + merge queue saves** | 2 | 1.3 GB | **-8.8 GB (87%)** |
| **After adding cargo registry + pnpm store** (~300 MB extra on main) | 2 | ~1.9 GB | Plenty of headroom |

### Future consideration

Caching `target/` (Cargo build artifacts) would be the biggest CI
speedup — avoids full rebuild of 60+ crates every run. This would
require cache entries of 2-5 GB each and potentially increasing the
cache limit. Left as a follow-up.
nebasuke added a commit that referenced this pull request Mar 21, 2026
Comment thread crates/solidity-v2/outputs/cargo/ir/src/ir/builder/mod.rs
Base automatically changed from ggiraldez/v2-semantic-ir to main March 24, 2026 16:03
@ggiraldez ggiraldez force-pushed the ggiraldez/v2-controlled-node-ids branch from 5d707ee to 235d62b Compare March 24, 2026 16:59
@ggiraldez ggiraldez marked this pull request as draft March 27, 2026 18:05
@ggiraldez
Contributor Author

Moving back to draft because this is related to performance improvements and not a priority right now.

@ggiraldez ggiraldez changed the title [v2] Generate controlled IDs for IR nodes [WIP] [v2] Generate controlled IDs for IR nodes Apr 14, 2026
@ggiraldez ggiraldez force-pushed the ggiraldez/v2-controlled-node-ids branch from 235d62b to e4bd785 Compare April 14, 2026 21:10
@ggiraldez ggiraldez added the ci:perf label (runs performance test dry-runs in a PR, rather than the smoke-tests) Apr 14, 2026
@ggiraldez ggiraldez changed the title [WIP] [v2] Generate controlled IDs for IR nodes [v2] Generate controlled IDs for IR nodes Apr 14, 2026
@ggiraldez
Contributor Author

> Moving back to draft because this is related to performance improvements and not a priority right now.

Having stable NodeIds is now a requirement for the solx integration. This is a reimplementation on top of the current codebase.

@ggiraldez ggiraldez marked this pull request as ready for review April 14, 2026 21:13
@github-actions
Contributor

github-actions bot commented Apr 14, 2026

🐰 Bencher Report

Branch: ggiraldez/v2-controlled-node-ids
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

🐰 View full continuous benchmarking report in Bencher

@github-actions
Contributor

github-actions bot commented Apr 14, 2026

🐰 Bencher Report

Branch: ggiraldez/v2-controlled-node-ids
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

🚨 23 Alerts

🐰 View full continuous benchmarking report in Bencher

@github-actions
Contributor

github-actions bot commented Apr 14, 2026

🐰 Bencher Report

Branch: ggiraldez/v2-controlled-node-ids
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

🚨 30 Alerts

🐰 View full continuous benchmarking report in Bencher

@ggiraldez ggiraldez force-pushed the ggiraldez/v2-controlled-node-ids branch from e4bd785 to be62329 Compare April 14, 2026 22:11
@github-actions
Contributor

github-actions bot commented Apr 15, 2026

🐰 Bencher Report

Branch: ggiraldez/v2-controlled-node-ids
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

🚨 7 Alerts

🐰 View full continuous benchmarking report in Bencher

-use slang_solidity_v2_ir::ir::{self, SourceUnit, SourceUnitMember};
+use slang_solidity_v2_ir::ir::{self, NodeIdGenerator, SourceUnit, SourceUnitMember};

use crate::dataset::SolidityProject;
Contributor

Looking at the new perf alerts on this PR:

https://bencher.dev/perf/slang-dashboard-cargo-slang/reports/d2e2af85-6cb7-459c-ba46-19679123bc96?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr%2Bcomments&utm_term=slang-dashboard-cargo-slang

I see 6 alerts (regressions), 5 in slang v1 and 1 in slang v2. Not sure how that is related to the changes here. I wonder if this is a bug in our alerts/perf setup? cc @teofr

pub struct NodeId(usize);

impl From<usize> for NodeId {
    fn from(value: usize) -> Self {
Contributor

since all crates will be dealing with NodeId, WDYT of moving it to the common crate instead?

Contributor Author

I'm not sure about this. The NodeIds are necessary for identifying IR nodes and dependency-wise we could say the same about IR node types, say ContractDefinition: all other crates down the dependency chain will deal with the type in some way.

It is true that NodeIds will become part of the public-facing API, but we can always re-export the type from slang_solidity_v2. I don't see a strong reason to make the move, but I'm willing to be convinced otherwise.
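
A sketch of the re-export option mentioned above, using inline modules as stand-ins for the two crates. The module layout is an assumption made for illustration; only the `pub use` mechanism itself is the point.

```rust
// Stand-in for the IR crate, where `NodeId` is defined.
mod ir {
    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
    pub struct NodeId(usize);

    impl From<usize> for NodeId {
        fn from(value: usize) -> Self {
            Self(value)
        }
    }
}

// Stand-in for the public-facing crate: it re-exports the type
// instead of the definition moving to a common crate.
mod slang_solidity_v2 {
    pub use super::ir::NodeId;
}

fn main() {
    let from_ir = ir::NodeId::from(1);
    // The re-export names the very same type, so no conversion is needed.
    let from_public_api: slang_solidity_v2::NodeId = from_ir;
    assert_eq!(from_ir, from_public_api);
}
```

Downstream users would then import the type from the public crate only, and the defining crate could change later without breaking their paths.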


#[derive(Debug)]
pub struct {{ parent_type }}Struct {
    pub(crate) id: NodeId,
Contributor

nit: renaming to node_id to match the rest of the API?

Contributor

Same for other fields/getters in this file.

Contributor Author

@ggiraldez ggiraldez Apr 16, 2026

I had Claude make this change (there's a lot of existing references through the getters) and I don't like the result, so I'm gonna push back a bit.

The NodeId is the ID of the IR nodes. Adding the node_ prefix everywhere makes the code more verbose and redundant. I see a lot of ir_node.node_id() in the binder, for example. I think using the explicit node_id() on types which are not nodes makes perfect sense though, but I think we're already doing it. We're also doing it for AST nodes, and I'm now thinking about changing the getter to id(). I know clippy has a warning when you use the name of the type as a prefix in fields. This is not quite that case, but semantically it's similar.

Happy to discuss further of course! I'll save the commit somewhere because it took a lot of tokens 😬

    builder.build_source_unit(source_unit)
}

struct CstToIrBuilder<'a, S: Source> {
Contributor

not blocking of course:
I wonder why Builder is a trait? do we expect multiple implementations of it in the future?

Contributor Author

I don't think there's any valid reason and I don't think we will see other implementations of it. I'll change this.

use crate::ir::nodes as output;

/// A strictly monotonically increasing `NodeId` generator.
pub struct NodeIdGenerator {
Contributor

If the expectation is to use a single generator for an entire CompilationUnit, I wonder why allow it to accept a custom initial ID?
NodeIdGenerator::new() is never called. Should we remove it?

Contributor Author

I'm thinking it may be useful for building the IR trees in parallel, related to your other comment. In that case we may want to have multiple generators starting at different sequence numbers. For now though, it makes no sense, so I'll remove it.


let source_unit_cst = Parser::parse(contents, self.language_version)?;
-let source_unit = ir::build(&source_unit_cst, &contents);
+let source_unit = ir::build(&source_unit_cst, &contents, &mut self.id_generator);
Contributor

when this API allows multithreading, imports will be read/parsed/added in different orders between runs, which might cause the IDs to go out of sync. Thoughts?

I wonder if we should move creating the IR trees to .build(), to sort all files deterministically first (by FileId?).

Contributor Author

That's a good point. Using the same generator from multiple threads will not be possible either, so we will need to know beforehand how to partition the ID space to hand it over to the builder of each file. I'll move the IR trees building into the build() method.
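
One way the partitioning discussed in this thread could look, as a rough sketch only. It assumes a fixed-size ID block per file (`IDS_PER_FILE`), string file IDs, and hypothetical names `for_range`, `next_id` and `partition`; a real implementation might size the ranges differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct NodeId(usize);

/// Generator restricted to the half-open range `next..end`.
pub struct NodeIdGenerator {
    next: usize,
    end: usize,
}

impl NodeIdGenerator {
    pub fn for_range(start: usize, end: usize) -> Self {
        Self { next: start, end }
    }

    pub fn next_id(&mut self) -> NodeId {
        assert!(self.next < self.end, "ID range exhausted");
        let id = NodeId(self.next);
        self.next += 1;
        id
    }
}

/// Assumption: every file gets a fixed-size block of IDs.
const IDS_PER_FILE: usize = 1 << 20;

/// Sorts the file IDs first, then gives each file a disjoint ID range,
/// so the assignment does not depend on the order files were added.
pub fn partition(mut file_ids: Vec<String>) -> Vec<(String, NodeIdGenerator)> {
    file_ids.sort();
    file_ids
        .into_iter()
        .enumerate()
        .map(|(index, file_id)| {
            let start = index * IDS_PER_FILE;
            (file_id, NodeIdGenerator::for_range(start, start + IDS_PER_FILE))
        })
        .collect()
}

fn main() {
    // The same files arriving in a different order between two runs.
    let run1 = partition(vec!["b.sol".into(), "a.sol".into()]);
    let run2 = partition(vec!["a.sol".into(), "b.sol".into()]);
    for ((file1, mut gen1), (file2, mut gen2)) in run1.into_iter().zip(run2) {
        assert_eq!(file1, file2);
        assert_eq!(gen1.next_id(), gen2.next_id());
    }
}
```

Each generator is owned by a single file's builder, so nothing is shared between threads, and a node's ID also reveals which file it came from (`id / IDS_PER_FILE` under this assumption).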

@ggiraldez ggiraldez force-pushed the ggiraldez/v2-controlled-node-ids branch from be62329 to 6a1b8ec Compare April 16, 2026 21:23
This means that we need to keep the `NodeId` as part of the IR node structures
instead of creating it from the memory address of the `Rc<>` allocation.
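
To illustrate the difference this commit message describes, here is a sketch contrasting the two approaches. The `Node` type and both helper functions are hypothetical; the pointer-based variant only approximates how an ID could be derived from an `Rc<>` allocation.

```rust
use std::rc::Rc;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(usize);

/// Hypothetical IR node that carries its own ID.
struct Node {
    id: NodeId,
}

/// Pointer-based: the ID is the address of the `Rc` allocation.
/// It is unique while the node is alive, but changes from run to run.
fn pointer_based_id(node: &Rc<Node>) -> NodeId {
    NodeId(Rc::as_ptr(node) as usize)
}

/// Stored: the ID is whatever was assigned when the node was built.
fn stored_id(node: &Rc<Node>) -> NodeId {
    node.id
}

fn main() {
    let node = Rc::new(Node { id: NodeId(0) });
    let alias = Rc::clone(&node);
    // Both schemes agree that two handles to one allocation are the same node.
    assert_eq!(pointer_based_id(&node), pointer_based_id(&alias));
    assert_eq!(stored_id(&node), stored_id(&alias));
    // Only the stored ID has a value that is known ahead of time.
    assert_eq!(stored_id(&node), NodeId(0));
}
```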
@ggiraldez ggiraldez force-pushed the ggiraldez/v2-controlled-node-ids branch from 6a1b8ec to 08d3c82 Compare April 16, 2026 21:30