Conversation
```diff
@@ -12,6 +12,12 @@ use super::source::Source;
 #[repr(transparent)]
```
nit: I suggest using named fields, to replace the tuple access syntax `id.0`/`id.1` with a more readable `id.value`:

```rust
pub struct NodeId {
    value: usize,
}
```
I'm not sure about this. We shouldn't need to access the internal usize value for NodeId as it should be completely opaque. AFAIK it's common practice to use the tuple notation for creating new ID types from integers. WDYT?
We shouldn't (in external code), yes. But internally, I think .0/.1 syntax is very confusing.
This specific field is not used much outside this module, so I will leave it up to you if you prefer the current version.
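For context, here is a minimal sketch contrasting the two styles under discussion (simplified: the `#[repr(transparent)]` tuple struct matches the PR; `NamedNodeId`, the derives, and the `new` constructors are illustrative assumptions):

```rust
/// Tuple-struct newtype, as in the PR: opaque externally, but internal
/// access reads as `id.0`.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

impl NodeId {
    pub fn new(value: usize) -> Self {
        NodeId(value)
    }
}

/// Named-field alternative suggested in the review: internal access reads
/// as `id.value`, at the cost of slightly more ceremony when constructing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NamedNodeId {
    value: usize,
}

impl NamedNodeId {
    pub fn new(value: usize) -> Self {
        NamedNodeId { value }
    }
}
```

Both are equally opaque to external code; the trade-off is purely about readability of internal field access.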
## Summary
Fixes duplicate CI runs on PRs and improves cache efficiency. Note that
because this PR changes the caching behaviour, the cache key will be
temporarily broken until the next save on a `main` push.
Every PR push was triggering CI **twice** (`push` + `pull_request`
events). The duplicate runs also created redundant cache entries,
putting the repo right at the 10 GB cache limit with only 15 entries.
### Changes
1. **Fix duplicate CI runs** — Restrict `push` trigger to `main` only
(used for cache refresh). PRs are fully covered by `pull_request`.
2. **Fix merge queue trigger** — The merge queue was working by
accident: it pushes to `gh-readonly-queue/main/pr-*` branches, which
happened to match the unfiltered `push: {}`. With `push` now restricted
to `main`, this would break. Added the proper `merge_group` event
trigger, which is [GitHub's recommended mechanism for merge queue CI
validation](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions).
**Without this fix, restricting the push trigger would silently break
the merge queue.**
3. **Only save cache on main push** — PR and merge queue runs restore
from main's cache via `restore-keys` fallback, so their saved entries
were redundant duplicates. Now only the dedicated main cache refresh run
saves.
4. **Cache cargo registry and pnpm store** — Add `~/.cargo/registry/`,
`~/.cargo/git/`, and `~/.pnpm-store/` to cached paths, speeding up
`cargo fetch` (557 crates) and `pnpm install` on cache hits.
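The four changes above can be sketched in workflow terms roughly like this (a hypothetical fragment, not the repo's actual workflow file; the step layout and key scheme are assumptions, while `actions/cache/restore` and `actions/cache/save` are the upstream sub-actions for split restore/save):

```yaml
on:
  push:
    branches: [main]   # main-only: used for the cache refresh run
  pull_request: {}     # covers every PR push
  merge_group: {}      # merge queue validation (GitHub's recommended trigger)

# ... inside the job:
steps:
  - uses: actions/cache/restore@v4
    with:
      path: |
        ~/.cargo/registry/
        ~/.cargo/git/
        ~/.pnpm-store/
      key: cache-${{ github.ref_name }}-${{ hashFiles('**/Cargo.lock') }}
      restore-keys: |
        cache-main-   # PR and merge queue runs fall back to main's cache

  # ... build and test ...

  - uses: actions/cache/save@v4
    # only the main refresh run saves; PR/merge-queue saves were redundant
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    with:
      path: |
        ~/.cargo/registry/
        ~/.cargo/git/
        ~/.pnpm-store/
      key: cache-${{ github.ref_name }}-${{ hashFiles('**/Cargo.lock') }}
```

Splitting restore and save (rather than using the combined `actions/cache` action) is what makes the save step conditional on the event.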
### Cache analysis (before this PR)
**16 active cache entries, ~10.1 GB total** (at the 10 GB default
limit). Every entry is ~645 MB.
#### Categorized by source
| Category | Count | Size | Purpose | Useful? |
|----------|-------|------|---------|---------|
| **main** | 2 | 1.3 GB | Cache refresh on main (2 Cargo.lock versions) | Yes |
| **PR merge refs** (`1556/merge`, etc.) | 6 | 3.9 GB | From `pull_request` events | **No — redundant, restores from main** |
| **Push branch refs** (`ci/persist-credentials-false`, etc.) | 5 | 3.3 GB | From `push` events — **duplicates** of PR entries | **No — waste** |
| **Merge queue** (`gh-readonly-queue/main/pr-*`) | 3 | 2.0 GB | Merge queue validation runs | **No — never reused** |
#### Duplicate pairs from push + pull_request
| PR | PR event cache | Push event cache (duplicate) |
|----|----------------|------------------------------|
| #1556 | `cache-1556/merge-...` | `cache-ci/persist-credentials-false-...` |
| #1547 | `cache-1547/merge-...` | `cache-OmarTawfik/remove-validation-breaking-changes-...` |
| #1555 | `cache-1555/merge-...` | `cache-ggiraldez/v2-controlled-node-ids-...` |
| #1553 | `cache-1553/merge-...` (×2) | `cache-ggiraldez/v2-semantic-ir-...` (×2) |
#### Impact after this PR
| Scenario | Entries | Total size | Savings |
|----------|---------|------------|---------|
| **Status quo** | 16 | 10.1 GB | — |
| **After removing push duplicates** | 11 | 7.4 GB | -3.3 GB (33%) |
| **After also skipping PR + merge queue saves** | 2 | 1.3 GB | **-8.8 GB (87%)** |
| **After adding cargo registry + pnpm store** (~300 MB extra on main) | 2 | ~1.9 GB | Plenty of headroom |
### Future consideration
Caching `target/` (Cargo build artifacts) would be the biggest CI
speedup — avoids full rebuild of 60+ crates every run. This would
require cache entries of 2-5 GB each and potentially increasing the
cache limit. Left as a follow-up.
Force-pushed from 5d707ee to 235d62b.
Moving back to draft because this is related to performance improvements and not a priority right now.
Force-pushed from 235d62b to e4bd785.
Bencher posted continuous benchmarking reports for branch `ggiraldez/v2-controlled-node-ids` on testbed `ci`; the full reports exceeded the platform's length limit and were truncated. Successive runs showed 🚨 23 alerts, then 🚨 30 alerts.

Force-pushed from e4bd785 to be62329.

After the force-push, the Bencher report showed 🚨 7 alerts.

```diff
-use slang_solidity_v2_ir::ir::{self, SourceUnit, SourceUnitMember};
+use slang_solidity_v2_ir::ir::{self, NodeIdGenerator, SourceUnit, SourceUnitMember};

 use crate::dataset::SolidityProject;
```
Looking at the new perf alerts on this PR:
I see 6 alerts (regressions), 5 in slang v1 and 1 in slang v2. Not sure how that is related to the changes here. I wonder if this is a bug in our alerts/perf setup? cc @teofr
```rust
pub struct NodeId(usize);

impl From<usize> for NodeId {
    fn from(value: usize) -> Self {
        NodeId(value)
    }
}
```
Since all crates will be dealing with `NodeId`, WDYT of moving it to the common crate instead?
I'm not sure about this. The `NodeId`s are necessary for identifying IR nodes, and dependency-wise we could say the same about IR node types, say `ContractDefinition`: all other crates down the dependency chain will deal with the type in some way.
It is true that `NodeId` will become part of the public-facing API, but we can always re-export the type from `slang_solidity_v2`. I don't see a strong reason to make the move, but I'm willing to be convinced otherwise.
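A re-export along the lines mentioned above could look like this minimal sketch (the module layout stands in for the crate boundary and the `pub(crate)` field is an assumption, not the PR's actual code):

```rust
// Internal IR module standing in for the slang_solidity_v2_ir crate.
mod ir {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct NodeId(pub(crate) usize);
}

// Public facade: downstream users name `NodeId` via the top-level crate,
// without depending on the internal IR crate's module path.
pub use ir::NodeId;
```

This keeps the type opaque externally while letting the public crate control where it appears in the API surface.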
```
#[derive(Debug)]
pub struct {{ parent_type }}Struct {
    pub(crate) id: NodeId,
    // ...
}
```
nit: rename to `node_id` to match the rest of the API?
Same for other fields/getters in this file.
I had Claude make this change (there are a lot of existing references through the getters) and I don't like the result, so I'm gonna push back a bit.
The `NodeId` is the ID of the IR nodes. Adding the `node_` prefix everywhere makes the code more verbose and redundant; I see a lot of `ir_node.node_id()` in the binder, for example. Using an explicit `node_id()` on types which are not nodes makes perfect sense though, and I think we're already doing that. We're also doing it for AST nodes, and I'm now thinking about changing the getter to `id()`. I know clippy has a warning when you use the name of the type as a prefix in fields. This is not quite that case, but semantically it's similar.
Happy to discuss further of course! I'll save the commit somewhere because it took a lot of tokens 😬
```rust
    builder.build_source_unit(source_unit)
}

struct CstToIrBuilder<'a, S: Source> {
    // ...
}
```
Not blocking of course:
I wonder why `Builder` is a trait? Do we expect multiple implementations of it in the future?
I don't think there's any valid reason and I don't think we will see other implementations of it. I'll change this.
```rust
use crate::ir::nodes as output;

/// A strictly monotonically increasing `NodeId` generator.
pub struct NodeIdGenerator {
    // ...
}
```
If the expectation is to use a single generator for an entire `CompilationUnit`, I wonder why allow it to accept a custom initial ID?
`NodeIdGenerator::new()` is never called. Should we remove it?
I'm thinking it may be useful for building the IR trees in parallel, related to your other comment. In that case we may want to have multiple generators starting at different sequence numbers. For now, though, it makes no sense, so I'll remove it.
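The generator under discussion can be sketched minimally as follows (the method names `starting_at`/`next_id` and the internal counter are assumptions; only the `NodeIdGenerator` name and its doc comment come from the PR):

```rust
/// A strictly monotonically increasing ID generator: each call hands out
/// the next sequence number, so IDs within one generator never repeat.
pub struct NodeIdGenerator {
    next: usize,
}

impl NodeIdGenerator {
    /// Start at an arbitrary sequence number, e.g. to give each file's
    /// builder a disjoint ID range when building IR trees in parallel.
    pub fn starting_at(first: usize) -> Self {
        NodeIdGenerator { next: first }
    }

    /// Return the current sequence number and advance the counter.
    pub fn next_id(&mut self) -> usize {
        let id = self.next;
        self.next += 1;
        id
    }
}
```

Dropping the custom start (as agreed above) would reduce this to a single zero-initialized constructor.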
```diff
 let source_unit_cst = Parser::parse(contents, self.language_version)?;
-let source_unit = ir::build(&source_unit_cst, &contents);
+let source_unit = ir::build(&source_unit_cst, &contents, &mut self.id_generator);
```
When this API allows multithreading, imports will be read/parsed/added in different orders between runs, which might cause the IDs to go out of sync. Thoughts?
I wonder if we should move creating the IR trees to `.build()`, to sort all files deterministically first (by `FileId`?).
That's a good point. Using the same generator from multiple threads will not be possible either, so we would need to know beforehand how to partition the ID space to hand over to the builder of each file. I'll move building the IR trees into the `build()` method.
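The ID-space partitioning idea mentioned above could look like this sketch (entirely hypothetical: `IDS_PER_FILE` and `id_range_for_file` are illustrative names, and the PR instead moved IR building into `build()` for determinism):

```rust
use std::ops::Range;

/// Illustrative fixed-size ID budget per file (assumption, not from the PR).
const IDS_PER_FILE: usize = 1 << 20;

/// Give each file's builder a disjoint, deterministic NodeId range so that
/// parallel IR building cannot produce colliding or order-dependent IDs.
fn id_range_for_file(file_index: usize) -> Range<usize> {
    let start = file_index * IDS_PER_FILE;
    start..start + IDS_PER_FILE
}
```

A fixed per-file budget also makes the owning file recoverable from any ID by integer division, which ties into the source-file identification goal in the PR description.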
Force-pushed from be62329 to 6a1b8ec.
This means that we need to keep the `NodeId` as part of the IR node structures instead of creating it from the memory address of the `Rc<>` allocation.
Force-pushed from 6a1b8ec to 08d3c82.
This will allow easier parallelization of parsing (and possibly IR building) in the future.
This PR replaces pointer-based `NodeId` generation with deterministic IDs, provided by a new `NodeIdGenerator`. That means each node now stores its ID. Having stable and predictable IDs may help with debugging and, more importantly, would allow us to identify the source file of IR nodes by assigning different ID ranges to each file in a `CompilationUnit`. Having stable IDs is also a requirement for solx integration.