[v2] add version validation for structured CST #1627

Merged
OmarTawfik merged 2 commits into main from OmarTawfik/validate-syntax-version
Apr 9, 2026

Conversation

@OmarTawfik
Contributor

No description provided.

@changeset-bot

changeset-bot bot commented Apr 7, 2026

⚠️ No Changeset found

Latest commit: 840c163

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@OmarTawfik OmarTawfik force-pushed the OmarTawfik/validate-syntax-version branch from 4b69995 to 45a5e88 April 7, 2026 14:29
@OmarTawfik OmarTawfik marked this pull request as ready for review April 7, 2026 14:47
@OmarTawfik OmarTawfik requested review from a team as code owners April 7, 2026 14:47
Contributor

@teofr teofr left a comment

Just one small thing, and two questions

Comment on lines +36 to +38
// TODO(v2): these tests should really go through 'CompilationUnit' once it is ready.
// This way, we won't have to call individual validation APIs.
// All errors should be collected during the compilation unit construction.
Contributor

As of right now, the CompilationUnit will also perform semantic analysis on the parsed source, which is something we absolutely don't need for these snapshots. But it will also do other things not related to parsing, eg. resolution of imported paths. I think we may need an intermediate abstraction that both these tests and the CompilationUnit can use.

Contributor

> I think we may need an intermediate abstraction that both these tests and the CompilationUnit can use.

And the benchmarks

Contributor Author

Agreed. Once your PRs are merged, I was thinking of adding two levels of operations (at least for now):

  • Syntax: parsing each file and doing any syntax-only, per-file validation, such as parse errors, versioning, pragmas, and import resolution. None of this needs to build the AST or run the nano-passes yet.
  • Semantic: running the nano-passes and collecting compilation-wide diagnostics.

I will also think about how to combine/report validation errors across the board using a standard type/set of utils to serialize/render them. Please let me know if you have any suggestions in the meantime.
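
For readers following along, here is a minimal sketch of what such a two-level split with one shared diagnostic type could look like. Every name in it (`Phase`, `Diagnostic`, `ParsedFile`, `check_syntax`, `check_semantics`) is hypothetical and does not come from this PR, and the checks themselves are placeholders standing in for the real validation.

```rust
use std::collections::HashSet;

/// Which level produced a diagnostic (hypothetical names throughout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Syntax,
    Semantic,
}

/// One diagnostic type shared by both levels, so errors can be
/// collected and rendered uniformly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub phase: Phase,
    pub file: String,
    pub message: String,
}

/// Stand-in for the result of parsing a single file.
pub struct ParsedFile {
    pub name: String,
    pub source: String,
}

/// Syntax level: runs per file, without building an AST.
pub fn check_syntax(name: &str, source: &str) -> (ParsedFile, Vec<Diagnostic>) {
    let mut diagnostics = Vec::new();
    // Placeholder check standing in for parse errors, version validation, etc.
    if !source.contains("pragma") {
        diagnostics.push(Diagnostic {
            phase: Phase::Syntax,
            file: name.to_string(),
            message: "missing pragma".to_string(),
        });
    }
    let parsed = ParsedFile {
        name: name.to_string(),
        source: source.to_string(),
    };
    (parsed, diagnostics)
}

/// Semantic level: runs once over all files in the compilation.
pub fn check_semantics(files: &[ParsedFile]) -> Vec<Diagnostic> {
    let mut seen = HashSet::new();
    let mut diagnostics = Vec::new();
    for file in files {
        // Placeholder compilation-wide check: the same file listed twice.
        if !seen.insert(file.name.as_str()) {
            diagnostics.push(Diagnostic {
                phase: Phase::Semantic,
                file: file.name.clone(),
                message: "duplicate file in compilation".to_string(),
            });
        }
    }
    diagnostics
}
```

In this shape, snapshot tests and benchmarks could call only the syntax-level function, while a compilation unit would call both and concatenate the resulting diagnostics.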

Contributor

@ggiraldez ggiraldez left a comment

Left one comment but looks good to me.

My only other concern is the text_range() functions returning an Option<> which intuitively doesn't feel right. I understand the reasons why, but from a user perspective I'd assume it returns an empty range, but located at the offset where the node would be.

@@ -0,0 +1,14 @@
#[path = "text_start.generated.rs"]
Contributor Author

@OmarTawfik OmarTawfik Apr 9, 2026

> My only other concern is the text_range() functions returning an Option<> which intuitively doesn't feel right. I understand the reasons why, but from a user perspective I'd assume it returns an empty range, but located at the offset where the node would be.

AFAIU, we are not exposing structured_cst or its utils to the user at all, so this is only internal. The call-sites are never expected to call it on an empty node (they already .expect() it).
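
To illustrate the behavior under discussion, here is a minimal sketch of a range lookup that returns an Option. The `Node` type below is a hypothetical stand-in and not the generated structured_cst types; only the name text_range() is taken from this thread.

```rust
use std::ops::Range;

/// Hypothetical stand-in for structured CST nodes.
pub enum Node {
    /// A terminal always covers some text.
    Terminal { range: Range<usize> },
    /// A collection may have zero children.
    Collection { children: Vec<Node> },
}

impl Node {
    /// Spans from the first terminal to the last terminal under this node.
    /// Returns `None` when the node contains no terminals at all, because
    /// the node alone does not know where it would sit in the source.
    pub fn text_range(&self) -> Option<Range<usize>> {
        match self {
            Node::Terminal { range } => Some(range.clone()),
            Node::Collection { children } => {
                // First child (in source order) that covers any text.
                let start = children.iter().find_map(|c| c.text_range())?.start;
                // Last child (in source order) that covers any text.
                let end = children.iter().rev().find_map(|c| c.text_range())?.end;
                Some(start..end)
            }
        }
    }
}
```

A call site that reports an offending piece of syntax always has at least one terminal to point at, which is why calling `.expect()` on the result is reasonable there.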

If these assumptions ever change, I think we would need to introduce a cursor/stateful visitor to keep track of ancestry/outer ranges as well. But given that most future validation/operations will happen on the AST, I'm not sure if it is worth adding one now.

@ggiraldez Thoughts?

Contributor

You're right, we're not exposing the CST to the user. I was thinking about how we could carry this information over to the AST layer and then provide it to the user. But I agree it can be done in the IR builder by adding a bit of state.

In any case, the only nodes that can potentially be empty are the collection non-terminals, right? And I guess, transitively, any other non-terminal that contains a single collection, i.e. a choice or another collection. Am I missing any other case?

Contributor

There's a cleanish solution to this I was considering, but I'm not fully convinced it doesn't bring up other issues.

We could get rid of allow_empty in collections, and instead put that responsibility in the parent by using Optional(...) instead of Required(...). Then every collection is non-empty by definition.

It's not necessary, but it came up while solving #1654
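
A rough sketch of the two shapes being compared may help here. The types `NonEmpty`, `BlockAllowEmpty`, and `BlockOptional` are hypothetical illustrations and are not generated from the actual language definition; they only show where emptiness would be encoded in each approach.

```rust
/// Hypothetical collection that is non-empty by construction.
#[derive(Debug, PartialEq)]
pub struct NonEmpty<T>(Vec<T>);

impl<T> NonEmpty<T> {
    /// Returns `None` for empty input, so emptiness has to be
    /// represented by the parent rather than by the collection.
    pub fn from_vec(items: Vec<T>) -> Option<Self> {
        if items.is_empty() {
            None
        } else {
            Some(NonEmpty(items))
        }
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Shape with an allow_empty collection: the parent field is required,
/// and the collection itself may hold zero items.
pub struct BlockAllowEmpty {
    pub statements: Vec<String>,
}

/// Shape with an optional parent field: the collection is never empty.
pub struct BlockOptional {
    pub statements: Option<NonEmpty<String>>,
}

pub fn statement_count_allow_empty(block: &BlockAllowEmpty) -> usize {
    block.statements.len()
}

pub fn statement_count_optional(block: &BlockOptional) -> usize {
    // Consumers need one extra step to look through the Option.
    block.statements.as_ref().map_or(0, |s| s.len())
}
```

With the second shape, any collection node that exists is guaranteed to cover some text, at the cost of an extra unwrap for consumers.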

Contributor

From the IR/AST point of view, it does make the data structures a tiny bit more cumbersome to work with (i.e. needing to unwrap the Option). But it may be something we can solve when building the IR.

Contributor Author

> In any case, the only nodes that can potentially be empty are the collection non-terminals, right?

Yes. This would be None in the case of empty collections. And in that case we would never have an offending syntax to try to get the range for.

> We could get rid of allow_empty in collections, and instead put that responsibility in the parent by using Optional(...) instead of Required(...). Then every collection is non-empty by definition.

> it does make the data structures a tiny bit more cumbersome to work with

This is the reason it is enforced through Errors::OptionalFieldAllowsEmpty, as it was much easier to deal with in all subsequent APIs.

I think it should be trivial to get complete ranges with a bit of state and some extra processing, but it is not needed for the CST so far.
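
As one possible reading of "a bit of state", the sketch below walks a tree in source order while carrying a running offset, so that even an empty collection gets an empty range at the position where it would appear. The `Node` type and `collect_ranges` function are hypothetical, and the sketch assumes terminals are contiguous with no trivia between them.

```rust
use std::ops::Range;

/// Hypothetical stand-in for CST nodes; terminals only record their length.
pub enum Node {
    Terminal { len: usize },
    Collection { children: Vec<Node> },
}

/// Walks the tree in source order, carrying the running offset as state.
/// Pushes one range per node into `out` in pre-order and returns the range
/// of `node`. An empty collection yields the empty range `offset..offset`.
pub fn collect_ranges(
    node: &Node,
    offset: &mut usize,
    out: &mut Vec<Range<usize>>,
) -> Range<usize> {
    let start = *offset;
    // Reserve this node's slot now so that parents come before children.
    let slot = out.len();
    out.push(start..start);
    match node {
        Node::Terminal { len } => *offset += *len,
        Node::Collection { children } => {
            for child in children {
                collect_ranges(child, offset, out);
            }
        }
    }
    // The node ends wherever the running offset ended up.
    out[slot] = start..*offset;
    start..*offset
}
```

The trade-off is that a range can no longer be computed from a node in isolation; it requires a traversal from a known starting offset.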

@OmarTawfik OmarTawfik force-pushed the OmarTawfik/validate-syntax-version branch from 45a5e88 to 840c163 April 9, 2026 11:26
@OmarTawfik OmarTawfik enabled auto-merge April 9, 2026 11:26
@OmarTawfik OmarTawfik disabled auto-merge April 9, 2026 11:27
@OmarTawfik OmarTawfik requested a review from teofr April 9, 2026 11:27
Contributor

@teofr teofr left a comment

👍

@OmarTawfik OmarTawfik added this pull request to the merge queue Apr 9, 2026
Merged via the queue into main with commit 91f575d Apr 9, 2026
16 of 18 checks passed
@OmarTawfik OmarTawfik deleted the OmarTawfik/validate-syntax-version branch April 9, 2026 12:24

3 participants