[v2] add version validation for structured CST#1627
Conversation
|
4b69995 to
45a5e88
Compare
teofr
left a comment
There was a problem hiding this comment.
Just one small thing, and two questions
crates/codegen-v2/cst/src/structured_cst/versioned_descendants.rs
Outdated
Show resolved
Hide resolved
crates/solidity-v2/outputs/cargo/cst/src/structured_cst/text_range/mod.rs
Outdated
Show resolved
Hide resolved
...olidity-v2/outputs/cargo/cst/src/structured_cst/validation/validate_syntax_version.rs.jinja2
Show resolved
Hide resolved
| // TODO(v2): these tests should really go through 'CompilationUnit' once it is ready. | ||
| // This way, we won't have to call individual validation APIs. | ||
| // All errors should be collected during the compilation unit construction. |
There was a problem hiding this comment.
As of right now, the CompilationUnit will also perform semantic analysis on the parsed source, which is something we absolutely don't need for these snapshots. But it will also do other things not related to parsing, eg. resolution of imported paths. I think we may need an intermediate abstraction that both these tests and the CompilationUnit can use.
There was a problem hiding this comment.
I think we may need an intermediate abstraction that both these tests and the CompilationUnit can use.
And the benchmarks
There was a problem hiding this comment.
Agreed. Once your PRs are merged, I was thinking of adding two levels of operations (at least for now):
- Syntax: that includes parsing them, and doing any syntax-only validation (per file), like parse errors, versioning, pragma, import resolution. All of this doesn't need to build the AST or run the nano-passes yet.
- Semantic: that is running the nano-passes, and collecting compilation-wide diagnostics.
I will also think about how to combine/report validation errors across the board using a standard type/set of utils to serialize/render them. Please let me know if you have any suggestions in the meantime.
ggiraldez
left a comment
There was a problem hiding this comment.
Left one comment but looks good to me.
My only other concern is the text_range() functions returning an Option<> which intuitively doesn't feel right. I understand the reasons why, but from a user perspective I'd assume it returns an empty range, but located at the offset where the node would be.
| @@ -0,0 +1,14 @@ | |||
| #[path = "text_start.generated.rs"] | |||
There was a problem hiding this comment.
My only other concern is the
text_range()functions returning anOption<>which intuitively doesn't feel right. I understand the reasons why, but from a user perspective I'd assume it returns an empty range, but located at the offset where the node would be.
AFAIU, we are not exposing structured_cst or its utils to the user at all, so this is only internal. The call-sites are never expected to call it on an empty node (they already .expect() it).
If these assumptions ever change, I think we would need to introduce a cursor/stateful visitor to keep track of ancestry/outer ranges as well. But given that most future validation/operations will happen on the AST, I'm not sure if it is worth adding one now.
@ggiraldez Thoughts?
There was a problem hiding this comment.
You're right we're not exposing the CST to the user. I was thinking how can we use this information to translate it to the AST layer and then provide it to the user. But I agree it can be done in the IR builder by adding a bit of state.
In any case, the only nodes that can potentially be empty are the collection non-terminals, right? And I guess, transitively any other non-terminal that contains a single collection, ie. a choice or another collection. Am I missing any other case?
There was a problem hiding this comment.
There's a cleanish solution to this I was considering, but I'm not fully convinced it doesn't bring up other issues.
We could get rid of allow_empty in collections, and instead put that responsibility in the parent by using Optional(...) instead of Required(...). Then every collection is non-empty by definition.
It's not necessary, but it came up while solving #1654
There was a problem hiding this comment.
From the IR/AST point of view, it does make the data structures a tiny bit more cumbersome to work with (ie. needing to unwrap the Option). But it maybe something we can solve when building the IR.
There was a problem hiding this comment.
In any case, the only nodes that can potentially be empty are the collection non-terminals, right?
Yes. This would be None in the case of empty collections. And in that case we would never have an offending syntax to try to get the range for.
We could get rid of allow_empty in collections, and instead put that responsibility in the parent by using Optional(...) instead of Required(...). Then every collection is non-empty by definition.
it does make the data structures a tiny bit more cumbersome to work with
This is the reason it is enforced via through Errors::OptionalFieldAllowsEmpty, as it was much easier to deal with in all subsequent APIs.
I think it should be trivial to get complete ranges with a bit of state and some extra processing, but it is not needed for the CST so far.
45a5e88 to
840c163
Compare
No description provided.