Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually control all newline characters #139

Merged
merged 14 commits into from
Nov 16, 2024
Merged

Manually control all newline characters #139

merged 14 commits into from
Nov 16, 2024

Conversation

blindFS
Copy link
Contributor

@blindFS blindFS commented Nov 12, 2024

While attempting to fix #122 , I found the ambiguity of newline characters is a big obstacle for multiline expressions to be parsed accurately.

Current grammar put newline characters in extras (\s), allowing arbitrary numbers of them to be present freely between any nodes. While some rules specifically announced newline characters, making them selected with higher priorities when a newline is matched in the exact sequence.

This makes adding the rule for expr_binary_parenthesized extremely difficult since newline characters breaks the precedence order.

This PR did quite a lot changes:

  1. remove newline from extras
  2. manually add newline to related rules
  3. several existing test cases are broken, but the new results seem to be more accurate, so I changed the target results
  • most of them is about the position of comments
  • val_record as register signature
  1. more multiline related test cases added

⚠️ There are probably new errors introduced by this PR!
Please take time to review it with caution.
I'll continue to test it on more nushell scripts to make it more robust.

If you guys have better solutions, I'm more than happy to switch.

manually control all newline characters to reduce ambiguity for parenthesized expressions
(stmt_register
(val_variable
(identifier))))
(val_string)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    stmt_register: ($) =>
      prec.left(
        -1,
        seq(
          KEYWORD().register,
          field(
            "plugin",
            choice(alias($.unquoted, $.val_string), $.val_variable),
          ),
          field("signature", optional($.val_record)),
        ),
      ),

The new result is more accurate for the rule

@fdncred
Copy link
Collaborator

fdncred commented Nov 12, 2024

I really don't have a problem moving forward with this because even if it breaks something, you've been super helpful to fix those type of things. If this is going to make things better in the long run, a little temporary instability is a small price to pay.

@blindFS
Copy link
Contributor Author

blindFS commented Nov 13, 2024

@fdncred I think this is helpful to decouple intertwined rules and enable finer grained control of the grammar, on the other side, it forces us to write more intricate rules when newline characters are allowed in them.

Let's wait to see whether other people @CabalCrow @mrdgo have different opinions on this, during the on-going test on a broader range of scripts.

@CabalCrow
Copy link
Contributor

Back when I was attempting a rewrite, I was trying to put the newlines in scanner due to the same issue. There were plenty of problem with comments and all other types of syntax that is allowed by nushell.

Do note that the bash tree sitter also had some of the issue, so maybe this could be a limitation of what you can do with tree-sitter without relying on the scanner.

@blindFS
Copy link
Contributor Author

blindFS commented Nov 13, 2024

@CabalCrow Yeah, external scanner was my initial thought and probably is the ultimate solution. Yet it turns out that most of known errors can still be solved without a scanner, comments are definitely on the list of attention here. I feel that keeping it all js for now helps to recognize the parts that really requires a scanner, for example the unquoted string rule already looks insanely complicated.

@mrdgo
Copy link
Contributor

mrdgo commented Nov 13, 2024

Thanks for the PR and the request for review.

I honestly feel some pain looking at all the places where you now explicitly allow newlines. On the other hand, it's a great advantage to solve it in js. I'll take a closer look tomorrow.

@@ -52,7 +52,28 @@ use $"('s' + 't' + 'd')"
(pipeline
(pipe_element
(expr_binary
(val_string)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nested binary expression made left associative.

Copy link
Contributor

@mrdgo mrdgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the cleanup work alongside the newline control :)

grammar.js Show resolved Hide resolved
optional(
prec.right(
seq(
repeat($._newline),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this duplicate lines 172, 180, and 188?

(If those are the only references to $.parameter ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since $.parameter is repeated, I feel it more clear that it only allows edge newlines on one side (here on the beginning). Shared newline across parameters probably wont cause any conflict though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that tests wouldn't pass on conflict, I was just wondering. Consider it more a curious question than a request for change :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's a good point, I probably need to use a helper function for those kind of repeated patterns, somewhat like list_entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't really need a helper function, it doesn't really hurt to tell treesitter twice that there may be newlines between parameters.

@blindFS blindFS marked this pull request as ready for review November 15, 2024 07:50
@blindFS
Copy link
Contributor Author

blindFS commented Nov 15, 2024

I think it is ready for review.
I have tested it on all nu scripts in nushell/nu_scripts

Either it is parsed with no error (probably wrong answer though)
or the error is rooted somewhere else (collected a dozen of new issues :-( ).

I left some parenthesized rules unchanged by intention, technically newlines should be allowed in them, but I think nobody writes nushell in a crazy way like this:

(
let
foo
=
1
)

The CI action pipeline has permission issues at the time.

grammar.js Outdated Show resolved Hide resolved
@blindFS
Copy link
Contributor Author

blindFS commented Nov 15, 2024

Github action file modified to git rid of permission error in the last commit, not sure it actually overrides the npm as originally intended. @fdncred

@fdncred
Copy link
Collaborator

fdncred commented Nov 15, 2024

I'm fine with landing it and seeing how it goes but we could also wait for @mrdgo and @CabalCrow to see if they have time to review since this is such a fundamental change?

@mrdgo
Copy link
Contributor

mrdgo commented Nov 15, 2024

I am absolutely fine with this!

As soon as we have a scanner.c, someone could try to implement newline support in there and compare. But I don't really see any show-stopper, here.

@blindFS
Copy link
Contributor Author

blindFS commented Nov 15, 2024

I'm fine with landing it and seeing how it goes but we could also wait for @mrdgo and @CabalCrow to see if they have time to review since this is such a fundamental change?

It's up to you. The test went well, I feel confident for this now. The main concern is whether the change will make other contributors feel uncomfortable.

@fdncred fdncred merged commit 59baf26 into nushell:main Nov 16, 2024
3 checks passed
@fdncred
Copy link
Collaborator

fdncred commented Nov 16, 2024

Thanks. Let's move forward and see how it goes. Appreciate all the work here.

@blindFS blindFS deleted the pr branch November 17, 2024 04:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail to parse multi-line binary expression
4 participants