Fast lexer #684
This is great! The performance gain is impressive, and it should help speed up the F Prime builds. Let's discuss the proposed changes to the spec.
I have pushed the following changes
Remove binary numeric literals for now
Looks great! I made a few minor changes. When CI passes, I will merge it.
I neglected to put the most important benchmark here, which is the CMake configuration time. I purged FppTest before each run and ran a native GraalVM build before and after. Here is the relevant CMake output.
Before:
After:
This PR reimplements the lexing strategy, producing a much faster lexer (8-10x speedup). It uses the same design pattern as the Scala compiler: a manually written recursive descent parser that moves through the file character by character. I also included some fixes in the parser that speed up parsing by ~30%, found using IntelliJ's static analysis and auto-fixes. The fixes mainly removed some redundant syntax and added `private` to functions that could be internalized (this allows inlining and other optimizations by Scala).
Notable changes
There are some very minor discrepancies between the old and new lexer. @bocchino, I didn't get exact feature parity because some behavior seems a little underdefined, so I'll wait for your input:
`\` line continuations are disallowed. There is a single file in the FPrime repo that has some trailing spaces, which throws an error in the lexer. I'm leaning toward keeping the functionality as is, since it enforces cleaner syntax.
`Context` to track invalid tokens or tokenization errors.
enum
Generates a string with value:
indented\nkeeps\ntrailing newline\n
Generates a string with value:
non-indented\ndoes not keep\ntrailing newline
For now the new lexer will never strip trailing newlines, but I can easily add that in.
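To illustrate the general shape of a hand-written lexer that walks the input character by character, as the PR summary describes, here is a minimal sketch. It is written in Python for brevity and is not the code from this PR, which is in Scala. The `Token` and `Lexer` classes, the token kinds, and the punctuation set are all hypothetical. The `errors` list only loosely mirrors the idea of tracking invalid tokens instead of stopping at the first one.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str  # hypothetical kinds: "IDENT", "INT", "PUNCT"
    text: str
    line: int
    col: int


@dataclass
class Lexer:
    src: str
    pos: int = 0
    line: int = 1
    col: int = 1
    errors: list = field(default_factory=list)  # collected, not raised

    def peek(self) -> str:
        # Current character, or "" at end of input.
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def advance(self) -> str:
        # Consume one character and keep line/column bookkeeping in sync.
        ch = self.src[self.pos]
        self.pos += 1
        if ch == "\n":
            self.line, self.col = self.line + 1, 1
        else:
            self.col += 1
        return ch

    def take_while(self, pred) -> str:
        # Consume characters while pred holds; return the consumed slice.
        start = self.pos
        while self.peek() and pred(self.peek()):
            self.advance()
        return self.src[start:self.pos]

    def tokens(self) -> list:
        out = []
        while self.peek():
            ch, line, col = self.peek(), self.line, self.col
            if ch.isspace():
                self.advance()
            elif ch.isalpha() or ch == "_":
                text = self.take_while(lambda c: c.isalnum() or c == "_")
                out.append(Token("IDENT", text, line, col))
            elif ch.isdigit():
                out.append(Token("INT", self.take_while(str.isdigit), line, col))
            elif ch in "{}()[]=:,.":
                out.append(Token("PUNCT", self.advance(), line, col))
            else:
                # Record the problem and keep going.
                self.errors.append(f"{line}:{col}: invalid character {ch!r}")
                self.advance()
        return out


lexer = Lexer("enum E { A = 1 }\n$")
print([t.text for t in lexer.tokens()])  # ['enum', 'E', '{', 'A', '=', '1', '}']
print(lexer.errors)                      # ["2:1: invalid character '$'"]
```

Each branch dispatches on a single character of lookahead and consumes input directly, with no regular expressions or backtracking. That is one plausible reason this style of lexer tends to be fast.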
Benchmarks
To benchmark, I ran a JVM-based profiler to compare lexing times on the locs generation step of the FppTest build (a parsing-intensive build stage). The images are screenshots of the lexer's runtime before and after this PR. I also `time`d the native GraalVM build of `fpp-locate-defs` before and after this PR. The stats are included below.
Before:

After:

Related to #629