Skip to content

Fast lexer #684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 6, 2025
Merged

Fast lexer #684

merged 10 commits into from
May 6, 2025

Conversation

Kronos3
Copy link
Collaborator

@Kronos3 Kronos3 commented Apr 22, 2025

This PR reimplements the lexing strategy that produces a hugely faster lexer (8-10x speedup). It uses the same design pattern as can be found in the Scala compiler using a manually written recursive descent parser that moves through the file character-by-character. I also included some fixes in the parser that speed up the parsing by ~30% using IntelliJ's static analysis and auto-fixes. Mainly the fixes removed some redunant syntax and added private to functions that could be internalized (allows inlining and other optimizations by Scala).

Notable changes

There are some very minor discrepancies between the old and new lexer, @bocchino I didn't get exact feature parity because some behavior seems a little under defined so I'll wait for your input:

  1. Trailing spaces after a \ line continuation are disallowed. There is a single file in the FPrime repo that has some trailing spaces which throws an error in the lexer. I'm leaning toward keeping the functionality as is since it enforces cleaner syntax.
  2. The lexer uses an error Context to track invalid tokens or tokenization errors.
  3. I had to remove the scala-3 migration sbt plugin because this actually forced Scala2 only features and I couldn't use enum
  4. The string trimming of multi-line strings (3-strings) is pretty confusing (or bugged). The old behavior had odd behavior:
constant F: string = """
    indented
    keeps
    trailing newline
    """

Generates a string with value: indented\nkeeps\ntrailing newline\n

constant F: string = """
non-indented
does not keep
trailing newline
"""

Generates string with value: non-indented\ndoes not keep\ntrailing newline

For now the new lexer will not string trailing newlines ever but I can easily add that in.

Benchmarks

To benchmark I ran a JVM-based profiler to compare lexing times on the locs generation step of the FppTest build (this is a parsing intensive build stage). The images screenshots of the runtime of the lexer before and after this PR. I also timeed the runtime of the native GraalVM build of fpp-locate-defs before and after this PR. The stats are included below.

Before:
image

real	0m3.460s
user	0m3.389s
sys	0m0.071s

After:
image

real	0m0.335s
user	0m0.284s
sys	0m0.052s

Related to #629

@Kronos3 Kronos3 requested a review from bocchino April 22, 2025 21:57
@bocchino
Copy link
Collaborator

This is great! The performance gain is impressive, and it should help speed up the F Prime builds. Let's discussed the proposed changes to the spec.

@Kronos3
Copy link
Collaborator Author

Kronos3 commented Apr 29, 2025

I have pushed the following changes

  • Clarified whitespace handling of multi-line string literals in the spec. The old behavior did not match the spec. We decided to update the spec to match the behavior.
  • Updated lexer to allow trailing spaces after a line continuation, any other character will trigger an error
  • Added a testcase for line continuation and multi-error reporting (updated illegal-character test in fpp-syntax)

@Kronos3
Copy link
Collaborator Author

Kronos3 commented Apr 29, 2025

More interesting benchmarks from our CI jobs:

Before
Screenshot 2025-04-29 at 11 34 00 AM

After
Screenshot 2025-04-29 at 11 33 52 AM

Before
Screenshot 2025-04-29 at 12 47 04 PM

After
image

Copy link
Collaborator

@bocchino bocchino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I made a few minor changes. When CI passes, I will merge it.

@bocchino bocchino merged commit 65154e3 into main May 6, 2025
11 checks passed
@bocchino bocchino deleted the fast-lexer branch May 6, 2025 01:00
@Kronos3
Copy link
Collaborator Author

Kronos3 commented May 7, 2025

I neglected to put the most important benchmark here which is the cmake configuration time. I purged FppTest before each run and ran a native GraalVM build before/after. Here is the relevant cmake output from fprime-util generate --ut (run on M2 Macbook Pro):

Before:

-- Configuring done (23.9s)
-- Generating done (1.7s)

After:

-- Configuring done (10.4s)
-- Generating done (1.7s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants