Fast lexer #684

Kronos3 · 2025-04-22T20:50:54Z

This PR reimplements the lexing strategy, producing a much faster lexer (8-10x speedup). It uses the same design pattern found in the Scala compiler: a manually written recursive descent parser that moves through the file character by character. I also included some parser fixes, found with IntelliJ's static analysis and auto-fixes, that speed up parsing by ~30%. These fixes mainly removed some redundant syntax and added private to functions that could be internalized (which allows inlining and other optimizations by Scala).
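For readers unfamiliar with the approach, here is a minimal sketch of a hand-written, character-by-character lexer. It is not the code in this PR: the names `MiniLexer`, `Token`, `LexError`, and `lex`, and the tiny token set, are all hypothetical and only illustrate the general shape (a position cursor, one dispatch on the current character, and errors collected instead of thrown).

```scala
// Hypothetical sketch: none of these names come from the FPP sources.
object MiniLexer {
  sealed trait Token
  final case class Ident(name: String) extends Token
  final case class IntLit(value: BigInt) extends Token
  final case class Symbol(text: String) extends Token

  // An error records where it happened so several can be reported at once.
  final case class LexError(offset: Int, message: String)

  /** Scans `input` one character at a time, collecting tokens and errors. */
  def lex(input: String): (List[Token], List[LexError]) = {
    val tokens = List.newBuilder[Token]
    val errors = List.newBuilder[LexError]
    var pos = 0

    // Advance while the predicate holds and return the consumed slice.
    def takeWhile(p: Char => Boolean): String = {
      val start = pos
      while (pos < input.length && p(input.charAt(pos))) pos += 1
      input.substring(start, pos)
    }

    def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'

    while (pos < input.length) {
      val c = input.charAt(pos)
      if (c.isWhitespace) {
        pos += 1
      } else if (c.isLetter || c == '_') {
        tokens += Ident(takeWhile(ch => ch.isLetterOrDigit || ch == '_'))
      } else if (isAsciiDigit(c)) {
        tokens += IntLit(BigInt(takeWhile(isAsciiDigit)))
      } else if ("=:,()[]{}".indexOf(c) >= 0) {
        tokens += Symbol(c.toString)
        pos += 1
      } else {
        // Record the problem and keep going so later errors are also found.
        errors += LexError(pos, s"illegal character '$c'")
        pos += 1
      }
    }
    (tokens.result(), errors.result())
  }
}
```

Calling `MiniLexer.lex("constant F = 42 $")` returns four tokens plus one error for the `$`, without aborting at the first bad character.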

Notable changes

There are some very minor discrepancies between the old and new lexer. @bocchino, I didn't get exact feature parity because some behavior seems a little underdefined, so I'll wait for your input:

Trailing spaces after a \ line continuation are disallowed. There is a single file in the FPrime repo with trailing spaces after a continuation, which now causes an error in the lexer. I'm leaning toward keeping the functionality as is since it enforces cleaner syntax.
The lexer uses an error context to track invalid tokens or tokenization errors.
I had to remove the Scala 3 migration sbt plugin because it forced the use of Scala 2-only features and I couldn't use enum.
The trimming of multi-line strings (triple-quoted strings) is pretty confusing (or bugged). The old lexer behaved oddly:

constant F: string = """
    indented
    keeps
    trailing newline
    """

Generates a string with value: indented\nkeeps\ntrailing newline\n

constant F: string = """
non-indented
does not keep
trailing newline
"""

Generates string with value: non-indented\ndoes not keep\ntrailing newline

For now the new lexer never strips trailing newlines, but I can easily add that in.
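To make the two outcomes above concrete, here is a small sketch of one possible normalization for the body of a triple-quoted string. It is not the rule implemented in this PR or written in the spec: `TripleQuoted.normalize` and its `keepTrailingNewline` flag are hypothetical, and the flag exists only so that both results shown above can be reproduced from the same function.

```scala
// Hypothetical sketch: not the FPP implementation or the spec's rule.
object TripleQuoted {
  /** Normalizes the raw text found between the opening and closing quotes.
    * Assumptions: lines are separated by '\n' and indentation uses spaces only.
    */
  def normalize(raw: String, keepTrailingNewline: Boolean): String = {
    // Drop the newline that immediately follows the opening quotes, if any.
    val body = if (raw.startsWith("\n")) raw.substring(1) else raw
    val lines = body.split("\n", -1).toVector
    // A blank final line means the closing quotes sat on their own line.
    val endsWithBlankLine = lines.length > 1 && lines.last.forall(_ == ' ')
    val content = if (endsWithBlankLine) lines.init else lines
    // Common indentation is the smallest indent over non-blank lines.
    val indents = content.filter(_.exists(_ != ' ')).map(_.takeWhile(_ == ' ').length)
    val indent = if (indents.isEmpty) 0 else indents.min
    val dedented = content.map(_.drop(indent)).mkString("\n")
    if (endsWithBlankLine && keepTrailingNewline) dedented + "\n" else dedented
  }
}
```

With `keepTrailingNewline = true` the indented example yields `indented\nkeeps\ntrailing newline\n`; with `false` the non-indented example yields `non-indented\ndoes not keep\ntrailing newline`. Making the choice a single explicit decision, instead of a side effect of indentation, is the point of the sketch.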

Benchmarks

To benchmark, I ran a JVM-based profiler to compare lexing times on the locs generation step of the FppTest build (this is a parsing-intensive build stage). The images are screenshots of the runtime of the lexer before and after this PR. I also timed the native GraalVM build of fpp-locate-defs before and after this PR. The stats are included below.

Before:

real	0m3.460s
user	0m3.389s
sys	0m0.071s

After:

real	0m0.335s
user	0m0.284s
sys	0m0.052s
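As a quick check of the timings above against the claimed 8-10x figure, the ratio can be computed directly. The `Speedup` object below is only an illustrative helper, not part of the PR.

```scala
// Hypothetical helper for reading the `time` output above.
object Speedup {
  /** Speedup factor: how many times faster the "after" run is. */
  def ratio(beforeSeconds: Double, afterSeconds: Double): Double =
    beforeSeconds / afterSeconds

  def main(args: Array[String]): Unit = {
    // Wall-clock ("real") and CPU ("user") times reported above.
    println(s"real speedup: ${ratio(3.460, 0.335)}")
    println(s"user speedup: ${ratio(3.389, 0.284)}")
  }
}
```

The wall-clock ratio comes out to roughly 10.3x for fpp-locate-defs on this input, which is in line with the upper end of the stated range.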

Related to #629

bocchino · 2025-04-23T15:16:53Z

This is great! The performance gain is impressive, and it should help speed up the F Prime builds. Let's discuss the proposed changes to the spec.


Kronos3 · 2025-04-29T18:22:48Z

I have pushed the following changes:

Clarified whitespace handling of multi-line string literals in the spec. The old behavior did not match the spec. We decided to update the spec to match the behavior.
Updated the lexer to allow trailing spaces after a line continuation; any other character will trigger an error.
Added a test case for line continuation and multi-error reporting (updated the illegal-character test in fpp-syntax).
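The line-continuation rule described above can be sketched as follows. This is an illustration under stated assumptions, not the lexer code from this PR: `LineContinuation.skip` is a hypothetical name, only `'\n'` is treated as a newline, and a backslash at the very end of the input is accepted.

```scala
// Hypothetical sketch of the rule: spaces may follow the backslash,
// anything else before the newline is an error.
object LineContinuation {
  /** Expects `pos` to point at a backslash.
    * Returns Right(index just past the newline) or Left(error message).
    */
  def skip(input: String, pos: Int): Either[String, Int] = {
    require(pos < input.length && input.charAt(pos) == '\\', "pos must point at a backslash")
    var i = pos + 1
    // Trailing spaces after the backslash are tolerated.
    while (i < input.length && input.charAt(i) == ' ') i += 1
    if (i >= input.length) Right(i) // assumption: end of input is accepted
    else if (input.charAt(i) == '\n') Right(i + 1)
    else Left(s"unexpected character '${input.charAt(i)}' after line continuation at offset $i")
  }
}
```

For example, `LineContinuation.skip("a \\   \nb", 2)` returns `Right(7)`, the index of `b`, while `LineContinuation.skip("\\ x\n", 0)` returns a `Left` because `x` follows the backslash.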

Kronos3 · 2025-04-29T19:47:47Z

More interesting benchmarks from our CI jobs:

[Two pairs of before/after screenshots]


bocchino

Looks great! I made a few minor changes. When CI passes, I will merge it.

Kronos3 · 2025-05-07T01:05:39Z

I neglected to put the most important benchmark here, which is the cmake configuration time. I purged FppTest before each run and ran a native GraalVM build before/after. Here is the relevant cmake output from fprime-util generate --ut (run on an M2 MacBook Pro):

Before:

-- Configuring done (23.9s)
-- Generating done (1.7s)

After:

-- Configuring done (10.4s)
-- Generating done (1.7s)

Kronos3 added 3 commits April 21, 2025 17:45

Rework the lexer

6c1ae4c

Fix up multiline string literals

b5486f9

Fix bugs in lexer to get tests to pass

d8dfc67

Kronos3 requested a review from bocchino April 22, 2025 21:57

Kronos3 added 3 commits April 29, 2025 11:01

Update lexer to handle line continuations properly and update spec for multi-line string literals

64797ad

Merge remote-tracking branch 'origin/main' into fast-lexer

6401929

Regen docs

5d920cc

bocchino added 4 commits May 5, 2025 15:40

Revise lexer and analysis

889a567

Remove binary numeric literals for now

Refactor MultiError

ee852a8

Revise spec

c6f6a5f

Revise spec

0b32660

bocchino approved these changes May 5, 2025


bocchino merged commit 65154e3 into main May 6, 2025
11 checks passed

bocchino deleted the fast-lexer branch May 6, 2025 01:00
