Lexy

A lexical analyzer generator. Compiles regex patterns into C++ table-driven scanners.

What It Does

Lexy takes token specifications and generates standalone C++ scanners. The generated code uses transition tables to recognize tokens, the same approach as Lex/Flex.

Pipeline:

1. Parse .lexy token specifications
2. Build regex ASTs
3. Convert ASTs to NFAs (Thompson's construction)
4. Merge multiple NFAs into one
5. Determinize to DFA (subset construction)
6. Minimize DFA (Hopcroft's algorithm)
7. Generate C++ code with transition tables
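The merge and determinize steps can be sketched as follows. This is a minimal illustration of textbook subset construction, not Lexy's actual source: the Nfa and Dfa structs, kEpsilon, and every function name here are assumptions made for the example. Merging is modelled as a fresh start state with epsilon moves into each per-token NFA.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: the key '\0' stands for an epsilon move.
constexpr char kEpsilon = '\0';

struct Nfa {
    int start = 0;
    std::set<int> accepting;
    std::vector<std::multimap<char, int>> transitions;  // indexed by state
};

struct Dfa {
    std::vector<std::map<char, int>> transitions;  // indexed by state, 0 = start
    std::set<int> accepting;
};

// All NFA states reachable from `states` using epsilon moves only.
std::set<int> epsilonClosure(const Nfa& nfa, std::set<int> states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        int s = work.back();
        work.pop_back();
        auto [lo, hi] = nfa.transitions[s].equal_range(kEpsilon);
        for (auto it = lo; it != hi; ++it)
            if (states.insert(it->second).second) work.push_back(it->second);
    }
    return states;
}

// Subset construction: each DFA state is a set of NFA states.
Dfa determinize(const Nfa& nfa) {
    assert(nfa.start >= 0 && nfa.start < static_cast<int>(nfa.transitions.size()));
    Dfa dfa;
    std::map<std::set<int>, int> ids;   // NFA-state subset -> DFA state id
    std::vector<std::set<int>> work;    // subsets whose moves are not built yet
    auto intern = [&](const std::set<int>& subset) {
        auto [it, inserted] = ids.emplace(subset, static_cast<int>(ids.size()));
        if (inserted) {
            dfa.transitions.emplace_back();
            for (int s : subset)
                if (nfa.accepting.count(s)) { dfa.accepting.insert(it->second); break; }
            work.push_back(subset);
        }
        return it->second;
    };
    intern(epsilonClosure(nfa, {nfa.start}));
    while (!work.empty()) {
        std::set<int> current = work.back();
        work.pop_back();
        int from = ids.at(current);
        std::map<char, std::set<int>> moves;  // char -> NFA targets before closure
        for (int s : current)
            for (auto [c, target] : nfa.transitions[s])
                if (c != kEpsilon) moves[c].insert(target);
        for (const auto& [c, targets] : moves) {
            int to = intern(epsilonClosure(nfa, targets));
            dfa.transitions[from][c] = to;
        }
    }
    return dfa;
}

// Runs the DFA over the whole input; true if it ends in an accepting state.
bool accepts(const Dfa& dfa, const std::string& input) {
    int state = 0;
    for (char c : input) {
        auto it = dfa.transitions[state].find(c);
        if (it == dfa.transitions[state].end()) return false;
        state = it->second;
    }
    return dfa.accepting.count(state) > 0;
}
```

A real generator would also record which token each accepting DFA state belongs to; the sketch leaves that out to keep the construction itself visible.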

Table-Driven Design:

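The generated scanners use a 2D array, TRANSITION_TABLE[state][char] -> next_state, plus an array of accepting states. A simple loop walks the input and looks up one transition per character. To implement longest match, the loop remembers the last accepting state it passed; when no further transition exists, it backtracks to that position and emits the token.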
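The loop described above might look roughly like the sketch below. The tables are hand-written for two hypothetical tokens (INTEGER as [0-9]+ and a FLOAT as [0-9]+\.[0-9]+), chosen because they force the scanner to back up; they are not Lexy output, and apart from TRANSITION_TABLE every name is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

constexpr int kDead = -1;        // "no transition" marker (assumed convention)
constexpr int kNumStates = 4;
constexpr int kAlphabet = 128;   // ASCII only

enum class TokenKind { None, Integer, Float };

using Table = std::array<std::array<int, kAlphabet>, kNumStates>;

// State 0: start, 1: digits, 2: digits then '.', 3: digits '.' digits.
Table makeTable() {
    Table t;
    for (auto& row : t) row.fill(kDead);
    for (int c = '0'; c <= '9'; ++c) {
        t[0][c] = 1;
        t[1][c] = 1;
        t[2][c] = 3;
        t[3][c] = 3;
    }
    t[1]['.'] = 2;
    return t;
}

const Table TRANSITION_TABLE = makeTable();
const std::array<TokenKind, kNumStates> ACCEPTING = {
    TokenKind::None, TokenKind::Integer, TokenKind::None, TokenKind::Float};

struct Match {
    TokenKind kind;
    std::size_t length;  // number of characters in the lexeme
};

// Longest match starting at `pos`. The scan may run past the last accepting
// position; only the recorded match is returned, so the caller resumes right
// after it and the extra characters are scanned again (the backtracking step).
std::optional<Match> longestMatch(std::string_view input, std::size_t pos) {
    assert(pos <= input.size());
    int state = 0;
    std::optional<Match> last;
    for (std::size_t i = pos; i < input.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (c >= kAlphabet) break;              // outside the ASCII table
        state = TRANSITION_TABLE[state][c];
        if (state == kDead) break;              // no transition: stop scanning
        if (ACCEPTING[state] != TokenKind::None)
            last = Match{ACCEPTING[state], i + 1 - pos};
    }
    return last;
}
```

On the input "12.x" the loop reads "12." before hitting a dead end on 'x', then falls back to the two-character Integer match recorded earlier.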

Regex Support

Operators: |, *, +, ?, and concatenation
Repetition ranges: {n,m}, {n,}
Character classes: [a-z], [^abc]
Escapes: \n, \t, \\, and metacharacters
Wildcard: .
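The {n,m} and {n,} forms can be expressed with the simpler operators alone, which is one common way to support them. The sketch below shows that rewriting on pattern strings; whether Lexy expands them this way internally is not stated here, and expandRepeat is a hypothetical helper. It assumes the repeated atom is a single character or a character class, so no grouping is needed.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Rewrites atom{min,max} (or atom{min,} when max is empty) using only
// concatenation, '?' and '*'.
std::string expandRepeat(const std::string& atom, int min, std::optional<int> max) {
    assert(min >= 0 && (!max || *max >= min));
    std::string out;
    for (int i = 0; i < min; ++i) out += atom;      // mandatory copies
    if (!max) {
        out += atom + "*";                          // {n,}: any number more
    } else {
        for (int i = min; i < *max; ++i) out += atom + "?";  // optional copies
    }
    return out;
}
```

For instance, expandRepeat("[0-9]", 2, 4) yields "[0-9][0-9][0-9]?[0-9]?", and expandRepeat("a", 1, std::nullopt) yields "aa*".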

Build & Run

make                                   # Build generator
./scanner_generator.exe input.lexy     # Generate scanner

Requires C++20.

Example

Input (examples/myScanner.lexy):

IDENTIFIER ::= "[a-zA-Z_][a-zA-Z0-9_]*"
INTEGER ::= "0|[1-9][0-9]*"

Generate:

./scanner_generator.exe examples/myScanner.lexy

Output: generated/scanners/myScanner.cpp

Test:

// "hello123" would scan as one IDENTIFIER under longest match, so digits come first here.
Scanner scanner("123hello");
Token t1 = scanner.getNextToken();  // INTEGER: "123"
Token t2 = scanner.getNextToken();  // IDENTIFIER: "hello"
Token t3 = scanner.getNextToken();  // EOF

Limitations

ASCII only (0-127)
No table compression
No token priority rules
No whitespace skipping

References

Aho, Sethi, Ullman - Compilers: Principles, Techniques, and Tools (Dragon Book)
Cooper & Torczon - Engineering a Compiler
Hopcroft, Motwani, Ullman - Introduction to Automata Theory
