Guide on multi-mode lexing #132
Cool, nice to see some stuff about multi-mode! However, this seems more about template literals than just multi-mode lexing. Not necessarily a problem, but I would recommend we change the title and make it clear that we're talking about implementing template literals through multi-mode lexing, as that appears to be the primary topic here.
It might even work better as a tutorial than as a guide, since this is a targeted application; but that's less important than the first point.
Many modern programming languages such as [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) or [C#](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated) support template literals.
They are a way to easily concatenate or interpolate string values while maintaining great code readability.
This guide will show you how to support template literals in Langium.
This guide will show you how to support template literals in Langium through multi-mode lexing.
This paragraph is still a bit strange, as it reads more like the topic is template literals.
For this specific example, our template literal starts and ends using backticks `` ` `` and are interupted by expressions that are wrapped in curly braces `{}`.
For this specific example, our template literal starts and ends with backticks `` ` ``, and is interrupted by expressions that are wrapped in curly braces `{}`.
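To make that syntax concrete, here is a small sketch of what such a literal means at runtime. `renderTemplate` is a hypothetical helper (not part of Langium) that replaces each `{...}` with a value; it assumes every expression is a plain identifier looked up in an `env` object and that expressions contain no nested braces. A real language would parse and evaluate arbitrary expressions instead.

```typescript
// Hypothetical helper, not part of Langium: evaluates a template literal whose
// expressions are plain identifiers, looked up in `env`.
// Assumes no escape sequences and no nested braces inside an expression.
function renderTemplate(source: string, env: Record<string, string>): string {
  if (source.length < 2 || !source.startsWith("`") || !source.endsWith("`")) {
    throw new Error("A template literal must start and end with a backtick");
  }
  const body = source.slice(1, -1); // strip the surrounding backticks
  let result = "";
  let i = 0;
  while (i < body.length) {
    const open = body.indexOf("{", i);
    if (open === -1) {
      result += body.slice(i); // trailing static text
      break;
    }
    result += body.slice(i, open); // static text before the expression
    const close = body.indexOf("}", open + 1);
    if (close === -1) throw new Error("Unterminated expression");
    const name = body.slice(open + 1, close).trim();
    if (!Object.prototype.hasOwnProperty.call(env, name)) {
      throw new Error(`Unknown identifier: ${name}`);
    }
    result += env[name]; // interpolate the value
    i = close + 1;
  }
  return result;
}

console.log(renderTemplate("`Hello {name}, you have {count} messages`", { name: "Ada", count: "3" }));
// Hello Ada, you have 3 messages
```

The lexer only has to recognize the pieces between the expressions; the interpolation itself happens later, during evaluation or code generation.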
---
title: "Multi-Mode Lexing"
I would recommend changing the title here to something about template literals, possibly "Template Literals with Multi-Mode Lexing".
Conceptually, template strings work by reading a start terminal which starts with `` ` `` and ends with `{`,
followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.
followed by an expression and an end terminal, which is `}` and `` ` ``.
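As a rough illustration, the start, middle and end pieces could be described by regular expressions like the ones below. The names mirror the `TEMPLATE_LITERAL_*` terminals used in the guide, but the patterns are assumptions (they ignore escape sequences such as `` \` ``), and `matchAt` is a hypothetical helper that checks whether a pattern matches exactly at a given offset, the way a lexer does.

```typescript
// Hypothetical terminal patterns; the guide's actual regexes may differ.
// Escape sequences are ignored to keep the sketch short.
// The sticky (y) flag makes a regex match only at `lastIndex`, like a lexer.
const TEMPLATE_LITERAL_START = /`[^`{]*\{/y; // backtick, text, opening brace
const TEMPLATE_LITERAL_MIDDLE = /\}[^`{]*\{/y; // closing brace, text, opening brace
const TEMPLATE_LITERAL_END = /\}[^`{]*`/y; // closing brace, text, backtick

// Returns the text matched by `re` exactly at `offset`, or undefined.
function matchAt(re: RegExp, input: string, offset: number): string | undefined {
  re.lastIndex = offset;
  const m = re.exec(input);
  return m === null ? undefined : m[0];
}

const input = "`Hello {name}, you have {count} messages`";
const firstClose = input.indexOf("}");
const lastClose = input.lastIndexOf("}");

console.log(matchAt(TEMPLATE_LITERAL_START, input, 0)); // "`Hello {"
console.log(matchAt(TEMPLATE_LITERAL_MIDDLE, input, firstClose)); // "}, you have {"
console.log(matchAt(TEMPLATE_LITERAL_END, input, lastClose)); // "} messages`"
console.log(matchAt(TEMPLATE_LITERAL_END, input, firstClose)); // undefined
```

Note that the middle and end patterns both begin with `}`, which is exactly why a plain `}` token needs special care once these terminals exist.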
Of course, let's not forget to bind all of these services:
Of course, let's not forget to bind all of these services in your **module.ts**:
export class CustomTokenBuilder extends DefaultTokenBuilder {
override buildTokens(grammar: GrammarAST.Grammar, options?: { caseInsensitive?: boolean }): TokenVocabulary {
As mentioned before, I would first break this out into a separate paragraph, explaining that we first need to build up a multi-mode lexer definition with various modes, which are pushed by our special tokens.
See above.
}
}
protected override buildKeywordToken(
This would make a nice second part, indicating that we need to clean up our `}` token so the regular mode doesn't get messed up.
See above.
return tokenType;
}
protected override buildTerminalToken(terminal: GrammarAST.TerminalRule): TokenType {
Third part: we can add this and explain how we're associating a push/pop action with the start/end literals (which Chevrotain needs).
See above.
bccf3a6 to 72a8fbb
72a8fbb to 4eb80ef
4eb80ef to ef14db1
Went back through and resolved a number of discussions to try and make the outstanding points clearer. Most of the remaining suggestions are specifically for grammar or clarity, but the rest should be good.
The following implementation of a `TokenBuilder` will do the job for us. It creates two lexing modes, which are almost identical except for the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals.
We will also need to make sure that the modes are switched based on the `TEMPLATE_LITERAL_START` and `TEMPLATE_LITERAL_END` terminals. We use `PUSH_MODE` and `POP_MODE` for this.
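A lexer mode is essentially the set of token types the lexer may match at the moment, and `PUSH_MODE`/`POP_MODE` maintain a stack of such modes. The sketch below reproduces that idea without Chevrotain or Langium: a regular mode, plus a template mode that only adds `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END`, with the start token pushing the template mode and the end token popping it. The token patterns, `tokenize` and the longest-match rule are assumptions for illustration; Chevrotain itself tries token types in order and uses `LONGER_ALT` to prefer longer alternatives.

```typescript
// Dependency-free sketch of multi-mode lexing, modelled on Chevrotain's
// PUSH_MODE / POP_MODE but NOT its matching strategy: here the longest match
// in the active mode wins (ties go to the earlier token type).
interface TokenType {
  name: string;
  pattern: RegExp; // must carry the sticky (y) flag
  pushMode?: string; // mode to enter after matching this token
  popMode?: boolean; // return to the previous mode after matching this token
  skip?: boolean; // drop the token from the output (whitespace)
}

interface Token {
  type: string;
  image: string;
}

const WS: TokenType = { name: "WS", pattern: /\s+/y, skip: true };
const ID: TokenType = { name: "ID", pattern: /[A-Za-z_]\w*/y };
const LCURLY: TokenType = { name: "{", pattern: /\{/y };
const RCURLY: TokenType = { name: "}", pattern: /\}/y };
const START: TokenType = { name: "TEMPLATE_LITERAL_START", pattern: /`[^`{]*\{/y, pushMode: "template" };
const MIDDLE: TokenType = { name: "TEMPLATE_LITERAL_MIDDLE", pattern: /\}[^`{]*\{/y };
const END: TokenType = { name: "TEMPLATE_LITERAL_END", pattern: /\}[^`{]*`/y, popMode: true };

// Two almost identical modes: only the template mode knows MIDDLE and END,
// so a `}` outside a template literal is always a plain `}` token.
const REGULAR_TOKENS = [WS, ID, LCURLY, RCURLY, START];
const MODES: Record<string, TokenType[]> = {
  regular: REGULAR_TOKENS,
  template: [...REGULAR_TOKENS, MIDDLE, END],
};

function tokenize(input: string): Token[] {
  const modeStack = ["regular"]; // the top of the stack is the active mode
  const tokens: Token[] = [];
  let offset = 0;
  while (offset < input.length) {
    let best: { type: TokenType; image: string } | undefined;
    for (const type of MODES[modeStack[modeStack.length - 1]]) {
      type.pattern.lastIndex = offset;
      const m = type.pattern.exec(input);
      if (m !== null && (best === undefined || m[0].length > best.image.length)) {
        best = { type, image: m[0] };
      }
    }
    if (best === undefined) throw new Error(`Unexpected character at offset ${offset}`);
    if (!best.type.skip) tokens.push({ type: best.type.name, image: best.image });
    if (best.type.pushMode !== undefined) modeStack.push(best.type.pushMode);
    if (best.type.popMode && modeStack.length > 1) modeStack.pop();
    offset += best.image.length;
  }
  return tokens;
}

console.log(tokenize("`Hello {name}, you have {count} messages`").map((t) => t.type).join(" "));
// TEMPLATE_LITERAL_START ID TEMPLATE_LITERAL_MIDDLE ID TEMPLATE_LITERAL_END
console.log(tokenize("a } b").map((t) => t.type).join(" ")); // ID } ID
```

The second call shows the point of keeping two modes: in the regular mode a stray `}` cannot be mistaken for the beginning of a template literal piece.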
As a step up from this, I still feel we should split this up. But in the interest of moving this along after some time, can we instead open a separate issue for a custom token builder guide?
The generated AstNode for TemplateLiteral looks like this:
I would have expected to see
@agacek Thanks for the info; I've created eclipse-langium/langium#1506 for this.
Just a quick piece of feedback: I skimmed through the tutorial in the PR and noticed there wasn't actually much on what, conceptually speaking, a 'lexer mode' is. It may be worth adding a quick sentence explaining that when first introducing the concept. Presumably it's just state that the lexer uses to figure out what to do?
Really helpful guide, thanks. If the guide becomes primarily about template strings, it might be useful to briefly discuss how to get them working for languages with curly braces in an expression context (e.g. object literals in JS or dictionary literals in Python). Currently, code like

```ts
protected override buildKeywordToken(
    keyword: GrammarAST.Keyword,
    terminalTokens: TokenType[],
    caseInsensitive: boolean
): TokenType {
    const tokenType = super.buildKeywordToken(keyword, terminalTokens, caseInsensitive);
    if (tokenType.name === '{') {
        // Enter regular mode (for object literals etc.)
        // REGULAR_MODE: assumed to hold the name of the regular lexer mode (defined elsewhere)
        tokenType.PUSH_MODE = REGULAR_MODE;
    } else if (tokenType.name === '}') {
        // Return to the previous mode
        tokenType.POP_MODE = true;
        // The default } token uses [TEMPLATE_LITERAL_MIDDLE, TEMPLATE_LITERAL_END] as longer alts.
        // Delete LONGER_ALT, since those tokens are not valid in the regular lexer mode.
        delete tokenType.LONGER_ALT;
    }
    return tokenType;
}
```
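The `buildKeywordToken` override above keeps braces balanced by letting `{` push the regular mode and `}` pop it. The dependency-free sketch below illustrates why that matters, using a longest-match simplification rather than Chevrotain's ordered matching: with `trackBraces` off, the `}` that closes a nested `{k}` is consumed as the beginning of `TEMPLATE_LITERAL_END`; with it on, that `}` is lexed in the regular mode, where the template tokens do not exist. `makeModes`, `trackBraces`, `tokenize` and the token patterns are hypothetical.

```typescript
// Dependency-free sketch (not Chevrotain): the longest match in the active mode
// wins, ties go to the earlier token type. All names here are hypothetical.
interface TokenType {
  name: string;
  pattern: RegExp; // sticky (y) flag required
  pushMode?: string;
  popMode?: boolean;
  skip?: boolean;
}

// Builds the regular and template modes. With trackBraces, `{` pushes the
// regular mode and `}` pops it, mirroring the buildKeywordToken override.
function makeModes(trackBraces: boolean): Record<string, TokenType[]> {
  const regular: TokenType[] = [
    { name: "WS", pattern: /\s+/y, skip: true },
    { name: "ID", pattern: /[A-Za-z_]\w*/y },
    { name: "{", pattern: /\{/y, pushMode: trackBraces ? "regular" : undefined },
    { name: "}", pattern: /\}/y, popMode: trackBraces },
    { name: "TEMPLATE_LITERAL_START", pattern: /`[^`{]*\{/y, pushMode: "template" },
  ];
  const templateOnly: TokenType[] = [
    { name: "TEMPLATE_LITERAL_MIDDLE", pattern: /\}[^`{]*\{/y },
    { name: "TEMPLATE_LITERAL_END", pattern: /\}[^`{]*`/y, popMode: true },
  ];
  return { regular, template: [...regular, ...templateOnly] };
}

// Returns the token type names (whitespace skipped) joined by spaces.
function tokenize(input: string, modes: Record<string, TokenType[]>): string {
  const modeStack = ["regular"]; // the top of the stack is the active mode
  const types: string[] = [];
  let offset = 0;
  while (offset < input.length) {
    let best: { type: TokenType; length: number } | undefined;
    for (const type of modes[modeStack[modeStack.length - 1]]) {
      type.pattern.lastIndex = offset;
      const m = type.pattern.exec(input);
      if (m !== null && (best === undefined || m[0].length > best.length)) {
        best = { type, length: m[0].length };
      }
    }
    if (best === undefined) throw new Error(`Unexpected character at offset ${offset}`);
    if (!best.type.skip) types.push(best.type.name);
    if (best.type.pushMode !== undefined) modeStack.push(best.type.pushMode);
    if (best.type.popMode && modeStack.length > 1) modeStack.pop();
    offset += best.length;
  }
  return types.join(" ");
}

const source = "`a{ {k} }b`";
console.log(tokenize(source, makeModes(false)));
// TEMPLATE_LITERAL_START { ID TEMPLATE_LITERAL_END  (the `}` closing {k} was swallowed)
console.log(tokenize(source, makeModes(true)));
// TEMPLATE_LITERAL_START { ID } TEMPLATE_LITERAL_END
```

Pushing the regular mode on every `{` (not only inside templates) keeps the mode stack balanced, because every `}` pops exactly one level.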
Effectively closes #70