Guide on multi-mode lexing #132

msujew · 2023-02-13T00:17:31Z

Effectively closes #70

montymxb

Cool, nice to see some stuff about multi-mode! However, this seems more about template literals than just multi-mode lexing. Not necessarily a problem, but I would recommend we change the title and make it clear that we're talking about implementing template literals through multi-mode lexing, as that appears to be the primary topic here.

It might even work better as a tutorial than as a guide, since this is a targeted application, but that's less important than the first point.

montymxb · 2023-02-17T14:40:53Z

hugo/content/guides/multi-mode-lexing.md

+
+Many modern programming languages such as [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) or [C#](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated) support template literals.
+They are a way to easily concatenate or interpolate string values while maintaining great code readability.
+This guide will show you how to support template literals in Langium.


Suggested change

This guide will show you how to support template literals in Langium.

This guide will show you how to support template literals in Langium through multi-mode lexing.

This paragraph is still a bit strange, as it reads more like the topic is template literals.

montymxb · 2023-02-17T14:41:53Z

hugo/content/guides/multi-mode-lexing.md

+They are a way to easily concatenate or interpolate string values while maintaining great code readability.
+This guide will show you how to support template literals in Langium.
+
+For this specific example, our template literal starts and ends using backticks `` ` ``  and are interupted by expressions that are wrapped in curly braces `{}`.


Suggested change

For this specific example, our template literal starts and ends using backticks `` ` `` and are interupted by expressions that are wrapped in curly braces `{}`.

For this specific example, our template literal starts and ends with backticks `` ` ``, and is interrupted by expressions that are wrapped in curly braces `{}`.

montymxb · 2023-02-17T14:43:28Z

hugo/content/guides/multi-mode-lexing.md

@@ -0,0 +1,175 @@
+---
+title: "Multi-Mode Lexing"


I would recommend changing the title here to something about template literals, possibly

Template Literals with Multi-Mode Lexing

montymxb · 2023-02-17T14:44:52Z

hugo/content/guides/multi-mode-lexing.md

+```
+
+Conceptually, template strings work by reading a start terminal which starts with `` ` `` and ends with `{`, 
+followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.


Suggested change

followed by an expression and then an end terminal which is effectively just the start terminal in reverse using `}` and `` ` ``.

followed by an expression and an end terminal, which is `}` and `` ` ``.

montymxb · 2023-02-17T15:22:57Z

hugo/content/guides/multi-mode-lexing.md

+}
+```
+
+Of course, let's not forget to bind all of these services:


Suggested change

Of course, let's not forget to bind all of these services:

Of course, let's not forget to bind all of these services in your **module.ts**:

montymxb · 2023-02-17T15:29:11Z

hugo/content/guides/multi-mode-lexing.md

+
+export class CustomTokenBuilder extends DefaultTokenBuilder {
+
+    override buildTokens(grammar: GrammarAST.Grammar, options?: { caseInsensitive?: boolean }): TokenVocabulary {


As mentioned before, I would first break this out into a separate paragraph, explaining that we first need to build up a multi-mode lexer definition with several modes, which are pushed on by our special tokens.

montymxb · 2023-02-17T15:31:25Z

hugo/content/guides/multi-mode-lexing.md

+        }
+    }
+
+    protected override buildKeywordToken(


This would make a nice second part, indicating that we need to clean up our `}` token so that regular mode doesn't get messed up.

montymxb · 2023-02-17T15:32:45Z

hugo/content/guides/multi-mode-lexing.md

+        return tokenType;
+    }
+
+    protected override buildTerminalToken(terminal: GrammarAST.TerminalRule): TokenType {


Third part, we can add this & explain how we're associating a push/pop action for start/end literals (which chevrotain needs).

github-actions · 2023-12-14T14:04:31Z

PR Preview Action v1.4.4
🚀 Deployed preview to https://eclipse-langium.github.io/langium-previews/pr-previews/pr-132/
on branch `previews` at 2023-12-14 14:04 UTC

montymxb

Went back through and resolved a number of discussions to try and make the outstanding points clearer. Most of the remaining suggestions are specifically for grammar or clarity, but the rest should be good.

montymxb · 2024-02-08T15:07:12Z

hugo/content/guides/multi-mode-lexing.md

+The following implementation of a `TokenBuilder` will do the job for us. It creates two lexing modes, which are almost identical except for the `TEMPLATE_LITERAL_MIDDLE` and `TEMPLATE_LITERAL_END` terminals.
+We will also need to make sure that the modes are switched based on the `TEMPLATE_LITERAL_START` and `TEMPLATE_LITERAL_END` terminals. We use `PUSH_MODE` and `POP_MODE` for this.
+
+```ts


As a step up from this, I still feel we should split this up. But in the interest of moving this along after so much time, can we instead open a separate issue for a custom token builder guide?

montymxb · 2024-02-08T15:07:24Z

hugo/content/guides/multi-mode-lexing.md

+
+export class CustomTokenBuilder extends DefaultTokenBuilder {
+
+    override buildTokens(grammar: GrammarAST.Grammar, options?: { caseInsensitive?: boolean }): TokenVocabulary {


montymxb · 2024-02-08T15:07:30Z

hugo/content/guides/multi-mode-lexing.md

+        }
+    }
+
+    protected override buildKeywordToken(


montymxb · 2024-02-08T15:07:35Z

hugo/content/guides/multi-mode-lexing.md

+        return tokenType;
+    }
+
+    protected override buildTerminalToken(terminal: GrammarAST.TerminalRule): TokenType {


agacek · 2024-05-17T13:07:08Z

The generated `AstNode` for `TemplateLiteral` looks like this:

    export interface TemplateLiteral extends AstNode {
        ...
        content: Array<Expr> | Array<string>;
    }

I would have expected to see `content: Array<Expr | string>`. Is this a bug or am I missing something?
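
To illustrate why the difference matters, here is a small sketch. The `Expr` interface and the `render` helper are hypothetical stand-ins, not the types Langium generates. A template literal interleaves strings and expressions in a single array, which only the second type can describe:

```ts
// Hypothetical stand-in for the generated expression node type.
interface Expr {
    $type: 'Expr';
    name: string;
}

// The type as reported in the comment above: all-Expr OR all-string.
type ContentAsGenerated = Array<Expr> | Array<string>;
// The type the commenter expected: each element is an Expr or a string.
type ContentAsExpected = Array<Expr | string>;

// A template literal mixes both kinds of parts in one array.
const mixed: ContentAsExpected = ['Hello ', { $type: 'Expr', name: 'user' }, '!'];

// A homogeneous array is the only shape the generated type admits.
const onlyStrings: ContentAsGenerated = ['Hello ', 'world'];

// The following line would not compile, because a mixed array is neither
// Array<Expr> nor Array<string>:
// const broken: ContentAsGenerated = mixed;

// Hypothetical helper that prints the parts back as text.
function render(content: ContentAsExpected): string {
    return content
        .map(part => (typeof part === 'string' ? part : `{${part.name}}`))
        .join('');
}

console.log(render(mixed));
console.log(render(onlyStrings));
```

The union-of-arrays type is still assignable to the merged type, so code written against `Array<Expr | string>` accepts both shapes.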

msujew · 2024-05-21T09:46:36Z

@agacek Thanks for the info, I've created eclipse-langium/langium#1506 for this.

ym-han · 2024-07-21T15:23:11Z

Just a quick piece of feedback: I skimmed through the tutorial in the PR, and noticed there wasn't actually much on what, conceptually speaking, a 'lexer mode' is. It may be worth adding a quick sentence explaining that when first introducing the concept -- presumably it's just state that the lexer uses to figure out what to do?
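
As a rough illustration of that idea, the following dependency-free sketch treats a lexer mode as the name of the rule set that is currently active, kept on a stack. It is not Langium's or Chevrotain's implementation. The terminal names are taken from the guide, while the regular expressions, the `Rule` shape, and the `tokenize` function are assumptions made up for this example:

```ts
type Mode = 'regular' | 'template';

// Hypothetical rule shape: a sticky pattern plus optional mode actions.
interface Rule {
    name: string;
    pattern: RegExp; // must use the sticky flag "y"
    push?: Mode;     // enter this mode after matching
    pop?: boolean;   // leave the current mode after matching
    skip?: boolean;  // match but emit no token
}

// Each mode has its own rule set. The "}"-prefixed terminals exist only in template mode.
const rules: Record<Mode, Rule[]> = {
    regular: [
        { name: 'WS', pattern: /\s+/y, skip: true },
        { name: 'TEMPLATE_LITERAL_START', pattern: /`[^`{]*\{/y, push: 'template' },
        { name: 'ID', pattern: /[a-zA-Z_]\w*/y },
    ],
    template: [
        { name: 'WS', pattern: /\s+/y, skip: true },
        { name: 'TEMPLATE_LITERAL_MIDDLE', pattern: /\}[^`{]*\{/y },
        { name: 'TEMPLATE_LITERAL_END', pattern: /\}[^`{]*`/y, pop: true },
        { name: 'ID', pattern: /[a-zA-Z_]\w*/y },
    ],
};

function tokenize(input: string): string[] {
    const modeStack: Mode[] = ['regular'];
    const tokens: string[] = [];
    let offset = 0;
    while (offset < input.length) {
        // The top of the stack decides which rules are tried.
        const mode = modeStack[modeStack.length - 1] ?? 'regular';
        let matched = false;
        for (const rule of rules[mode]) {
            rule.pattern.lastIndex = offset;
            const match = rule.pattern.exec(input);
            if (match === null) {
                continue;
            }
            offset += match[0].length;
            if (!rule.skip) {
                tokens.push(rule.name);
            }
            if (rule.pop) {
                modeStack.pop();
            }
            if (rule.push) {
                modeStack.push(rule.push);
            }
            matched = true;
            break;
        }
        if (!matched) {
            throw new Error(`Unexpected character at offset ${offset} in ${mode} mode`);
        }
    }
    return tokens;
}

console.log(tokenize('`Hello {name} and {other}!` x'));
```

The same `}` character means different things depending on the mode: in template mode it starts a middle or end terminal, while in regular mode this sketch has no rule for it and reports an error.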

lars-reimann · 2025-01-17T19:34:53Z

Really helpful guide, thanks. If the guide becomes primarily about template strings, it might be useful to briefly discuss how to get them working for languages with curly braces in an expression context (e.g. object literals in JS or dictionary literals in Python). Currently, code like `{ {"hello": "world"} }` would not be parsed correctly. Switching back to the regular mode when encountering an opening curly brace should resolve this:

    protected override buildKeywordToken(
        keyword: GrammarAST.Keyword,
        terminalTokens: TokenType[],
        caseInsensitive: boolean
    ): TokenType {
        let tokenType = super.buildKeywordToken(keyword, terminalTokens, caseInsensitive);
        
        if (tokenType.name === '{') {
            // Enter regular mode (for object literals etc.)
            tokenType.PUSH_MODE = REGULAR_MODE;
        } else if (tokenType.name === '}') {
            // Return to previous mode
            tokenType.POP_MODE = true;

            // The default } token will use [TEMPLATE_LITERAL_MIDDLE, TEMPLATE_LITERAL_END] as longer alts
            // We need to delete the LONGER_ALT, they are not valid for the regular lexer mode
            delete tokenType.LONGER_ALT;
        }
        return tokenType;
    }
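
The effect of pushing the regular mode on `{` can be reproduced with a small standalone sketch. It does not use Langium or Chevrotain. The patterns, the `buildRules` and `tokenize` helpers, and the `nestBraces` flag are assumptions invented for this illustration, and only the terminal names come from the guide:

```ts
type Mode = 'regular' | 'template';

interface Rule {
    name: string;
    pattern: RegExp; // sticky
    push?: Mode;
    pop?: boolean;
    skip?: boolean;
}

// nestBraces = true mimics the suggestion above: "{" pushes the regular mode, "}" pops it.
function buildRules(nestBraces: boolean): Record<Mode, Rule[]> {
    const openBrace: Rule = nestBraces
        ? { name: '{', pattern: /\{/y, push: 'regular' }
        : { name: '{', pattern: /\{/y };
    const shared: Rule[] = [
        { name: 'WS', pattern: /\s+/y, skip: true },
        { name: 'ID', pattern: /\w+/y },
        { name: ':', pattern: /:/y },
        openBrace,
    ];
    return {
        regular: [
            { name: 'TEMPLATE_LITERAL_START', pattern: /`[^`{]*\{/y, push: 'template' },
            { name: '}', pattern: /\}/y, pop: nestBraces },
            ...shared,
        ],
        template: [
            { name: 'TEMPLATE_LITERAL_MIDDLE', pattern: /\}[^`{]*\{/y },
            { name: 'TEMPLATE_LITERAL_END', pattern: /\}[^`{]*`/y, pop: true },
            ...shared,
        ],
    };
}

function tokenize(input: string, nestBraces: boolean): string[] {
    const rules = buildRules(nestBraces);
    const stack: Mode[] = ['regular'];
    const tokens: string[] = [];
    let offset = 0;
    while (offset < input.length) {
        const mode = stack[stack.length - 1] ?? 'regular';
        const rule = rules[mode].find(candidate => {
            candidate.pattern.lastIndex = offset;
            return candidate.pattern.test(input);
        });
        if (rule === undefined) {
            throw new Error(`Unexpected character at offset ${offset} in ${mode} mode`);
        }
        // After a successful sticky match, lastIndex points just past the match.
        offset = rule.pattern.lastIndex;
        if (!rule.skip) {
            tokens.push(rule.name);
        }
        // Never pop the base mode in this sketch.
        if (rule.pop && stack.length > 1) {
            stack.pop();
        }
        if (rule.push) {
            stack.push(rule.push);
        }
    }
    return tokens;
}

const source = '`a {{k: v}} b`';
console.log(tokenize(source, false)); // the inner "}" is swallowed by TEMPLATE_LITERAL_END
console.log(tokenize(source, true));  // the inner "}" is emitted as its own token
```

Without the extra push, the lexer is still in template mode when it reaches the closing brace of the object literal, so the end terminal consumes it and the parser never sees a matching `}`.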

msujew added the `recipe` label (Improvements or additions to recipes) Feb 13, 2023

msujew requested a review from montymxb February 13, 2023 18:15

montymxb previously requested changes Feb 17, 2023

msujew mentioned this pull request May 22, 2023

A multi-mode lexer example eclipse-langium/langium#423

Closed

msujew force-pushed the msujew/multi-mode-lexing branch from bccf3a6 to 72a8fbb Compare December 14, 2023 12:45

msujew had a problem deploying to pull-request-preview December 14, 2023 12:46 — with GitHub Actions Failure

msujew force-pushed the msujew/multi-mode-lexing branch from 72a8fbb to 4eb80ef Compare December 14, 2023 14:02

msujew temporarily deployed to pull-request-preview December 14, 2023 14:04 — with GitHub Actions Inactive

msujew added 3 commits December 19, 2023 15:54

Add guide for multi mode lexing

8a80a66

Finish guide

746f1ed

Improve syntax

ef14db1

msujew force-pushed the msujew/multi-mode-lexing branch from 4eb80ef to ef14db1 Compare December 19, 2023 14:54

msujew deployed to pull-request-preview December 19, 2023 14:56 — with GitHub Actions View deployment

spoenemann requested a review from montymxb December 20, 2023 08:23

montymxb reviewed Feb 8, 2024

msujew mentioned this pull request May 21, 2024

Merge inferred array types eclipse-langium/langium#1506

Merged

aabounegm mentioned this pull request Jul 14, 2024

Add indentation-aware TokenBuilder and Lexer eclipse-langium/langium#1578

Merged

msujew mentioned this pull request Mar 8, 2025

Whitespace terminal rules are always matched first eclipse-langium/langium#1828

Open

