Third-party libraries

The project uses several third-party libraries. The language server needs some of them to parse LSP messages and to communicate through DAP. We also use a third-party library to recognize the syntax of HLASM.

ASIO C++ library
ASIO is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. We use it to handle TCP communication in a cross-platform way. ASIO implements std::iostream wrappers around the TCP stream, which allows us to abstract from the source of the communication.
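As a minimal sketch of what abstracting from the source of the communication can look like, the functions below read and write one message framed with the `Content-Length` header of the LSP base protocol, using only `std::istream` and `std::ostream`. The names `read_message`, `write_message` and `echo_once` are hypothetical and are not taken from the project; the in-memory `std::stringstream` merely stands in for a real connection.

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: reads one message framed as
//   Content-Length: N\r\n
//   \r\n
//   <N bytes of JSON>
// It accepts any std::istream, so the caller decides where the bytes come from.
std::optional<std::string> read_message(std::istream& in)
{
    const std::string key = "Content-Length: ";
    std::size_t length = 0;
    bool have_length = false;
    std::string line;

    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // std::getline strips only the '\n'

        if (line.empty())
            break; // a blank line ends the header section

        if (line.compare(0, key.size(), key) == 0)
        {
            length = std::stoul(line.substr(key.size()));
            have_length = true;
        }
    }

    if (!have_length)
        return std::nullopt;

    std::string body(length, '\0');
    in.read(body.data(), static_cast<std::streamsize>(length));
    if (static_cast<std::size_t>(in.gcount()) != length)
        return std::nullopt; // the stream ended before the whole body arrived

    return body;
}

// Hypothetical helper: writes one message using the same framing.
void write_message(std::ostream& out, const std::string& body)
{
    out << "Content-Length: " << body.size() << "\r\n\r\n" << body;
    out.flush();
}

// In-memory round trip; the stringstream stands in for a TCP connection.
std::string echo_once(const std::string& body)
{
    std::stringstream wire;
    write_message(wire, body);
    return read_message(wire).value_or("");
}
```

Because the helpers depend only on the standard stream interfaces, the same code could presumably be handed `std::cin` and `std::cout`, or a stream such as `asio::ip::tcp::iostream`, without changes.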
JSON for Modern C++
We use the JSON for Modern C++ library to parse and serialize JSON. It is used in both LSP and DAP. It allows us to seamlessly traverse JSON input, extract relevant values, and respond with valid JSON messages.

Usage of ANTLR4

Part of our analyzer is based on the ANTLR 4 parser generator. ANTLR 4 implements the Adaptive LL(*) parsing strategy.

Adaptive LL(*) Parsing Strategy

The Adaptive LL(*) (or ALL(*)) parsing strategy combines the simplicity, efficiency and predictability of conventional top-down LL(k) parsers with the power of a GLR-like mechanism that can handle non-deterministic and ambiguous grammars. Its authors move grammar analysis to parse time, which lets ALL(*) handle any non-left-recursive context-free grammar. For efficiency, it caches the analysis results in a lookahead DFA.

The theoretical time complexity can be viewed as a possible downside of ALL(*): parsing n symbols takes O(n⁴) time in the worst case. In practice, however, the authors report that ALL(*) outperforms general parsing strategies such as GLL and GLR by orders of magnitude.

Despite the theoretical O(n⁴) time complexity, ALL(*) appears to behave linearly on most code, with no unpredictable performance or large footprint in practice. To support this claim, the authors measured parse time against file size for files written in C, Verilog, Erlang and Lua. They found very strong evidence of linearity for all tested languages (see the original paper for details).

ANTLR 4 Pipeline

Like any conventional parser generator, ANTLR 4 processes the input code by breaking the source string into tokens with a lexer and then building parse trees with a parser.

This pipeline in ANTLR 4 is broken into the following classes:

CharStream
Represents input code.
Lexer
Breaks the input code into tokens.
Token
Token representation that includes important information like token type, position in code and the actual text.
Parser
Builds parse trees.
TokenStream
Connects the lexer and parser.
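The roles listed above can be sketched with a hand-written miniature. This is not ANTLR code: `toy_token`, `tokenize` and `toy_parser` are hypothetical names, the grammar is a one-rule sum of numbers, and for brevity the parser computes a value directly instead of building a parse tree.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the pipeline; none of these types come from ANTLR.
enum class token_type { number, plus, end };

// Token: its type, its position in the input and the matched text.
struct toy_token
{
    token_type type;
    std::size_t position;
    std::string text;
};

// Lexer: breaks the input characters into tokens.
std::vector<toy_token> tokenize(const std::string& input)
{
    std::vector<toy_token> tokens;
    std::size_t i = 0;
    while (i < input.size())
    {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c))
        {
            ++i;
        }
        else if (std::isdigit(c))
        {
            const std::size_t start = i;
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
            tokens.push_back({ token_type::number, start, input.substr(start, i - start) });
        }
        else if (c == '+')
        {
            tokens.push_back({ token_type::plus, i, "+" });
            ++i;
        }
        else
        {
            throw std::runtime_error("unexpected character at position " + std::to_string(i));
        }
    }
    tokens.push_back({ token_type::end, input.size(), "" });
    return tokens;
}

// Parser: consumes the tokens and recognizes  sum : number ('+' number)* ;
class toy_parser
{
    std::vector<toy_token> tokens_; // plays the role of the token stream
    std::size_t next_ = 0;

    // Returns the current token if it has the expected type, else throws.
    const toy_token& consume(token_type expected)
    {
        const toy_token& t = tokens_[next_];
        if (t.type != expected)
            throw std::runtime_error("unexpected token at position " + std::to_string(t.position));
        if (t.type != token_type::end)
            ++next_; // never advance past the end marker
        return t;
    }

public:
    explicit toy_parser(std::vector<toy_token> tokens) : tokens_(std::move(tokens)) {}

    long parse_sum()
    {
        long value = std::stol(consume(token_type::number).text);
        while (tokens_[next_].type == token_type::plus)
        {
            consume(token_type::plus);
            value += std::stol(consume(token_type::number).text);
        }
        consume(token_type::end);
        return value;
    }
};
```

For example, `toy_parser(tokenize("12 + 30+ 4")).parse_sum()` walks the six tokens produced by the lexer and returns 46, while an input such as `"1 +"` is rejected with an exception.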

The following picture sketches the described pipeline.

ANTLR Parser

The input to ANTLR is a grammar, written in an ANTLR-specific language, that specifies the syntax of the HLASM language (see the 193 grammar rules in the grammar visualization). The framework takes the grammar and generates C++ source code for a recognizer, which can tell whether the input source code is valid. Moreover, it is possible to attach a piece of code that executes every time the recognizer matches a grammar rule, in order to further process the matched code and produce helper structures (statements).

Parse-Tree Walking

ANTLR 4 offers two mechanisms for tree walking: parse-tree listeners and parse-tree visitors. A listener can only be used to get a notification for each matched grammar rule. A visitor lets the programmer control the walk by explicitly calling methods to visit children.

We employ the visitor approach when evaluating CA expressions, because we require ample control over the evaluation (such as operator priority).

ANTLR 4 first generates hlasmparserVisitor and hlasmparserBaseVisitor. The former is an abstract class; the latter is a simple implementation of the former. Both classes define visit functions for every grammar rule. A visit function has exactly one argument: the context of the rule. The simple implementation executes visitChildren(). Our parse-tree visitor, the expression_evaluator, derives from hlasmparserBaseVisitor and overrides its visit functions. In order to evaluate a sub-rule, we call visit(ctx->sub_rule()), where ctx->sub_rule() returns the context of the sub-rule. The visit() function selects the appropriate function of the visitor based on the context type (for example, visit(ctx->sub_rule()) would call visitSub_rule(..)).
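The shape of this mechanism can be illustrated with a hand-written miniature. This is only a sketch under stated assumptions: `expr_visitor`, `expr_base_visitor`, `toy_evaluator` and the context types are hypothetical stand-ins for the generated classes and for expression_evaluator, the visit functions return `long` for simplicity, and the tree is built by hand rather than by a parser.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct number_context;
struct add_context;
struct mul_context;

// Counterpart of the abstract visitor: one visit function per rule.
struct expr_visitor
{
    virtual ~expr_visitor() = default;
    virtual long visit_number(number_context& ctx) = 0;
    virtual long visit_add(add_context& ctx) = 0;
    virtual long visit_mul(mul_context& ctx) = 0;
};

// Rule context: a parse-tree node that owns its children.
struct context
{
    std::vector<std::unique_ptr<context>> children;
    virtual ~context() = default;
    // Dispatches to the visit function that matches the dynamic context type.
    virtual long accept(expr_visitor& v) = 0;
};

struct number_context : context
{
    long value = 0;
    long accept(expr_visitor& v) override { return v.visit_number(*this); }
};
struct add_context : context
{
    long accept(expr_visitor& v) override { return v.visit_add(*this); }
};
struct mul_context : context
{
    long accept(expr_visitor& v) override { return v.visit_mul(*this); }
};

// Counterpart of the base visitor: every visit function only walks the children.
struct expr_base_visitor : expr_visitor
{
    long visit(context& ctx) { return ctx.accept(*this); }

    long visit_children(context& ctx)
    {
        long last = 0;
        for (auto& child : ctx.children)
            last = visit(*child);
        return last; // result of the last child, 0 for a leaf
    }

    long visit_number(number_context& ctx) override { return visit_children(ctx); }
    long visit_add(add_context& ctx) override { return visit_children(ctx); }
    long visit_mul(mul_context& ctx) override { return visit_children(ctx); }
};

// The evaluator overrides the visit functions and decides itself when to
// descend into which child.
struct toy_evaluator : expr_base_visitor
{
    long visit_number(number_context& ctx) override { return ctx.value; }
    long visit_add(add_context& ctx) override
    {
        return visit(*ctx.children.at(0)) + visit(*ctx.children.at(1));
    }
    long visit_mul(mul_context& ctx) override
    {
        return visit(*ctx.children.at(0)) * visit(*ctx.children.at(1));
    }
};

// Helpers that build a tree by hand; a real parser would produce it.
std::unique_ptr<context> number(long value)
{
    auto n = std::make_unique<number_context>();
    n->value = value;
    return n;
}

template <typename Node>
std::unique_ptr<context> binary(std::unique_ptr<context> lhs, std::unique_ptr<context> rhs)
{
    auto n = std::make_unique<Node>();
    n->children.push_back(std::move(lhs));
    n->children.push_back(std::move(rhs));
    return n;
}
```

With the tree for 2 + 3 * 4 built as `binary<add_context>(number(2), binary<mul_context>(number(3), number(4)))`, calling `visit` on a `toy_evaluator` yields 14. The unmodified `expr_base_visitor` only traverses the same tree without computing anything, which is why the evaluator has to override the visit functions.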


