Merge pull request #36 from stsewd/basic-contributing-doc
Add basic contributing doc
Showing 2 changed files with 119 additions and 0 deletions.
@@ -0,0 +1,115 @@
# Contributing

Thanks for your interest in contributing to this project!
Below you'll find a general explanation of the project and instructions for running it locally.

## Tree-sitter

To get more familiar with tree-sitter itself and with writing tree-sitter grammars,
you may want to read <https://tree-sitter.github.io/tree-sitter/creating-parsers>.

## The grammar

Most tree-sitter grammars are written in a single `grammar.js`
file with a declarative-like syntax.

But reStructuredText isn't a programming language with a well-defined specification.
It has a lot of edge cases, and the same text can have a different meaning depending on
the context in which it appears or its indentation level.

Tree-sitter is flexible enough to let us write some rules in `C` (an external scanner),
so, for the reasons above, our grammar makes heavy use of this feature.

## External scanner

Tree-sitter is an LR(k) parser, so we can't backtrack.
As a result, our external scanner must share some logic when recognizing different nodes.
For example, if we find a `*` character,
we first check whether it's a list element,
then an _emphasis_ node, then a _strong_ node, etc.

Most of the time, when something isn't a recognizable node,
it is interpreted as _simple text_.

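As a rough illustration of that ordering, the sketch below classifies text that starts with `*` using lookahead only. It is not the project's scanner: `TokenKind`, `classify_star`, and the `at_line_start` flag are hypothetical names, and the rules are deliberately simplified compared to what reStructuredText actually requires.

```c
#include <stdbool.h>

/* Hypothetical token kinds; the real ones are declared in
 * src/tree_sitter_rst/tokens.h and may differ. */
typedef enum { T_BULLET, T_EMPHASIS, T_STRONG, T_TEXT } TokenKind;

/* True for characters that cannot directly follow an opening marker. */
static bool is_space_or_end(char c) {
  return c == ' ' || c == '\n' || c == '\0';
}

/* Classify text starting at `s` by peeking ahead, never rewinding.
 * `at_line_start` stands in for context the real scanner tracks itself. */
static TokenKind classify_star(const char *s, bool at_line_start) {
  if (s[0] != '*') return T_TEXT;
  /* 1. List element: "*" at line start, followed by whitespace. */
  if (at_line_start && is_space_or_end(s[1])) return T_BULLET;
  /* 2. Emphasis: "*" followed by a non-space, non-star character. */
  if (!is_space_or_end(s[1]) && s[1] != '*') return T_EMPHASIS;
  /* 3. Strong: "**" followed by a non-space character. */
  if (s[1] == '*' && !is_space_or_end(s[2])) return T_STRONG;
  /* 4. Nothing matched: fall back to simple text. */
  return T_TEXT;
}
```

For instance, `classify_star("* item", true)` yields `T_BULLET`, while the same text in the middle of a line falls through to `T_TEXT`.
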
The external scanner also allows us to keep some state between parsing one node and the next,
which is currently used to keep track of the indentation levels.

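Tree-sitter preserves an external scanner's state by asking it to serialize into a byte buffer and restoring from that buffer later. A minimal sketch of how an indentation stack could be kept that way follows; `IndentStack`, `MAX_INDENTS`, and the function names are assumptions made for illustration, not the project's actual code.

```c
#include <string.h>

#define MAX_INDENTS 32 /* hypothetical cap on nesting depth */

/* Stack of indentation columns, innermost level last. */
typedef struct {
  unsigned char levels[MAX_INDENTS];
  unsigned count;
} IndentStack;

/* Push a new indentation column; silently ignored when the stack is full. */
static void indent_push(IndentStack *st, unsigned char column) {
  if (st->count < MAX_INDENTS) st->levels[st->count++] = column;
}

/* Copy the stack into `buffer` and return the number of bytes written,
 * or 0 if the buffer is too small. */
static unsigned indent_serialize(const IndentStack *st, char *buffer,
                                 unsigned capacity) {
  if (st->count == 0 || st->count > capacity) return 0;
  memcpy(buffer, st->levels, st->count);
  return st->count;
}

/* Rebuild the stack from bytes produced by indent_serialize. */
static void indent_deserialize(IndentStack *st, const char *buffer,
                               unsigned length) {
  st->count = length > MAX_INDENTS ? MAX_INDENTS : length;
  if (st->count > 0) memcpy(st->levels, buffer, st->count);
}
```

One byte per level keeps the serialized form small, at the cost of capping columns at 255; a real scanner may need a wider type.
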
## Project structure

Most of the files in the repository are auto-generated by tree-sitter.
They are needed so the grammar can be compiled easily on the user's computer,
so they are committed to the repository.

Some of the files that aren't auto-generated are:

- `grammar.js`: defines all the nodes in our grammar and their structure.
- `src/scanner.c`: the entry point to our custom scanner. To make maintenance easier,
  the code that isn't auto-generated lives inside the `src/tree_sitter_rst/` directory.
- `src/tree_sitter_rst/scanner.c`: contains the functions used to create/serialize/de-serialize
  our custom scanner, as well as the scanner's main entry point:
  `rst_scanner_scan` (AKA, the big collection of `if`s).
- `src/tree_sitter_rst/tokens.h`: defines all the tokens that our external scanner recognizes.
  They are the same ones declared in the `externals` attribute in our `grammar.js` file.
- `src/tree_sitter_rst/chars.c`: some utility functions to recognize characters, like numbers,
  bullets, letters, etc.
- `src/tree_sitter_rst/parser.c`: contains all the functions that match the text currently being parsed
  to a valid `token`.
- `test/corpus/`: tests for our grammar, so we can be sure nothing breaks when changing things.
  You can read about the syntax at <https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test>.
- `test/examples/`: the files that docutils uses to run its tests.
  We parse them without checking the resulting CST;
  we only care whether our parser errors in the process.
- `docs/`: this directory is deployed to GitHub Pages at <https://stsewd.dev/tree-sitter-rst/>.

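To give a feel for the kind of helpers a file like `chars.c` might hold, here is a small sketch. The function names are hypothetical; the bullet characters are the ones listed in the reStructuredText specification, and code points are taken as `int32_t`, the type tree-sitter uses for its lookahead character.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASCII digit check; the real helpers may accept more than ASCII. */
static bool is_ascii_digit(int32_t c) {
  return c >= '0' && c <= '9';
}

/* ASCII letter check, upper or lower case. */
static bool is_ascii_letter(int32_t c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Bullet list markers: "*", "+", "-", U+2022, U+2023, U+2043. */
static bool is_bullet_char(int32_t c) {
  return c == '*' || c == '+' || c == '-' ||
         c == 0x2022 || c == 0x2023 || c == 0x2043;
}
```

Keeping these predicates free of scanner state makes them easy to reuse from every token-matching function.
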
## Developing

Requirements:

- Node
- A C compiler (clang is preferred)
- Docker (only if you want to see your changes in the browser)

Install the dependencies with:

```bash
npm install
```

To build the grammar:

```bash
npm run build
```

To run the tests:

```bash
npm run test
```

Note: if you changed the grammar, you need to rebuild it
for the tests to use the new grammar.

Test the grammar by parsing a file:

```bash
npm run parse -- test.rst
```

Test the grammar in your browser:

```bash
npm run web
```

Note: if you changed the grammar, you need to rebuild it and run
`npm run wasm` (requires Docker).

Sometimes you may find it useful to compare against the output of docutils for a given RST document,
since the reStructuredText specification doesn't cover all edge cases.

```bash
pip install docutils
rst2html5.py test.rst out.html
xdg-open out.html
```