Merge pull request #36 from stsewd/basic-contributing-doc

Add basic contributing doc
stsewd · Jun 13, 2022 · 55e5841
1 parent b74770c

Showing 2 changed files with 119 additions and 0 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,115 @@
+# Contributing
+
+Thanks for your interest in contributing to this project!
+Below you'll find a general explanation of the project and how to run it locally.
+
+## Tree-sitter
+
+To get more familiar with tree-sitter itself and writing tree-sitter grammars,
+you may want to read <https://tree-sitter.github.io/tree-sitter/creating-parsers>.
+
+## The grammar
+
+Most tree-sitter grammars are written using a single `grammar.js`
+file with a declarative-like syntax.
+
+But reStructuredText isn't a programming language with a well-defined specification:
+it has a lot of edge cases, and the same text can have a different meaning depending on the context
+it appears in or its indentation level.
+
+Tree-sitter is flexible enough to let us write some rules in `C` (an external scanner),
+so, for the reasons above, our grammar makes heavy use of this feature.
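+
+To illustrate why indentation matters, here is a minimal C sketch of the kind of check the scanner has to make. It is not code from this repository: `indent_width` and `starts_block_quote` are hypothetical names. It assumes tabs expand to the next multiple of 8 columns, as the reStructuredText specification describes, and that the caller already knows a blank line precedes the line being checked.
+
+```c
+#include <stdbool.h>
+
+/* Width of the leading whitespace of `line`, expanding each tab to the
+ * next multiple of 8 columns. */
+static unsigned indent_width(const char *line) {
+    unsigned col = 0;
+    for (; *line == ' ' || *line == '\t'; line++) {
+        if (*line == '\t') {
+            col = (col / 8 + 1) * 8; /* jump to the next tab stop */
+        } else {
+            col++;
+        }
+    }
+    return col;
+}
+
+/* Hypothetical helper: after a blank line, text indented more than the
+ * preceding paragraph is a block quote, while the same text at the
+ * paragraph's indentation is just another paragraph. */
+static bool starts_block_quote(const char *line, unsigned paragraph_indent) {
+    return indent_width(line) > paragraph_indent;
+}
+```
+
+For example, `indent_width("\tquote")` is 8, so that line would start a block quote after a paragraph at column 0.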
+
+## External scanner
+
+Tree-sitter generates LR parsers, and the external scanner reads the input one character at a time,
+only moving forward, so we can't backtrack.
+Because of this, our external scanner must share some logic between nodes that start the same way.
+For example, if we find a `*` character,
+we first check whether it's a list element,
+then an _emphasis_ node, then a _strong_ node, etc.
+
+Most of the time, when something isn't a recognizable node,
+it is interpreted as _simple text_.
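+
+The following is a simplified C sketch of that idea, not the actual scanner code. `MockLexer` is a hypothetical stand-in for tree-sitter's `TSLexer` that, like the real lexer, only moves forward, and the token names are made up for this example. Real inline-markup rules (such as requiring a matching end-string) are left out.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Hypothetical stand-in for tree-sitter's TSLexer: it only exposes the
+ * current character and can only move forward. */
+typedef struct {
+    const char *input;
+    int32_t lookahead;
+} MockLexer;
+
+static MockLexer mock_lexer(const char *input) {
+    MockLexer lexer = {input, (unsigned char)*input};
+    return lexer;
+}
+
+static void advance(MockLexer *lexer) {
+    if (*lexer->input != '\0') {
+        lexer->input++;
+    }
+    lexer->lookahead = (unsigned char)*lexer->input;
+}
+
+/* Hypothetical token names, for this example only. */
+typedef enum { TOKEN_BULLET, TOKEN_EMPHASIS_START, TOKEN_STRONG_START, TOKEN_TEXT } Token;
+
+static bool is_space(int32_t c) {
+    return c == ' ' || c == '\t' || c == '\n' || c == '\0';
+}
+
+/* Called with the lexer on a `*` at the start of a line. The first `*` is
+ * consumed before we know which node it starts, so every candidate shares
+ * that step and is decided in a single forward pass. */
+static Token scan_star(MockLexer *lexer) {
+    advance(lexer);
+    if (is_space(lexer->lookahead)) {
+        return TOKEN_BULLET; /* "* item" */
+    }
+    if (lexer->lookahead != '*') {
+        return TOKEN_EMPHASIS_START; /* "*word*" */
+    }
+    advance(lexer); /* consume the second `*` */
+    if (is_space(lexer->lookahead)) {
+        return TOKEN_TEXT; /* "** " isn't a strong start */
+    }
+    return TOKEN_STRONG_START; /* "**word**" */
+}
+```
+
+Since characters that were already consumed can't be re-read, the checks are ordered so that each one only needs to look at the next character.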
+
+The external scanner also allows us to keep some state between the parsing of each node;
+this is currently used to keep track of the indentation levels.
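+
+A minimal sketch of how such state could be stored, assuming the indentation levels are kept in a stack of `int` columns. `IndentStack`, `push_indent`, `pop_indent`, `serialize`, and `deserialize` are hypothetical names, but the shape follows tree-sitter's external scanner API: serialize copies the state into a byte buffer of at most `TREE_SITTER_SERIALIZATION_BUFFER_SIZE` (1024) bytes and returns how many bytes it wrote, and deserialize restores it, where a length of 0 means starting from an empty state.
+
+```c
+#include <stdbool.h>
+#include <string.h>
+
+/* Mirrors TREE_SITTER_SERIALIZATION_BUFFER_SIZE from tree_sitter/parser.h. */
+#define BUFFER_SIZE 1024
+#define MAX_DEPTH (BUFFER_SIZE / sizeof(int))
+
+/* Hypothetical scanner state: a stack of indentation columns. */
+typedef struct {
+    int indents[MAX_DEPTH];
+    unsigned length;
+} IndentStack;
+
+static bool push_indent(IndentStack *stack, int column) {
+    if (stack->length >= MAX_DEPTH) {
+        return false; /* would not fit in the serialization buffer */
+    }
+    stack->indents[stack->length++] = column;
+    return true;
+}
+
+/* Returns the popped column, or -1 if the stack is empty. */
+static int pop_indent(IndentStack *stack) {
+    return stack->length > 0 ? stack->indents[--stack->length] : -1;
+}
+
+/* Copies the whole state into `buffer` and returns the number of bytes written. */
+static unsigned serialize(const IndentStack *stack, char *buffer) {
+    unsigned size = (unsigned)(stack->length * sizeof(int));
+    memcpy(buffer, stack->indents, size);
+    return size;
+}
+
+/* Restores the state written by serialize(); a length of 0 resets it. */
+static void deserialize(IndentStack *stack, const char *buffer, unsigned length) {
+    stack->length = length / sizeof(int);
+    memcpy(stack->indents, buffer, stack->length * sizeof(int));
+}
+```
+
+Because tree-sitter can resume parsing from an earlier point, everything the scanner's decisions depend on needs to round-trip through these two functions.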
+
+## Project structure
+
+Most of the files in the repository are auto-generated by tree-sitter.
+They are needed to compile the grammar easily on the user's computer,
+so they are committed to the repository.
+
+Some of the files that aren't auto-generated are:
+
+- `grammar.js`: defines all the nodes of our grammar and their structure.
+- `src/scanner.c`: the entry point to our custom scanner. To make it easier to maintain,
+  the code that isn't auto-generated lives in the `src/tree_sitter_rst/` directory.
+- `src/tree_sitter_rst/scanner.c`: contains the functions used to create, serialize, and deserialize
+  our custom scanner, as well as its main entry point:
+  `rst_scanner_scan` (AKA the big collection of `if`s).
+- `src/tree_sitter_rst/tokens.h`: defines all the tokens that our external scanner recognizes;
+  they are the same ones declared in the `externals` attribute of our `grammar.js` file,
+  and they must be listed in the same order.
+- `src/tree_sitter_rst/chars.c`: some utility functions to recognize characters, like numbers,
+  bullets, letters, etc.
+- `src/tree_sitter_rst/parser.c`: contains all the functions that match the text currently being parsed
+  to a valid `token`.
+- `test/corpus/`: tests for our grammar, so we can be sure nothing breaks when changing things;
+  you can read about their syntax at <https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test>.
+- `test/examples/`: the files that docutils uses to run its tests.
+  We parse them without checking the resulting CST;
+  we only care whether our parser errors in the process.
+- `docs/`: this directory is deployed to GitHub Pages <https://stsewd.dev/tree-sitter-rst/>.
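+
+As an illustration of the kind of helpers found in `chars.c`, here is a small C sketch. The function names are hypothetical and may not match the real file. The character sets come from the reStructuredText specification: the allowed bullet markers, and section adornments being any non-alphanumeric printable 7-bit ASCII character. Characters are handled as `int32_t` code points, the type of `TSLexer`'s `lookahead` field.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Bullet list markers allowed by the reStructuredText specification. */
+static bool is_bullet_char(int32_t c) {
+    return c == '*' || c == '+' || c == '-' ||
+           c == 0x2022 /* • */ || c == 0x2023 /* ‣ */ || c == 0x2043 /* ⁃ */;
+}
+
+/* Section title adornments: any non-alphanumeric printable 7-bit ASCII character. */
+static bool is_adornment_char(int32_t c) {
+    return (c >= '!' && c <= '/') || (c >= ':' && c <= '@') ||
+           (c >= '[' && c <= '`') || (c >= '{' && c <= '~');
+}
+
+static bool is_digit(int32_t c) {
+    return c >= '0' && c <= '9';
+}
+```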
+
+## Developing
+
+Requirements:
+
+- Node
+- A C compiler (clang is preferred)
+- Docker (only if you want to see your changes in the browser)
+
+Install the Node dependencies with:
+
+```bash
+npm install
+```
+
+To build the grammar:
+
+```bash
+npm run build
+```
+
+To run the tests:
+
+```bash
+npm run test
+```
+
+Note: if you changed the grammar, you need to rebuild it
+so the tests use the new grammar.
+
+Test the grammar by parsing a file:
+
+```bash
+npm run parse -- test.rst
+```
+
+Test the grammar in your browser:
+
+```bash
+npm run web
+```
+
+Note: if you changed the grammar, you need to rebuild it and run
+`npm run wasm` (requires Docker).
+
+Sometimes you may find it useful to compare with the output of docutils for a given RST document,
+since the reStructuredText specification doesn't cover or explain all edge cases.
+
+```bash
+pip install docutils
+rst2html5.py test.rst out.html
+xdg-open out.html
+```
diff --git a/README.md b/README.md
@@ -53,6 +53,10 @@ Check the playground at <https://stsewd.dev/tree-sitter-rst/>.
 - [nvim-treesitter](https://github.com/nvim-treesitter/nvim-treesitter)
 - Yours?
 
+## Contributing
+
+Check the [CONTRIBUTING.md](CONTRIBUTING.md) file.
+
 ## Other grammars
 
 - [tree-sitter-comment](https://github.com/stsewd/tree-sitter-comment): grammar for comment tags like `TODO`, `FIXME(user)`.