Merge pull request #36 from stsewd/basic-contributing-doc
Add basic contributing doc
stsewd authored Jun 13, 2022
1 parent b74770c commit 55e5841
Showing 2 changed files with 119 additions and 0 deletions.
115 changes: 115 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,115 @@
# Contributing

Thanks for your interest in contributing to this project!
Below you'll find a general explanation of the project and how to run it locally.

## Tree-sitter

To get more familiar with tree-sitter itself and writing tree-sitter grammars,
you may want to read <https://tree-sitter.github.io/tree-sitter/creating-parsers>.

## The grammar

Most tree-sitter grammars are written using a single `grammar.js`
file with a declarative-like syntax.

But reStructuredText isn't a programming language with a well-defined specification.
It has a lot of edge cases, and the same text can have a different meaning depending on the context
in which it appears or its indentation level.

Tree-sitter is flexible enough to let us write some rules in C (in an external scanner),
so for the reasons above, our grammar makes heavy use of this feature.

## External scanner

Tree-sitter is an LR parser, so we can't backtrack.
As a result, our external scanner must share some logic between nodes that start with the same characters.
For example, if we find a `*` character,
we first try to see if it's a list element,
then an _emphasis_ node, then a _strong_ node, etc.

Most of the time when something isn't a recognizable node,
it is interpreted as a _simple text_.
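
The ordered checks described above can be sketched as a standalone function. This is only an illustration under simplifying assumptions: the names `TokenKind` and `classify_star` are hypothetical, the input is a plain C string rather than tree-sitter's lexer, and the inline markup rules are far looser than the real reStructuredText ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum { TOK_BULLET, TOK_EMPHASIS, TOK_STRONG, TOK_TEXT } TokenKind;

static bool is_space_or_end(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\0';
}

/* Classify the text at `s` by trying each candidate node in order. */
static TokenKind classify_star(const char *s, bool at_line_start) {
  assert(s != NULL);
  if (s[0] != '*') {
    return TOK_TEXT;
  }
  /* 1. List element: "*" followed by whitespace at the start of a line. */
  if (at_line_start && is_space_or_end(s[1])) {
    return TOK_BULLET;
  }
  /* 2. Emphasis: "*text*" (a single star, then a non-space character). */
  if (s[1] != '*' && !is_space_or_end(s[1]) && strchr(s + 2, '*') != NULL) {
    return TOK_EMPHASIS;
  }
  /* 3. Strong: "**text**". */
  if (s[1] == '*' && !is_space_or_end(s[2]) && strstr(s + 3, "**") != NULL) {
    return TOK_STRONG;
  }
  /* Nothing matched: fall back to simple text. */
  return TOK_TEXT;
}
```

All the checks start from the same position, and each one only peeks ahead before the function commits to a result, which is why the candidates have to be handled together in one place.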

The external scanner also allows us to keep some state between calls to the scanner;
this is currently used to keep track of the indentation levels.
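
As a rough sketch of what keeping that state involves, the indentation levels can be stored in a small stack that is copied to and from a byte buffer. Everything here is an assumption made for illustration: `IndentState`, `MAX_INDENT_LEVELS`, and the function names are hypothetical and do not mirror the actual code in `src/tree_sitter_rst/scanner.c`.

```c
#include <assert.h>
#include <string.h>

#define MAX_INDENT_LEVELS 32

/* Hypothetical scanner state: a stack of indentation widths. */
typedef struct {
  unsigned char levels[MAX_INDENT_LEVELS];
  unsigned length;
} IndentState;

/* Push a new indentation width; silently ignored when the stack is full. */
static void indent_push(IndentState *st, unsigned char width) {
  if (st->length < MAX_INDENT_LEVELS) {
    st->levels[st->length++] = width;
  }
}

/* Copy the state into `buffer` and return the number of bytes written. */
static unsigned indent_serialize(const IndentState *st, char *buffer) {
  assert(st->length <= MAX_INDENT_LEVELS);
  if (st->length > 0) {
    memcpy(buffer, st->levels, st->length);
  }
  return st->length;
}

/* Restore the state from `buffer`, which holds `length` bytes. */
static void indent_deserialize(IndentState *st, const char *buffer,
                               unsigned length) {
  if (length > MAX_INDENT_LEVELS) {
    length = MAX_INDENT_LEVELS;
  }
  if (length > 0) {
    memcpy(st->levels, buffer, length);
  }
  st->length = length;
}
```

The serialize and deserialize pair has the same shape as the hooks tree-sitter expects from an external scanner: one writes the state into a buffer and returns its size, the other rebuilds the state from that buffer.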

## Project structure

Most of the files in the repository are auto-generated by tree-sitter.
They are needed so the grammar can be compiled easily on the user's computer,
so they are committed to the repository.

Some of the files that aren't auto-generated are:

- `grammar.js`: defines all the nodes in our grammar and their structure.
- `src/scanner.c`: the entry point to our custom scanner. To make it easier to maintain,
the code that isn't auto-generated lives inside the `src/tree_sitter_rst/` directory.
- `src/tree_sitter_rst/scanner.c`: contains the functions used to create, serialize, and deserialize
our custom scanner, as well as its main entry point,
`rst_scanner_scan` (AKA, the big collection of `if`s).
- `src/tree_sitter_rst/tokens.h`: defines all the tokens that our external scanner recognizes.
They are the same ones declared in the `externals` attribute in our `grammar.js` file.
- `src/tree_sitter_rst/chars.c`: utility functions to recognize characters, like numbers,
bullets, letters, etc.
- `src/tree_sitter_rst/parser.c`: contains all the functions that match the current text being parsed
to a valid `token`.
- `test/corpus/`: tests for our grammar, so we can be sure nothing breaks when making changes.
You can read about the syntax at <https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test>.
- `test/examples/`: the files that docutils uses to run its tests.
We parse them without checking the resulting CST;
we only care whether our parser errors in the process.
- `docs/`: this directory is deployed to GitHub Pages at <https://stsewd.dev/tree-sitter-rst/>.
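
To make the roles of `tokens.h`, `chars.c`, and `parser.c` more concrete, here is a minimal sketch of how the three pieces could fit together. The token names, `is_bullet_char`, and `match_token` are hypothetical and are not copied from the repository; the one fixed rule is that the enum order has to match the order of the `externals` list in `grammar.js`, because tree-sitter indexes its array of valid symbols by position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* tokens.h-style list (hypothetical names).
 * The order must match `externals` in grammar.js. */
enum TokenType {
  T_NEWLINE,
  T_INDENT,
  T_DEDENT,
  T_BULLET,
  T_TEXT,
  T_TOKEN_COUNT
};

/* chars.c-style helper: bullet characters allowed by reStructuredText
 * ("*", "+", "-", and the Unicode bullets U+2022, U+2023, U+2043). */
static bool is_bullet_char(int32_t c) {
  return c == '*' || c == '+' || c == '-' ||
         c == 0x2022 || c == 0x2023 || c == 0x2043;
}

/* parser.c-style helper: pick a token for the lookahead character,
 * considering only tokens the parser currently accepts.
 * Returns -1 when nothing matches. */
static int match_token(int32_t lookahead, const bool *valid_symbols) {
  assert(valid_symbols != NULL);
  if (valid_symbols[T_BULLET] && is_bullet_char(lookahead)) {
    return T_BULLET;
  }
  if (valid_symbols[T_TEXT]) {
    return T_TEXT;
  }
  return -1;
}
```

A real matcher has to look at more than one character (a bullet must be followed by whitespace, for example), so treat this only as a picture of how the files relate to each other.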

## Developing

Requirements:

- Node
- A C compiler (clang is preferred)
- Docker (only if you want to see your changes in the browser)

Install the dependencies with:

```bash
npm install
```

To build the grammar:

```bash
npm run build
```

To run the tests:

```bash
npm run test
```

Note: if you changed the grammar, you need to rebuild it
for the tests to use the new grammar.

Test the grammar by parsing a file:

```bash
npm run parse -- test.rst
```

Test the grammar in your browser:

```bash
npm run web
```

Note: if you changed the grammar, you need to rebuild it and run
`npm run wasm` (requires Docker).

Sometimes you may find it useful to compare against the output of docutils for a given RST document,
since the reStructuredText specification doesn't cover all edge cases.

```bash
pip install docutils
rst2html5.py test.rst out.html
xdg-open out.html
```
4 changes: 4 additions & 0 deletions README.md
@@ -53,6 +53,10 @@ Check the playground at <https://stsewd.dev/tree-sitter-rst/>.
- [nvim-treesitter](https://github.com/nvim-treesitter/nvim-treesitter)
- Yours?

## Contributing

Check the [CONTRIBUTING.md](CONTRIBUTING.md) file.

## Other grammars

- [tree-sitter-comment](https://github.com/stsewd/tree-sitter-comment): grammar for comment tags like `TODO`, `FIXME(user)`.
