https://github.com/semgrep/project-reason-tree-sitter
project for program analysis candidates
- Host: GitHub
- URL: https://github.com/semgrep/project-reason-tree-sitter
- Owner: semgrep
- Created: 2020-02-26T23:02:05.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-03-16T08:12:59.000Z (over 3 years ago)
- Last Synced: 2025-01-18T08:28:14.614Z (5 months ago)
- Language: Reason
- Size: 17.6 KB
- Stars: 2
- Watchers: 6
- Forks: 5
- Open Issues: 0
# project-reason-tree-sitter
The goal of this project is to write code to convert the definition
of a programming language *grammar* (e.g., `tests/arithmetic/grammar.json`) into
the definition of an *Abstract Syntax Tree* (AST) for this grammar
(e.g., `tests/arithmetic/ast_arithmetic.re` for the expected output on the
previous grammar).
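
To make the two sides of this conversion concrete, here is a minimal sketch written in plain OCaml syntax rather than Reason syntax. Everything in it is an assumption made for illustration: the `rule_body` type is a simplified, hypothetical model whose constructors mirror some of the `type` tags found in tree-sitter's JSON format, `toy_grammar` is an invented grammar (not the content of `tests/arithmetic/grammar.json`), and the AST types are one hand-written possibility, not the expected output of the project.

```ocaml
(* Hypothetical, simplified model of a rule body in grammar.json.
   The real format has more cases, and the project skeleton may
   model it differently. *)
type rule_body =
  | Symbol of string          (* reference to another rule *)
  | Str of string             (* literal token, e.g. "+" *)
  | Pattern of string         (* regexp token *)
  | Seq of rule_body list     (* sequence of sub-rules *)
  | Choice of rule_body list  (* alternatives *)
  | Repeat of rule_body       (* zero or more repetitions *)

type grammar = (string * rule_body) list

(* An invented toy grammar, used only for illustration. *)
let toy_grammar : grammar = [
  ("program", Repeat (Symbol "expression"));
  ("expression", Choice [Symbol "number"; Symbol "sum"]);
  ("sum", Seq [Symbol "expression"; Str "+"; Symbol "expression"]);
  ("number", Pattern "[0-9]+");
]

(* AST types one might derive by hand from [toy_grammar]:
   one type per rule, a variant for a choice, a tuple for a
   sequence, a list for a repetition, a string for a token. *)
type program = expression list
and expression = Number of number | Sum of sum
and sum = expression * string * expression
and number = string

(* The AST value for the input "1+2" under these types. *)
let example : program = [Sum (Number "1", "+", Number "2")]
```

The grammar side is ordinary data, while the AST side is a set of type definitions, which is why the conversion is a code generation problem.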
This AST definition can be automatically derived from the grammar.

The grammar definition we are using (e.g., `tests/arithmetic/grammar.json`)
does not use a traditional format (e.g., a BNF grammar using the Yacc syntax).
Instead, it uses a particular format (see the Context section below).
This `grammar.json` file is itself derived from another file
(e.g., `tests/arithmetic/grammar.js` for the arithmetic grammar), which is
easier to read (but harder to analyze).
See http://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl
if you need to understand the format of `grammar.js`
(which in turn will help you understand the format of `grammar.json`).

You can use any programming language to solve this problem, but we provide
skeleton code only for ReasonML.

## Main problem
To test your code, run:
```bash
./_build/default/bin/main_codegen.exe -codegen_types tests/arithmetic/grammar.json > tests/arithmetic/ast_arithmetic_output.re
```
and compare your result in `tests/arithmetic/ast_arithmetic_output.re` with
the expected output in `tests/arithmetic/ast_arithmetic.re`.
Your result does not have to match `tests/arithmetic/ast_arithmetic.re` exactly,
but it should compile, and it should be mostly equivalent to
`tests/arithmetic/ast_arithmetic.re`.

Note that your code should be general enough that it can be applied to
other grammar files, not just `tests/arithmetic/grammar.json`.

The code you have to implement is mostly in `lib/codegen_types.re`.
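
As a rough illustration of what a grammar-independent generator could look like, the sketch below walks a rule body recursively and prints a Reason type definition for it. It is written in plain OCaml syntax and is not the skeleton in `lib/codegen_types.re`: the `rule_body` type, the `type_expr` and `type_def` functions, and the `CaseN` constructor naming are all hypothetical choices made for this example, and JSON parsing is left out entirely.

```ocaml
(* Hypothetical, simplified rule bodies; not the skeleton's actual types. *)
type rule_body =
  | Symbol of string
  | Str of string
  | Pattern of string
  | Seq of rule_body list
  | Choice of rule_body list
  | Repeat of rule_body

(* Render a rule body as a Reason type expression. *)
let rec type_expr = function
  | Symbol name -> name
  | Str _ | Pattern _ -> "string"
  | Seq [] -> "unit"
  | Seq [x] -> type_expr x
  | Seq xs -> "(" ^ String.concat ", " (List.map type_expr xs) ^ ")"
  | Repeat x -> "list(" ^ type_expr x ^ ")"
  | Choice _ -> failwith "nested choice is not handled by this sketch"

(* Render one rule as a Reason type definition. Only a choice at the
   top of a rule becomes a variant; each alternative gets a constructor
   named after the symbol it refers to, or CaseN otherwise. *)
let type_def (name, body) =
  match body with
  | Choice alts ->
    let case i alt =
      let ctor =
        match alt with
        | Symbol s -> String.capitalize_ascii s
        | _ -> Printf.sprintf "Case%d" i
      in
      Printf.sprintf "  | %s(%s)" ctor (type_expr alt)
    in
    Printf.sprintf "type %s =\n%s;" name
      (String.concat "\n" (List.mapi case alts))
  | _ -> Printf.sprintf "type %s = %s;" name (type_expr body)

(* Print the definitions for two invented rules. *)
let () =
  [ ("program", Repeat (Symbol "expression"));
    ("expression", Choice [Symbol "number"; Symbol "sum"]) ]
  |> List.iter (fun rule -> print_endline (type_def rule))
```

This sketch deliberately gives up on a choice that appears anywhere other than at the top of a rule, and it prints each definition on its own. A complete generator would also have to deal with nested choices and with rules that refer to each other recursively.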
## Hint
Because the format of the grammar in `grammar.json` allows nested
alternatives, it can be difficult to generate the ReasonML type
definitions directly from `grammar.json`. You may find it useful to
define an intermediate `ast_normalized_grammar.re` that is closer
to what ReasonML can accept for type definitions.

## Bonus
As a bonus, you can generate the type definitions in a specific order,
from the toplevel types (e.g., `program`) down to the leaves (e.g., `number`),
as done in `tests/arithmetic/ast_arithmetic.re`. You will need
to perform a topological sort of the type dependencies to do so.

## Context
The `tests/arithmetic/grammar.json` file is part of the tree-sitter project
https://github.com/tree-sitter/tree-sitter. tree-sitter is a parser
generator (similar to Yacc), which takes as input a `grammar.js` file
(e.g., `tests/arithmetic/grammar.js`). From this `grammar.js` file it
can generate a JSON file defining the same grammar but that is easier
to analyze (e.g., `tests/arithmetic/grammar.json`) and from this file
it can generate a parser for this grammar in C.

Note that you do not need to understand or use tree-sitter for this project.
We just use the same format for the grammar definition (`grammar.json`).

## What we are looking for:
* Comfort with the language of choice: e.g. JSON parsing and matching should be easily understood
* Grasp of grammar, abstract datatypes, polymorphic types
* Test driven development
* Communication of complex graph algorithms like recursive top-down walk
* A good solution involves the use of an intermediate normalized AST definition

## Installation from source
To compile the code, you first need to [install OCaml](https://opam.ocaml.org/doc/Install.html) and its package manager OPAM.
On macOS, this should simply consist of running:

```bash
brew install opam
opam init
opam switch create 4.07.1
opam switch 4.07.1
eval $(opam env)
```

Once OPAM is installed, you need to install
the OCaml frontend reason, and the build system dune, as well as
the pfff library (just for its commons/ sub-library):

```bash
opam install reason
opam install dune
opam install pfff
```

Then you can compile the program with:
```bash
dune build
```

## Run
Then, to test on a file, for example `tests/arithmetic/grammar.json`,
run:

```bash
./_build/default/bin/main_codegen.exe -parse_grammar tests/arithmetic/grammar.json
...
```

## Development Environment
You can use Visual Studio Code (vscode) to edit the code.
The [reason-vscode](https://marketplace.visualstudio.com/items?itemName=jaredly.reason-vscode) Marketplace extension adds support for OCaml/Reason.

The OCaml and Reason IDE extension by David Morrison is another valid
extension, but it seems not as actively maintained as reason-vscode.

The source also contains a `.vscode/` directory at its root
containing a task file to automatically build the code from vscode.

Note that dune and ocamlmerlin must be in your PATH for vscode to correctly
build and provide cross-references on the code. In case of problems, do:

```bash
cd /path/to/here
eval $(opam env)
dune --version # just checking dune is in your PATH
ocamlmerlin -version # just checking ocamlmerlin is in your PATH
code .
```

## Debugging code
Set the `OCAMLRUNPARAM` environment variable to `b` for backtrace.
You will get better backtrace information when an exception is thrown.

```bash
export OCAMLRUNPARAM=b
```
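
Backtrace recording can also be switched on from inside the program with the standard library's `Printexc` module, which may be convenient when the environment variable cannot be set. The sketch below is in plain OCaml syntax and assumes the program was built with debugging information. `run_with_backtrace` is a hypothetical helper written for this example, not part of the project.

```ocaml
(* Turn on backtrace recording, as OCAMLRUNPARAM=b would do at startup. *)
let () = Printexc.record_backtrace true

(* Hypothetical helper: run [f]; if it raises, return the exception
   message together with the recorded backtrace text. *)
let run_with_backtrace f =
  try Ok (f ())
  with exn ->
    let trace = Printexc.get_backtrace () in
    Error (Printexc.to_string exn, trace)

(* Example: a failing lookup, reported on stderr. *)
let () =
  match run_with_backtrace (fun () -> List.assoc "missing" [("number", 1)]) with
  | Ok _ -> ()
  | Error (msg, trace) ->
    prerr_endline msg;
    prerr_string trace
```

The backtrace must be read right after the exception is caught, because raising another exception in the handler can overwrite it.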