https://github.com/thoughtpolice/tree-sitter-openddl

a tree-sitter grammar, for OpenDDL v2.0

# `tree-sitter` parser for OpenDDL

This repository contains a [tree-sitter][] grammar for the [Open Data
Description Language][oddl] ("OpenDDL", "ODDL"), designed and authored by Eric
Lengyel. It is a very close transcription of the official OpenDDL grammar, which
is described using railroad diagrams. It targets the latest **OpenDDL 2.0**
specification.

[tree-sitter]: https://tree-sitter.github.io
[oddl]: http://openddl.org

The intention of this project is to provide a canonical, machine-usable
description of the original grammar, one that can be used in other
OpenDDL-based tools -- such as derivative, format-specific parsers -- by simply
incorporating `tree-sitter`. A distant secondary goal is to start building a
canonical test suite of ODDL files that other implementations could share, to
ensure they can parse things correctly (so that we can avoid [creating our own
nightmares](http://seriot.ch/parsing_json.php)).

> **HEADS UP**: This grammar should be considered *very* unstable as of now,
> and is not thoroughly tested or documented at this time. String literal
> parsing, at minimum, is certainly not within spec. There are few test cases,
> and they exercise only small, trivial parts of the grammar. Highlighting
> support should be considered non-functional, since `tree-sitter`'s own
> highlighting support is still changing. There are likely other gaps I've
> forgotten.

The current primary use case is a foundational parser for tools built around
the [Open Graphics Exchange Format][ogex] ("OpenGEX", "OGEX"), but nothing in
the grammar is specific to OpenGEX: you can reuse it for *any* ODDL tool,
including other uses of the OpenDDL format that I'm sure people can think up.

Thanks to the design of `tree-sitter` itself, it also provides a foundation for
incremental re-parsing and syntax highlighting of OpenDDL-based formats, which
could be used for efficient editor integration, refactoring, etc. -- though this
is likely only useful for simpler, custom OpenDDL formats, versus formats like
OpenGEX (which are intended to be generated, and will often be very large).

> **NOTE**: While OpenDDL is the basis language for OpenGEX, and one
> intention of this project is to be usable for OpenGEX-based tooling, the
> `tree-sitter` parser here **DOES NOT** offer any specific support or
> validation for the OpenGEX format, such as validating properties, types, etc.
> That must be built as a layer on top of the `tree-sitter` AST.

[ogex]: http://opengex.org

## Usage

Traditionally, `tree-sitter` grammar authors are encouraged to write a grammar
and generate C code from it using `tree-sitter generate`. This auto-generated
code is then committed alongside the grammar code itself, in the Git repository.
Users of `tree-sitter` grammars are expected to clone that repository as a
submodule and link against the C code checked into it.

While this design works *okay*, I find it flawed for a number of reasons (which
won't be elaborated on here), so this repository avoids it to some extent.

Instead, the generated C code is distributed separately from the grammar code
(though still in Git) and is regenerated automatically on every commit by
continuous integration. You're encouraged to simply vendor the C code into your
repository by downloading a version of it when needed (or by using
`git submodule` directly, if you hate yourself and anyone who has to
contribute).

### Downloading C code for the grammar

> **Version information**: The C code for this grammar is generated by
> `tree-sitter` version **0.16.2**, and therefore you **MUST** link the
> generated C code against a compatible version of the `tree-sitter` library --
> version 0.16.x or later.

TBD.
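The requirement above is stated in terms of `tree-sitter` library versions. As a minimal sketch, assuming your build tooling can obtain the linked library's version as a plain `MAJOR.MINOR.PATCH` string, a hypothetical helper like `version_at_least` (not part of `tree-sitter` or this repository) could enforce the "0.16.x or later" rule before linking:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of tree-sitter or this repository.
 * Returns true if `version` ("MAJOR.MINOR" or "MAJOR.MINOR.PATCH")
 * is at least want_major.want_minor, e.g. the "0.16.x or later" rule. */
static bool version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    /* Only MAJOR and MINOR are read; any ".PATCH" suffix is ignored,
     * since any 0.16.x release satisfies the requirement. */
    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return false; /* reject strings that don't look like a version */

    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

For example, `version_at_least("0.16.2", 0, 16)` is true and `version_at_least("0.15.9", 0, 16)` is false. At run time, the authoritative check remains the return value of `ts_parser_set_language`, which returns `false` when a grammar was generated by an incompatible version of the `tree-sitter` CLI.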

### Sample C program

TBD.

## Building & hacking

I use Nix for both continuous integration and local development, so [install
Nix if you wish](https://nixos.org/nix) on any Linux distribution you like.
Then run `nix-build` a lot, or enter `nix-shell` and hack iteratively.

Alternatively, you can install `tree-sitter` yourself and do typical
`tree-sitter generate && tree-sitter test` development, but Nix does all that
for you and a lot more (provisioning `nodejs`, etc.). It's your choice.

> **NOTE**: The `nix`-based build here ONLY works on x86_64 Linux, but this is
> only a technical restriction, due to the usage of a static Linux binary for
> `tree-sitter`. This could be lifted in the future for macOS and aarch64
> Linux.
>
> In the meantime, **macOS users cannot use Nix, and must use `tree-sitter`
> directly**. They must also install `nodejs`.
>
> I strongly suggest that Windows users use a tool like WSL2 for grammar
> development. Like any Linux distribution, WSL2 distributions can use `nix`,
> or use `tree-sitter` and `nodejs` directly, as macOS users do.

A useful guide to keep open in your browser is `tree-sitter`'s [documentation
on how to create parsers][ts-parsing].

[ts-parsing]: http://tree-sitter.github.io/tree-sitter/creating-parsers

### Continuous deployment

TBD: get GH Actions deploying things, and describe it here.

## Authors

See
[AUTHORS.txt](https://raw.githubusercontent.com/thoughtpolice/tree-sitter-openddl/master/AUTHORS.txt)
for the list of contributors to the project.

## License

*MIT*, like most `tree-sitter` grammars. See
[LICENSE.txt](https://raw.githubusercontent.com/thoughtpolice/tree-sitter-openddl/master/LICENSE.txt)
for precise terms of copyright and redistribution.