https://github.com/jzimmerman/langcc
langcc: A Next-Generation Compiler Compiler
- Host: GitHub
- URL: https://github.com/jzimmerman/langcc
- Owner: jzimmerman
- License: apache-2.0
- Created: 2022-09-20T01:49:33.000Z
- Default Branch: main
- Last Pushed: 2022-12-21T19:13:42.000Z
- Last Synced: 2024-08-02T14:05:03.780Z
- Language: C++
- Homepage: https://langcc.io
- Size: 1.48 MB
- Stars: 1,724
- Watchers: 24
- Forks: 60
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# ``langcc``: A Next-Generation Compiler Compiler
``langcc`` can serve as a replacement for
[lex](https://en.wikipedia.org/wiki/Lex_(software))+[yacc](https://en.wikipedia.org/wiki/Yacc), but is much more powerful: it accepts a broader class of grammars and generates an entire compiler frontend from a single spec. ``langcc`` enables a new workflow for developing programming languages, in which you iteratively improve an intuitive, declarative specification of your language instead of laboriously updating a hand-written compiler frontend for every grammar change.

- ``langcc`` generates efficient, linear-time parsers for an extremely broad class of grammars (a strict superset of
canonical LR). In particular, the project includes complete specifications of Python 3.9.12
([grammars/py.lang](https://github.com/jzimmerman/langcc/blob/main/grammars/py.lang))
and Golang 1.17.8 ([grammars/go.lang](https://github.com/jzimmerman/langcc/blob/main/grammars/go.lang)),
and generates parsers for each language that are faster than the respective standard parsers (1.2x and 4.3x
faster, respectively).
- ``langcc`` requires only a single input file in its declarative ``.lang`` format to generate a full
compiler frontend, including AST struct definitions and associated traversals, hashing, pretty-printing,
and automatic integration with the generated parser (no need to sprinkle C++ code into your grammar as in lex+yacc).
- ``langcc`` provides a built-in "conflict tracing" algorithm, which traces LR conflicts back to
"confusing input pairs": explicit example strings, rather than opaque shift/reduce conflict reports.
- ``langcc`` extends the existing LR paradigm to include both recursive-descent parsing actions (which result in
much smaller and more intuitive automata) and also per-symbol attribute constraints (which provide support for
many industrial language constructs).
- ``langcc`` implements a novel transformation for LR grammars we call "continuation-passing style", which
significantly expands the class of grammars the tool supports.
- Finally, ``langcc`` ships with a standalone _datatype compiler_ called ``datacc``, which
generates full C++ implementations of algebraic datatypes (including sum types) from a simple declarative
language whose spec is provided in
[data.lang](https://github.com/jzimmerman/langcc/blob/main/grammars/data.lang).

``langcc`` is _self-hosting_. The file
[grammars/meta.lang](https://github.com/jzimmerman/langcc/blob/main/grammars/meta.lang) contains the
specification for the "language of languages" in which ``langcc`` expects ``.lang`` files to be written.
In this canonical implementation, the script
[bootstrap.sh](https://github.com/jzimmerman/langcc/blob/main/bootstrap.sh) uses ``langcc`` to
generate its own compiler frontend.

``langcc`` is stable and feature-complete. It is ready to be used as a research prototype to facilitate
rapid exploration of new compilers and programming languages. It is free and open-source, available under
the Apache 2.0 License.

``langcc`` is also described in the following companion technical reports, which provide the theoretical basis for
its functionality.
- Zimmerman, Joe.
[Practical LR Parser Generation.](https://arxiv.org/pdf/2209.08383.pdf)
arXiv, 2022.
- Zimmerman, Joe.
[langcc: A Next-Generation Compiler Compiler.](https://arxiv.org/pdf/2209.08385.pdf)
arXiv, 2022.

## Build

This build has been verified to work on Ubuntu 22.04 and macOS 12.5.

![Ubuntu CI](https://github.com/jzimmerman/langcc/actions/workflows/ci-ubuntu.yml/badge.svg?event=push) ![macOS CI](https://github.com/jzimmerman/langcc/actions/workflows/ci-macos.yml/badge.svg?event=push)
For Ubuntu 22.04:
```
./deps_ubuntu.sh
make -j8
sudo make install
```
For macOS 12.5 (requires Homebrew):
```
./deps_macos.sh
make -j8
sudo make install
```

Then, to bootstrap the ``langcc`` frontend itself, run:
```
./bootstrap.sh
```

## Examples
Once ``langcc`` and its companion ``datacc`` have been built, you can run the various provided examples:
- The most fully featured example is in ``examples/lox``, which includes a complete interpreter for a
full scripting language called Lox, designed by Robert Nystrom in
[Crafting Interpreters](https://craftinginterpreters.com/). The Lox example is explored in detail in the
[user manual](https://github.com/jzimmerman/langcc/blob/main/MANUAL.md) and is the recommended way to learn
about ``langcc``'s feature set.
- ``examples/basic`` illustrates the basic syntax of a declarative language spec, while
``examples/calc`` implements a calculator for arithmetic expressions, including user-friendly error reporting.
- ``grammars/go.lang`` and ``grammars/py.lang`` provide complete specifications for the syntax of Golang 1.17.8 and
Python 3.9.12, respectively. They showcase the full feature set of ``langcc`` and are compiled during the main
build process itself, which produces the test binaries ``build/go_standalone_test`` and ``build/py_standalone_test``. (Note: these binaries require checkouts of the Golang 1.17.8 and Python 3.9.12 repositories in the directories ``../go`` and ``../cpython``, respectively.)
- The language of datatypes, ``grammars/data.lang``, describes the input format expected by
the standalone datatype compiler ``datacc`` for specifying algebraic datatypes.
- Finally, the language of languages itself, ``grammars/meta.lang``, describes the input format
expected by ``langcc``. This language also serves as basic documentation, as it enumerates all of
its own features.

## Documentation
For full documentation, please see the [user manual](https://github.com/jzimmerman/langcc/blob/main/MANUAL.md).