https://github.com/camertron/antlr4-native-rb

# antlr4-native

Create a Ruby native extension from (almost) any ANTLR4 grammar.

## What is this thing?

This gem generates native Ruby extensions from ANTLR grammars, enabling Ruby developers to generate parsers for numerous programming languages, file formats, etc.

## Who needs this?

If you're a Ruby programmer who wants to parse and traverse source code written in a plethora of programming languages, antlr4-native might be able to help you. A number of community-developed ANTLR grammars are available in ANTLR's [grammars-v4](https://github.com/antlr/grammars-v4) repo. Grab one, then use antlr4-native to generate a bunch of Ruby-compatible C++ code from it. The C++ code can be compiled and used as a native extension.

Rather than use antlr4-native directly, consider using its sister project, the [antlr-gemerator](https://github.com/camertron/antlr-gemerator), which can generate a complete rubygem from an ANTLR grammar.

## Code Generation

Here's how to generate a native extension for a given lexer and parser (Python in this case), defined in two .g4 files:

```ruby
require 'antlr4-native'

generator = Antlr4Native::Generator.new(
  grammar_files: ['Python3Lexer.g4', 'Python3Parser.g4'],
  output_dir: 'ext',
  parser_root_method: 'file_input'
)

generator.generate
```

In the example above, the output directory is set to `ext`, the standard directory for Ruby native extensions. Antlr4-native will generate code into `ext/<parser_name>`, where `<parser_name>` is derived from the name of the parser as defined in the grammar file(s). In this case, Python3Parser.g4 contains:

```antlr
parser grammar Python3Parser;
```

so antlr4-native will generate code into the ext/python3-parser directory.
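
The mapping from grammar declaration to directory name can be sketched as follows. `guess_output_subdir` is a hypothetical helper, not part of antlr4-native, and it assumes the directory name is the grammar name converted from CamelCase to lowercase, dash-separated form. That rule is inferred from the single `Python3Parser` to `python3-parser` example above, so treat it as a guess rather than the gem's documented behavior.

```ruby
# Hypothetical helper (not part of antlr4-native): guesses the output
# subdirectory from a .g4 file's grammar declaration, ASSUMING the
# CamelCase -> dash-case convention seen in the Python3Parser example.
GRAMMAR_DECL = /^\s*(?:lexer\s+|parser\s+)?grammar\s+(\w+)\s*;/

def grammar_name(g4_source)
  match = g4_source.match(GRAMMAR_DECL)
  raise ArgumentError, 'no grammar declaration found' unless match
  match[1]
end

def guess_output_subdir(g4_source, output_dir = 'ext')
  # Insert a dash between a lowercase letter/digit and an uppercase letter.
  dashed = grammar_name(g4_source).gsub(/([a-z\d])([A-Z])/, '\1-\2').downcase
  File.join(output_dir, dashed)
end

puts guess_output_subdir("parser grammar Python3Parser;\n")
# prints ext/python3-parser
```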

Finally, the `parser_root_method` option tells antlr4-native which context represents the root of the parse tree. This context functions as the starting point for visitors.
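
Since `parser_root_method` refers to a rule in the parser grammar (`file_input` is the start rule of the Python3 grammar), a typo there is easy to make. A minimal pre-generation check might look like the following. `rule_defined?` is a hypothetical helper, it assumes the option value is spelled exactly as the rule appears in the `.g4` file, and its regex ignores edge cases such as rule names that appear inside comments.

```ruby
# Hypothetical pre-flight check (not part of antlr4-native): does the
# grammar text define a rule with the given name? A rule definition is
# assumed to look like `name : ...` or `name returns [...] : ...`.
def rule_defined?(g4_source, rule_name)
  pattern = /^\s*#{Regexp.escape(rule_name)}\b[^:;]*:/
  !!(g4_source =~ pattern)
end

grammar = <<~G4
  parser grammar Python3Parser;
  file_input: (NEWLINE | stmt)* EOF;
G4

puts rule_defined?(grammar, 'file_input')  # prints true
puts rule_defined?(grammar, 'fileinput')   # prints false
```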

## Using extensions in Ruby

The generated `Parser` class provides methods for parsing source code. Use `Parser.parse` to parse a string and `Parser.parse_file` to parse the contents of a file:

```ruby
parser = Python3Parser::Parser.parse(File.read('path/to/file.py'))

# equivalent to:
parser = Python3Parser::Parser.parse_file('path/to/file.py')
```

Use the `#visit` method on an instance of `Parser` to make use of a visitor:

```ruby
visitor = MyVisitor.new
parser.visit(visitor)
```

See the next section for more info regarding creating and using visitors.

## Visitors

A visitor class is automatically created during code generation. Visitors are just classes with a bunch of special methods, each corresponding to a specific part of the source language's syntax. The methods are essentially callbacks that are invoked in order as the parser walks the parse tree. For example, here's a visitor with a method that will be called whenever the parser encounters a Python function definition:

```ruby
class FuncDefVisitor < Python3Parser::Visitor
  def visit_func_def(ctx)
    puts ctx.NAME.text # print the name of the function
    visit_children(ctx)
  end
end
```

Make sure to always call `#visit_children` at some point in your `visit_*` methods. If you don't, the subtree under the current context won't get visited.

Finally, if you override `#initialize` in your visitor subclasses, don't forget to call `super`. If you don't, you'll get a nice big segfault.
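
The interplay between `visit_*` callbacks, `#visit_children`, and `super` can be illustrated without compiling anything. The sketch below is a pure-Ruby stand-in written for this example: `Node`, `MiniVisitor`, and the `:func_def` node type are hypothetical and only mimic the shape of the generated classes, not their actual implementation. It shows a subclass that calls `super` in `#initialize`, collects plain strings instead of holding on to nodes, and what happens to nested definitions when `#visit_children` is skipped.

```ruby
# Pure-Ruby stand-in for a parse tree node (hypothetical, for illustration).
Node = Struct.new(:type, :name, :children)

# Hypothetical base visitor mimicking the dispatch pattern described above.
class MiniVisitor
  def initialize
    # Stand-in for setup done by a base class. A subclass that skips
    # `super` leaves @depth nil, so visit_children raises NoMethodError
    # (in the native extension, the README warns of a segfault instead).
    @depth = 0
  end

  # Dispatch to visit_<type> if it is defined, otherwise just descend.
  def visit(node)
    handler = "visit_#{node.type}"
    respond_to?(handler) ? public_send(handler, node) : visit_children(node)
  end

  def visit_children(node)
    @depth += 1
    node.children.each { |child| visit(child) }
    @depth -= 1
  end
end

class FuncNameCollector < MiniVisitor
  attr_reader :names

  def initialize
    super        # always call super when overriding #initialize
    @names = []
  end

  def visit_func_def(node)
    @names << node.name.dup # keep a plain String, not the node itself
    visit_children(node)    # descend so nested functions are seen too
  end
end

# Same as above, but forgets to call visit_children.
class ShallowCollector < FuncNameCollector
  def visit_func_def(node)
    @names << node.name.dup
  end
end

tree = Node.new(:file_input, nil, [
  Node.new(:func_def, 'outer', [Node.new(:func_def, 'inner', [])]),
  Node.new(:func_def, 'other', [])
])

full = FuncNameCollector.new
full.visit(tree)
p full.names    # prints ["outer", "inner", "other"]

shallow = ShallowCollector.new
shallow.visit(tree)
p shallow.names # prints ["outer", "other"]
```

The `ShallowCollector` output shows the effect described above: because it never calls `visit_children`, the nested `inner` definition is never reached.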

## Caveats

1. Avoid retaining references to contexts, tokens, etc. anywhere in your Ruby code. Contexts (i.e. the `ctx` variables in the examples above) and other objects created by ANTLR's C++ runtime are cleaned up automatically without the Ruby interpreter's knowledge. You'll almost surely see a segfault if you retain a reference to one of these objects and try to use it after the call to `Parser#visit`.
1. Due to an ANTLR limitation, parsers cannot be used in a multi-threaded environment, even if each parser instance is used entirely within a single thread (i.e. parsers are not shared between threads). According to the ANTLR C++ developers, parsers should be thread-safe. Unfortunately, firsthand experience has shown otherwise. Your mileage may vary.
1. The description of this gem says "(almost) any ANTLR4 grammar" because many grammars contain target-specific code. For example, the Python3 grammar referenced in the examples above contains inline Java code that the C++ compiler won't understand. You'll need to port any such code to C++ before you'll be able to compile and use the native extension.
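
To locate target-specific code before porting it, a rough scan of the `.g4` files can help. This is a heuristic sketch, not an antlr4-native feature: `find_named_actions` is hypothetical and only detects lines that start with a named action such as `@members`, `@header`, `@init`, or a scoped form like `@lexer::members`. It will miss inline `{...}` actions and semantic predicates inside rules, which may also contain Java code that needs porting.

```ruby
# Hypothetical heuristic (not part of antlr4-native): list named action
# blocks such as @members or @lexer::header, which usually hold
# target-language code (e.g. Java) that must be ported to C++.
NAMED_ACTION = /^\s*@(?:(lexer|parser)::)?(\w+)\b/

def find_named_actions(g4_source)
  results = []
  g4_source.each_line.with_index(1) do |line, lineno|
    match = line.match(NAMED_ACTION)
    next unless match
    scope = match[1] ? "#{match[1]}::" : ''
    results << [lineno, "@#{scope}#{match[2]}"]
  end
  results
end

grammar = <<~G4
  lexer grammar Python3Lexer;
  @lexer::members {
    // Java code lives here
  }
  NEWLINE: '\\n';
G4

p find_named_actions(grammar) # prints [[2, "@lexer::members"]]
```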

## System Requirements

* A Java runtime (version 1.6 or higher) is required to generate parsers, since ANTLR is a Java tool. The ANTLR .jar file is distributed inside the antlr4-native gem, so there's no need to download it separately. You can download a Java runtime [here](https://www.java.com/en/download/).
* Ruby >= 2.3.
* A C++ compiler (like g++ or clang++) that supports C++14. If Ruby is working on your machine then you likely already have this.
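
A quick way to check these requirements from Ruby is sketched below. It is a hypothetical preflight script, not something antlr4-native ships: it only tests that each tool can be launched, and does not verify the Java version (1.6 or higher) or that the compiler actually supports C++14. Using `RbConfig::CONFIG['CXX']` assumes the C++ compiler Ruby was built with is the one you will compile the extension with.

```ruby
require 'rbconfig'

# Hypothetical preflight check for the requirements listed above.
def ruby_ok?
  Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.3')
end

def command_runs?(*cmd)
  # system returns nil if the command cannot be executed at all,
  # and false if it runs but exits with a non-zero status.
  system(*cmd, out: File::NULL, err: File::NULL) == true
end

# Take only the executable name in case CXX includes extra flags.
cxx = RbConfig::CONFIG['CXX'].to_s.split.first || 'c++'

{
  'Ruby >= 2.3'           => ruby_ok?,
  'Java runtime'          => command_runs?('java', '-version'),
  "C++ compiler (#{cxx})" => command_runs?(cxx, '--version')
}.each { |label, ok| puts "#{label}: #{ok ? 'ok' : 'MISSING'}" }
```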

## License

Licensed under the MIT license. See LICENSE.txt for details.

## Authors

* Cameron C. Dutro: http://github.com/camertron