Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/machow/hoof

A python library to create abstract syntax trees with antlr4
https://github.com/machow/hoof

Last synced: 16 days ago
JSON representation

A python library to create abstract syntax trees with antlr4

Awesome Lists containing this project

README

        

# Hoof

[![Build Status](https://travis-ci.org/machow/hoof.svg?branch=master)](https://travis-ci.org/machow/hoof)

hoof is a python library for creating abstract syntax trees (ASTs) from [antlr](https://www.antlr.org/) parsers.

Whether you are dipping your toes in the world of parsing, or a grizzled veteran, hoof will help you get started:

* Importing and running a grammar's parser, lexer, and tree visitor.
* Using a declarative syntax to quickly create ASTs.
* Progressing to tree shaping more tricky ones.
* Quick, extensible options for error handling.

I built hoof because I found myself often repeating the same code on projects, and hitting the same surprises.
Building, debugging, and shaping the output of parsers can seem daunting--hoof makes it a little easier to get started!

## Install

```
pip install hoof
```




Jump to Example...


Parsing text

        
        
        
        
        
        
        
  

Create a simple AST

        
        
        
        
        
        
        
  

Create and run the python AST

        
        
        
        
        
        
        
  



## Example: Parsing text

First, let's consider grammar in hoof_examples/Tiny.

```antlr4
grammar Tiny;

prog: (body=expr ';')* ;

expr: OP? expr # UnaryExpr
| INT # Integer
;

OP : [-+] ;
INT : [0-9]+ ;
```

```{python}
from hoof import Hoof, to_symbol

tiny_lang = Hoof("hoof_examples.Tiny")
tree1 = tiny_lang.parse("-1", "expr", mode = "parser")
tree1
```

Note that UnaryExprContext is a antlr4.RuleNode. These let us examine parts of the parse tree, but can be hard to investigate. A nice way to get a feel for what's going on is by using hoof's `to_symbol` function.

```{python}
to_symbol(tree1)
```

The names next to the black boxes show us what rules were matched. The first was UnaryExpr from the first line in our grammar rule, "expr".

```
expr: OP? expr # UnaryExpr
| INT # Integer
;
```

We can also check the full text for a node

```{python}
tree1.getText()
```

### Moving down the tree

The UnaryExpr context matches an `OP` and `expr`. We can get the matches for these two things individually.

```{python}
op = tree1.OP()
expr = tree1.expr()

op
```

```{python}
(op.getText(), expr.getText())
```

We can also get all of its children.

```{python}
[child.getText() for child in tree1.children]
```

### Rules get matched recursively

```{python}
tree2 = tiny_lang.parse("+-1", "expr", mode = "parser")
to_symbol(tree2)
```

```{python}
print('outer UnaryExpr:', tree2.getText())
print('inner UnaryExpr:', tree2.expr().getText())
```

### Multiple matches are stored in lists

Now let's look at the "prog" rule:

```
prog: (body=expr ';')* ;
```

The `*` means that it can match the piece in the parentheses 0 or more times.

```{python}
tree2 = tiny_lang.parse("1;-2;", "prog", mode = "parser")
to_symbol(tree2)
```

```{python}
# tree2.expr() is a list
for ii, expr in enumerate(tree2.expr()):
print("Expression", ii, "----")
print(to_symbol(expr))
print()
```

## Example: Create a simple AST

```antlr4
grammar Tiny;

prog: (body=expr ';')* ;

expr: OP? expr # UnaryExpr
| INT # Integer
;

SUB : [-+] ;
INT : [0-9]+ ;
```

```{python}
from hoof import Hoof, AntlrAst, to_symbol

tiny_lang = Hoof("hoof_examples.Tiny")

class UnaryOp(AntlrAst):
_fields = ('op', 'expr')
_remap = ['OP->op']
_rules = 'UnaryExpr'

class Visitor(tiny_lang.Visitor):
def visitTerminal(self, ctx):
return ctx.getText()

tiny_lang.register(UnaryOp)
tiny_lang.bind(Visitor)

ast_tree = tiny_lang.parse("-+1", "expr")
ast_tree
```

```{python}
ast_tree.op
```

## Example: Create an executable python AST

**TODO: handle issues. (1) convert strings to ints, (2) labels on single tokens not applied**

```antlr4
grammar Tiny;

prog: (body=expr ';')* ;

expr: OP? expr # UnaryExpr
| INT # Integer
;

SUB : [-+] ;
INT : [0-9]+ ;
```

```{python}
from hoof import Hoof, AntlrAst, to_symbol, DispatchError
import ast

tiny_lang = Hoof("hoof_examples.Tiny")
tree1 = tiny_lang.parse("-1", "expr", mode = "parser")
to_symbol(tree1, explicit = True)
```

```{python}
print("Parsed Python AST")
to_symbol(ast.parse('-1', mode = "eval"))
```

```{python}
py_ast = ast.UnaryOp(op = ast.USub(), operand = ast.Num(n = 1))

print("Manual Python AST")
to_symbol(py_ast)
```

```{python}
from hoof import TokenDispatcher

OP_TO_AST = {
'-': ast.USub,
'+': ast.UAdd
}

td = TokenDispatcher(tiny_lang.Parser)
td.register("OP", lambda ctx: OP_TO_AST[ctx.getText()]())
td.register("INT", lambda ctx: int(ctx.getText()))

class Visitor(tiny_lang.Visitor):

def visitTerminal(self, ctx):
try:
return td(ctx)
except DispatchError:
return ctx.getText()

tiny_lang.register("UnaryExpr", ast.UnaryOp, ["OP->op", "expr->operand"])
tiny_lang.register("Integer", ast.Num, ["INT->n"])

# new visitor saved as tiny_lang.HoofVisitor
tiny_lang.bind(Visitor)

py_ast2 = tiny_lang.parse("-1", "expr", mode = "ast")
```

```{python}
ast.dump(py_ast2)
```

## Common Issues

### Misnamed visitor methods

E.g. visitTerm when you meant visitTerminal

### Surprising quirks in parse tree

* Label over single token not used
* the default visitor is visitChildren, which uses defaultResult. defaultResult -> None.

## Generate parsers

```
docker run --rm -v $(pwd):/usr/src/app antlr /bin/bash -c "antlr4 -Dlanguage=Python3 -visitor tests/Expr.g4"
```