https://github.com/kaby76/Trash
Toolkit for grammars
- Host: GitHub
- URL: https://github.com/kaby76/Trash
- Owner: kaby76
- License: mit
- Created: 2021-04-01T18:35:10.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-06-28T17:51:19.000Z (over 1 year ago)
- Last Synced: 2024-06-28T18:10:56.163Z (over 1 year ago)
- Topics: antlr, antlr4, refactoring, transformation, xpath
- Language: C#
- Homepage:
- Size: 28.8 MB
- Stars: 71
- Watchers: 5
- Forks: 5
- Open Issues: 111
Metadata Files:
- Readme: readme.md
- License: LICENSE
# Trash
[CI status](https://github.com/kaby76/Trash/actions?query=workflow%3ACI)

**Status: The toolset is still undergoing a major rewrite. Consider this toolkit "pre-alpha".
Old tools are being removed, and new ones are being added. Features are being added, while bugs
are constantly being fixed. The XPath/XQuery engine is still being
rewritten.**

**The repo [g4-scripts](https://github.com/kaby76/g4-scripts) contains a collection of
Bash scripts that use Trash. The repo also contains XQuery scripts that implement complex
operations on a parse tree. You can also
read about Trash details in [my blog](http://codinggorilla.com/).**

Trash is a collection of ~40 command-line tools to analyze and transform
Antlr parse trees and grammars. The toolkit can: generate a parser
application for an Antlr4 grammar for any target and any OS; analyze the
grammar for common problems; automate changes applied to a grammar scraped
from a specification; transform parse trees for transpiling
and preprocessing source code. With the [Antlr toolkit](https://www.antlr.org/)
and the [collection of Antlr grammars](https://github.com/antlr/grammars-v4),
one can write programming language tools quickly and easily.

The toolkit is designed around a JSON representation of
parse trees and command-line tools that read, modify, and write
those trees via standard input and output. Complex refactorings can be
achieved by chaining different commands together.

Each app in `Trash` is implemented as a [Dotnet Tool](https://docs.microsoft.com/en-us/dotnet/core/tools/global-tools) console application, and can be used on Windows, Linux, or Mac.
No prerequisites are required other than installing the
[.NET SDK](https://dotnet.microsoft.com/), and the toolchains
for any other targets you want to use.

The toolkit uses [Antlr](https://www.antlr.org/) and
[XPath2](https://en.wikipedia.org/wiki/XPath).
The code is implemented in C#.

The toolkit was used to scrape and refactor the Dart2
grammar from its spec. See [this script](https://github.com/kaby76/ScrapeDartSpec/blob/master/refactor.sh).

## Installation
### Requirements
[Install Dotnet 8.0.x](https://dotnet.microsoft.com/en-us/download)

### Install Globally
Copy this script and execute it in a command-line prompt.
```
dotnet tool install -g trcaret
dotnet tool install -g trclonereplace
dotnet tool install -g trcombine
dotnet tool install -g trconvert
dotnet tool install -g trcover
dotnet tool install -g trfoldlit
dotnet tool install -g trgen
dotnet tool install -g trgenvsc
dotnet tool install -g trglob
dotnet tool install -g triconv
dotnet tool install -g tritext
dotnet tool install -g trjson
dotnet tool install -g trparse
dotnet tool install -g trperf
dotnet tool install -g trquery
dotnet tool install -g trrename
dotnet tool install -g trsort
dotnet tool install -g trsplit
dotnet tool install -g trsponge
dotnet tool install -g trtext
dotnet tool install -g trtokens
dotnet tool install -g trtree
dotnet tool install -g trunfold
dotnet tool install -g trwdog
dotnet tool install -g trxml
dotnet tool install -g trxml2
```
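The same list can be driven from a loop, which is easier to keep in sync than one line per tool. The following is a minimal sketch and not part of Trash: `TRASH_TOOLS`, `install_trash_tools`, and the `DRY_RUN` switch are names invented for this illustration, and the tool names are copied from the list above.

```shell
#!/usr/bin/env bash
# Tool names copied from the global install list above.
TRASH_TOOLS="trcaret trclonereplace trcombine trconvert trcover trfoldlit trgen trgenvsc trglob triconv tritext trjson trparse trperf trquery trrename trsort trsplit trsponge trtext trtokens trtree trunfold trwdog trxml trxml2"

# install_trash_tools is a hypothetical helper, not part of Trash.
# With DRY_RUN=1 it only prints the commands it would run.
install_trash_tools() {
  for tool in $TRASH_TOOLS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "dotnet tool install -g $tool"
    else
      # Keep going if one tool fails, but report it on stderr.
      dotnet tool install -g "$tool" || echo "warning: could not install $tool" >&2
    fi
  done
}
```

Running `DRY_RUN=1 install_trash_tools` prints the commands without touching the system; calling `install_trash_tools` on its own performs the installs.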
### Uninstall
```
dotnet tool uninstall -g trcaret
dotnet tool uninstall -g trclonereplace
dotnet tool uninstall -g trcombine
dotnet tool uninstall -g trconvert
dotnet tool uninstall -g trcover
dotnet tool uninstall -g trfoldlit
dotnet tool uninstall -g trgen
dotnet tool uninstall -g trgenvsc
dotnet tool uninstall -g trglob
dotnet tool uninstall -g triconv
dotnet tool uninstall -g tritext
dotnet tool uninstall -g trjson
dotnet tool uninstall -g trparse
dotnet tool uninstall -g trperf
dotnet tool uninstall -g trquery
dotnet tool uninstall -g trrename
dotnet tool uninstall -g trsort
dotnet tool uninstall -g trsplit
dotnet tool uninstall -g trsponge
dotnet tool uninstall -g trtext
dotnet tool uninstall -g trtokens
dotnet tool uninstall -g trtree
dotnet tool uninstall -g trunfold
dotnet tool uninstall -g trwdog
dotnet tool uninstall -g trxml
dotnet tool uninstall -g trxml2
```
### Install Locally
```
dotnet new tool-manifest
dotnet tool install trcaret
dotnet tool install trclonereplace
dotnet tool install trcombine
dotnet tool install trconvert
dotnet tool install trcover
dotnet tool install trfoldlit
dotnet tool install trgen
dotnet tool install trgenvsc
dotnet tool install trglob
dotnet tool install triconv
dotnet tool install tritext
dotnet tool install trjson
dotnet tool install trparse
dotnet tool install trperf
dotnet tool install trquery
dotnet tool install trrename
dotnet tool install trsort
dotnet tool install trsplit
dotnet tool install trsponge
dotnet tool install trtext
dotnet tool install trtokens
dotnet tool install trtree
dotnet tool install trunfold
dotnet tool install trwdog
dotnet tool install trxgrep
dotnet tool install trxml
dotnet tool install trxml2
```
## List of commands
__NB: Out of date__
1) tranalyze -- Analyze a grammar
1) trcombine -- Combine a split Antlr4 grammar
1) trconvert -- Convert a grammar from one form to another
1) trdot -- Print a parse tree in Graphviz Dot format
1) trenum -- Not functional, to enumerate strings from grammar.
1) trfirst -- Outputs first sets of a grammar
1) trfold -- Perform fold transform on a grammar
1) trfoldlit -- Perform fold transform on grammar with literals
1) trformat -- Format a grammar
1) trgen -- Generate an Antlr4 parser for a given target language
1) trgen2 -- Generate files from template and XML doc list.
1) trgroup -- Perform a group transform on a grammar
1) tritext -- Get strings from a PDF file
1) trjson -- Print a parse tree in JSON structured format
1) trkleene -- Perform a Kleene transform of a grammar
1) trmove -- Move nodes in a parse tree
1) trparse -- Parse a grammar or use a generated parser to parse input
1) trperf -- Perform performance analysis of an Antlr grammar parse
1) trpiggy -- Perform a parse tree rewrite
1) trprint -- Print a parse tree, including off-token characters
1) trrename -- Rename symbols in a grammar
1) trrr -- (No description.)
1) trrup -- Remove useless parentheses in a grammar
1) trsem -- Read static semantics and generate code
1) trsort -- Sort rules in a grammar
1) trsplit -- Split a combined Antlr4 grammar
1) trsponge -- Extract parsing results output of Trash command into files
1) trst -- Print a parse tree in Antlr4 ToStringTree()
1) trstrip -- Strip a grammar of all actions, labels, etc.
1) trtext -- Print a parse tree with a specific interval
1) trthompson -- (No description.)
1) trtokens -- Print tokens in a parse tree
1) trtree -- Print a parse tree in a human-readable format
1) trull -- Transform a grammar with upper- and lowercase string literals
1) trunfold -- Perform an unfold transform on a grammar
1) trungroup -- Perform an ungroup transform on a grammar
1) trwdog -- Kill a program that runs too long
1) trxml -- Print a parse tree in XML structured format
1) trxml2 -- Print an enumeration of all paths in a parse tree to leaves

## Examples
### Parse a grammar, create a parser for the grammar, build, and test
```
git clone https://github.com/antlr/grammars-v4
cd grammars-v4/python/python
trparse *.g4 | trquery 'grep //grammarDecl' | trtext
# Output:
# PythonLexer.g4:lexer grammar PythonLexer;
# PythonParser.g4:parser grammar PythonParser;
trgen
cd Generated
dotnet build
cat - < new-source.g4
trparse Arithmetic.g4 | trrename -r "expression,expression_;atom,atom_;scientific,scientific_" | trprint
```

In these two examples, the Arithmetic grammar is parsed.
[trrename](https://github.com/kaby76/Trash/tree/main/src/trrename) reads the parse tree data and
modifies it by renaming the `expression` symbol in two ways: the first uses an XPath expression that identifies the LHS terminal
symbol of the `expression` symbol, and the second assumes that the tree is an Antlr4 parse tree
and applies a semicolon-separated list of paired renames. The resulting code is reconstructed and saved.
`trrename` does not rename symbols in actions, nor does it rename identifiers corresponding to the
grammar symbols in any support source code (but it could if the tool were extended).

### Count method declarations in a Java source file
    git clone https://github.com/antlr/grammars-v4.git; \
    cd grammars-v4/java/java9; \
    trgen; dotnet build Generated/Test.csproj; \
    trparse examples/AllInOne8.java | trquery "grep //methodDeclaration" | trst | wc

This command clones the Antlr4 grammars-v4 repo, generates a parser for the Java9 grammar,
then runs the parser on [examples/AllInOne8.java](https://github.com/antlr/grammars-v4/blob/master/java/java9/examples/AllInOne8.java).
The parse tree is then piped to `trquery` to find all parse tree nodes of
type `methodDeclaration`; `trst` converts them to a simple string, and `wc` counts
the result.

### Strip a grammar of all non-essential CFG
    trparse Java9.g4 | trstrip | trtext > Essential-Java9.g4

### Split a grammar
Since Antlr2, one can write a combined parser/lexer grammar in one file,
or a split parser/lexer grammar in two files.
While it's not hard to split or combine
a grammar, it's tedious. For automating transformations, it's
necessary because Antlr4 requires the grammars to be split
when super classes are needed for different targets.

    trcombine ArithmeticLexer.g4 ArithmeticParser.g4 | trprint > Arithmetic.g4

This command calls [trcombine](https://github.com/kaby76/Trash/tree/main/src/trcombine)
which parses two split grammar files
[ArithmeticLexer.g4](https://github.com/kaby76/Trash/blob/main/_tests/combine/ArithmeticLexer.g4)
and
[ArithmeticParser.g4](https://github.com/kaby76/Trash/blob/main/_tests/combine/ArithmeticParser.g4),
and creates a [combined grammar](https://github.com/kaby76/Trash/blob/main/_tests/combine/Arithmetic.g4)
for the two.

    trparse Arithmetic.g4 | trsplit | trsponge -o true

This command calls [trsplit](https://github.com/kaby76/Trash/tree/main/src/trsplit)
which splits the grammar into two parse tree results, one that defines
ArithmeticLexer.g4 and the other that defines ArithmeticParser.g4.
The tool [trsponge](https://github.com/kaby76/Trash/tree/main/src/trsponge)
is similar to the [tee](https://en.wikipedia.org/wiki/Tee_(command)) in
Linux: the parse tree data is split and placed in files.

## Parsing Result Sets -- the data passed between commands
A *parsing result set* is a JSON serialization of an array of:
* A set of parse tree nodes.
* Parser information related to the parse tree nodes.
* Lexer information related to the parse tree nodes.
* The name of the input corresponding to the parse tree nodes.
* The input text corresponding to the parse tree nodes.

Most commands in Trash read and/or write parsing result sets.
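
Because a parsing result set is plain data on standard output, it can be saved once and fed to more than one tool. The following is a minimal sketch, assuming `trtokens` and `trtree` accept a result set on standard input in the way the toolkit description states; `show_grammar_views` and the temporary file are hypothetical and not part of Trash.

```shell
#!/usr/bin/env bash
# show_grammar_views is a hypothetical helper, not part of Trash.
# It parses a grammar once, keeps the parsing result set in a temporary
# file, and feeds the same data to two different printing tools.
show_grammar_views() {
  grammar="$1"
  result_set="$(mktemp)"      # holds the JSON parsing result set
  trparse "$grammar" > "$result_set" || { rm -f "$result_set"; return 1; }
  trtokens < "$result_set"    # print the tokens in the parse tree
  trtree < "$result_set"      # print the tree in a human-readable format
  rm -f "$result_set"
}
```

For example, `show_grammar_views Arithmetic.g4` parses the grammar a single time and prints both views, rather than running `trparse` once per pipeline.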
## Supported grammars
| Grammars | File suffix |
| ---- | ---- |
| Antlr4 | .g4 |
| Antlr3 | .g3 |
| Antlr2 | .g2 |
| Bison | .y |
| LBNF | .cf |
| W3C EBNF | .ebnf |
| ISO 14977 | .iso14977, .iso |

## Analysis
### Recursion
* [Has direct/indirect recursion](https://github.com/kaby76/Trash/blob/main/doc/analysis.md#has-directindirect-recursion)
## Refactoring
Trash provides a number of transformations that can help to make grammars cleaner (reformatting),
more readable (reducing the length of the RHS of a rule),
and more efficient (reducing the number of non-terminals) for Antlr.

Some of these refactorings are very specific to Antlr due to the way
the parser works, e.g., converting a prioritized chain of productions recognizing
an arithmetic expression to a recursive alternate form.
The refactorings implemented are:

### Raw tree editing
* [Delete parse tree node](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#delete-parse-tree-node)
### Reordering
* [Move start rule to top](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#move-start-rule)
* [Reorder parser rules](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#reorder-parser-rules)
* [Sort modes](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#sort-modes)

### Changing rules
* [Remove useless parentheses](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#remove-useless-parentheses)
* [Remove useless parser rules](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#remove-useless-productions)
* [Rename lexer or parser symbol](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#rename)
* [Unfold](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#Unfold)
* [Group alts](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#group-alts)
* [Ungroup alts](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#ungroup-alts)
* [Upper and lower case string literals](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#upper-and-lower-case-string-literals)
* [Fold](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#Fold)
* Replace direct left recursion with right recursion
* [Replace direct left/right recursion with Kleene operator](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#Kleene)
* Replace indirect left recursion with right recursion
* Replace parser rule symbols that conflict with Antlr keywords
* [Replace string literals in parser with lexer symbols](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#replace-literals-in-parser-with-lexer-token-symbols)
* Replace string literals in parser with lexer symbols, with lexer rule creation
* [Delabel removes the annoying and mostly useless labeling in an Antlr grammar](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#delabel)

### Splitting and combining
* [Split combined grammars](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#splitting-and-combining-grammars)
* [Combine split grammars](https://github.com/kaby76/Trash/blob/main/doc/refactoring.md#splitting-and-combining-grammars)

## Conversion
* [Antlr3 import](https://github.com/kaby76/Trash/blob/main/doc/Import.md#antlr3)
* [Antlr2 import](https://github.com/kaby76/Trash/blob/main/doc/Import.md#antlr2)
* [Bison import](https://github.com/kaby76/Trash/blob/main/doc/Import.md#bison)

---------
The source code for the extension is open source, free of charge, and free of ads. For the latest developments on the extension,
check out my [blog](http://codinggorilla.com).

# Building

    git clone https://github.com/kaby76/Trash
    cd Trash
    make clean; make; make install

You must have the .NET SDK version 8 installed to build and run.

# Releases
See https://github.com/kaby76/Trash/releases.
If you have any questions, email me at ken.domino gmail.com