https://github.com/thautwarm/rbnf.hs
Cross-language context-sensitive parsing with type inference, left recursion resolutions and decision tree optimizations.
parser-generator
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/thautwarm/rbnf.hs
- Owner: thautwarm
- License: bsd-3-clause
- Created: 2019-02-18T02:59:30.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-02-05T00:16:17.000Z (about 6 years ago)
- Last Synced: 2025-05-07T03:35:27.433Z (9 months ago)
- Topics: parser-generator
- Language: Haskell
- Homepage:
- Size: 584 KB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.md
- License: LICENSE
# RBNF
Maintaining a type inference framework and its corresponding IRs single-handedly has proven too difficult,
so I have dropped the current implementation of the OCaml back end.
Instead, I plan to support the following back ends:
- Julia (source code)
- Python (bytecode, via [sijuiacion-ir](https://github.com/RemuLang/sijuiacion-lang))
- Ruby (source code)
- Lua (source code)
## Usage
```
Usage: [-v] [-h] [-in filename] [-out filename]
[-be python|ocaml|marisa(default)]
[-k lookahead number] [--trace : codegen with tracebacks.]
[--noinline : might be useful when viewing generated code]
[--jsongraph : dump parsing graph to JSON format]
[--stoppablelr: (not recommended but faster)
allow rollbacks during proceeding the left recursion branch]
# gen python
rbnf-pgen -in .rbnf -k -out -be python [--codegen-with-trace] --noinline
# list out all terminals of the given grammar
rbnf-lex -in .rbnf -out
```
## Repo Structure
- RBNF.IRs: a collection of IRs used to generate code for the various back ends
  - Cirno: not used for codegen; it is used before the parsing graphs are generated, to analyse parsing semantics.
    It is simply interpreted in a stack-based virtual machine (a simple abstract interpreter).
  - Marisa: the first layer, generated from the parsing graphs and user settings.
    It lacks declarations and is untyped, but it is sufficient for codegen targeting dynamic languages.
- RBNF.BackEnds
  - Pyrrha: code generator targeting the Python back end
  - to be continued
## Example
```
Number ::= number
Factor ::= Number | "-" !a=Factor
Mul ::= ?always_true Factor
| Mul "*" Factor
```
You can configure the generator to choose whether the generated
code includes error reports.
## Note: Front End
### .rbnf
```
exp ::= "if" cond=exp "then"
t=exp
"else" f=exp
-> If(cond, t, f);
exp ::= ...
```
Line comments (`# ...`) are supported, but nothing fancier.
The `.rbnf` format is quite limited on its own; however, with a simple bootstrap and some added syntactic sugar,
it can be as powerful as OCaml's Menhir.
Check out the capabilities of `.exrbnf` in [rbnf-rts](https://github.com/thautwarm/rbnf-rts),
where you can also find the fastest JSON parser generated by RBNF: [rbnfjson](https://github.com/thautwarm/rbnf-rts/blob/master/test/rbnfjson/json.exrbnf).
### Top Level Combinators in Haskell
RBNF accepts a `C` value (defined below) and generates an efficient parser from it.
```haskell
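{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)

-- Grammar combinators.
data C
  = CTerm String      -- terminal
  | CNonTerm String   -- nonterminal, referenced by name
  | CSeq [C]          -- sequence
  | CAlt [C]          -- alternatives
  | COpt C            -- optional
  -- advanced:
  | CBind String C    -- bind the result of a combinator to a name
  | CPred MiniLang    -- predicate
  | CModif MiniLang   -- modify current context
  deriving (Eq, Ord, Generic)

type CRule = C
type CProd = (String, C, Maybe MiniLang)

-- Small expression language used by CPred, CModif and CProd.
data MiniLang
  = MTerm String
  | MApp MiniLang [MiniLang]
  deriving (Eq, Ord, Generic)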
```
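To make the correspondence with the `.rbnf` front end concrete, here is a minimal sketch of how the `if` rule and the `Mul` rule shown earlier might be written as `CProd` values. The types are restated so the snippet compiles on its own. The mapping of `?always_true` to `CPred`, the reading of the third `CProd` component as the rewrite action, and the helper `boundNames` are assumptions made for illustration, not something the repository confirms.

```haskell
-- Types restated from the README so this file compiles on its own
-- (Show is derived here instead of Generic, purely for printing).
data C
  = CTerm String
  | CNonTerm String
  | CSeq [C]
  | CAlt [C]
  | COpt C
  | CBind String C
  | CPred MiniLang
  | CModif MiniLang
  deriving (Eq, Ord, Show)

data MiniLang
  = MTerm String
  | MApp MiniLang [MiniLang]
  deriving (Eq, Ord, Show)

type CProd = (String, C, Maybe MiniLang)

-- exp ::= "if" cond=exp "then" t=exp "else" f=exp -> If(cond, t, f);
-- Assumption: the third component holds the rewrite action.
ifProd :: CProd
ifProd =
  ( "exp"
  , CSeq
      [ CTerm "if"
      , CBind "cond" (CNonTerm "exp")
      , CTerm "then"
      , CBind "t" (CNonTerm "exp")
      , CTerm "else"
      , CBind "f" (CNonTerm "exp")
      ]
  , Just (MApp (MTerm "If") [MTerm "cond", MTerm "t", MTerm "f"])
  )

-- Mul ::= ?always_true Factor | Mul "*" Factor
-- Assumption: `?name` corresponds to CPred; no action is attached.
mulProd :: CProd
mulProd =
  ( "Mul"
  , CAlt
      [ CSeq [CPred (MTerm "always_true"), CNonTerm "Factor"]
      , CSeq [CNonTerm "Mul", CTerm "*", CNonTerm "Factor"]
      ]
  , Nothing
  )

-- Hypothetical helper: names introduced by CBind, in left-to-right order.
boundNames :: C -> [String]
boundNames c = case c of
  CBind n inner -> n : boundNames inner
  CSeq cs       -> concatMap boundNames cs
  CAlt cs       -> concatMap boundNames cs
  COpt inner    -> boundNames inner
  _             -> []
```

Under these assumptions, `boundNames` applied to the body of `ifProd` yields exactly the names that the action `If(cond, t, f)` refers to.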
## Note: Left Recursion Handling
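As a point of reference only, the sketch below shows a common hand-written way to parse the left-recursive `Mul` rule from the example: parse one `Factor`, then keep folding `"*" Factor` onto the left. It illustrates the general technique, not the code RBNF generates. The `Expr` type, the string-token representation and the function names are hypothetical, and the `!a=` binding and the `?always_true` predicate from the example are left out.

```haskell
-- Hypothetical AST; "number" stands for any numeric token.
data Expr
  = Num
  | Neg Expr
  | Times Expr Expr
  deriving (Eq, Show)

-- A parser consumes a prefix of the token list and returns the rest.
type Parser a = [String] -> Maybe (a, [String])

-- Factor ::= number | "-" Factor
factor :: Parser Expr
factor ("number" : rest) = Just (Num, rest)
factor ("-" : rest) = do
  (e, rest') <- factor rest
  Just (Neg e, rest')
factor _ = Nothing

-- Mul ::= Factor | Mul "*" Factor
-- The left recursion becomes a loop that extends the tree built so far.
mul :: Parser Expr
mul toks = do
  (e, rest) <- factor toks
  Just (go e rest)
  where
    -- Extend only when "*" is followed by a complete Factor;
    -- otherwise stop and leave the remaining tokens untouched.
    go acc ("*" : rest)
      | Just (rhs, rest') <- factor rest = go (Times acc rhs) rest'
    go acc rest = (acc, rest)
```

The loop builds left-associative trees, matching what the left-recursive rule describes, and never has to undo consumed input because it checks for a complete `Factor` before extending the result.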

## Note: Lookahead decision trees (by ID3 algorithm)
```
======== Node1 ========
LAShift (Case "-")
LAShift (Case "-")
LAShift (Case "-")
[6]
LAShift (Case "number")
[6]
LAShift (Case "number")
[6]
LAShift (Case "number")
[12]
--- LA optimization:
case elts[0]
LAShift (Case "-") => [6]
LAShift (Case "number") => [12]
======== Node19 ========
LAShift (Case "-")
LAShift (Case "-")
LAShift (Case "-")
[20]
LAShift (Case "number")
[20]
LAShift (Case "number")
LAShift (Case "*")
[20]
LAShift (Case "number")
LAShift (Case "*")
LAShift (Case "-")
[28]
LAShift (Case "number")
[28]
--- LA optimization:
case elts[1]
LAShift (Case "*") => [28]
LAShift (Case "-") => [20]
LAShift (Case "number") => [20]
======== Node38 ========
LAShift (Case "-")
LAShift (Case "-")
LAShift (Case "-")
[39]
LAShift (Case "number")
[39]
LAShift (Case "number")
LAShift (Case "*")
[39]
LAShift (Case "number")
LAShift (Case "*")
[48]
--- LA optimization:
case elts[1]
LAShift (Case "*") => [48]
LAShift (Case "-") => [39]
LAShift (Case "number") => [39]
```
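The "LA optimization" step in the dumps above picks a single lookahead position (`elts[0]` or `elts[1]`) to branch on. Below is a minimal sketch of how ID3 makes such a choice, by maximising information gain over the lookahead positions. It is not RBNF's actual implementation: the `Sample` type, the function names and the sample data (loosely modelled on Node19) are all assumptions made for illustration.

```haskell
import qualified Data.Map.Strict as M
import Data.List (maximumBy, nub)
import Data.Ord (comparing)

-- One sample: the next k tokens and the parser state they lead to.
-- Assumption: every sample carries at least k tokens.
type Sample = ([String], Int)

-- Shannon entropy (in bits) of a list of target states.
entropy :: [Int] -> Double
entropy xs =
  negate (sum [p * logBase 2 p | c <- M.elems counts, let p = fromIntegral c / n])
  where
    counts = M.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    n = fromIntegral (length xs)

-- Group the target states by the token found at lookahead position i.
groupByToken :: Int -> [Sample] -> M.Map String [Int]
groupByToken i samples =
  M.fromListWith (++) [(toks !! i, [target]) | (toks, target) <- samples]

-- Information gain obtained by branching on lookahead position i.
gain :: Int -> [Sample] -> Double
gain i samples = entropy (map snd samples) - remainder
  where
    n = fromIntegral (length samples)
    remainder =
      sum [ fromIntegral (length ts) / n * entropy ts
          | ts <- M.elems (groupByToken i samples) ]

-- Position in [0 .. k-1] with the highest gain (requires k >= 1).
bestIndex :: Int -> [Sample] -> Int
bestIndex k samples = maximumBy (comparing (\i -> gain i samples)) [0 .. k - 1]

-- Case table for position i: token -> distinct target states.
caseTable :: Int -> [Sample] -> M.Map String [Int]
caseTable i = M.map nub . groupByToken i

-- Hypothetical samples loosely modelled on Node19 (k = 2).
node19 :: [Sample]
node19 =
  [ (["-", "-"], 20)
  , (["-", "number"], 20)
  , (["number", "*"], 28)
  , (["number", "-"], 20)
  ]
```

For these samples, position 0 leaves the `"number"` branch split between states 20 and 28, while position 1 separates the states completely, so `bestIndex 2 node19` selects position 1, analogous to the `case elts[1]` table in the dump.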