Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/torao/terp
Parser Combinator Framework for Rust
- Host: GitHub
- URL: https://github.com/torao/terp
- Owner: torao
- License: mit
- Created: 2022-06-12T06:36:57.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-03T08:59:04.000Z (about 2 years ago)
- Last Synced: 2023-03-24T04:58:12.438Z (almost 2 years ago)
- Language: Rust
- Size: 2.46 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Terp
[![github actions](https://github.com/torao/terp/actions/workflows/build.yml/badge.svg)](https://github.com/torao/terp/actions)
[![Coverage Status](https://coveralls.io/repos/github/torao/terp/badge.svg?branch=main)](https://coveralls.io/github/torao/terp?branch=main)

**Terp** is a stream-oriented syntactical parser for Rust, capable of sequentially processing fragmented input symbol sequences. It interprets input according to an application-defined syntax and produces a sequence marked up with *begin* and *end* pairs of non-terminal symbols.
## Overview
Terp is designed for **streaming** or **pipelined** processing: it processes each part of the syntax as soon as it can be parsed, without waiting to read the entire fragmented input. This is also useful for read-eval-print loop (REPL) programs, such as the interactive interpreters available on some programming language platforms, which read program fragments line by line, evaluate each expression once it is finalized, and leave unfinalized ones waiting for the remaining input.
It is also suitable for **infinite input streams**, or for data too large to read into memory in practice. However, the syntax for such input must be defined so that the parser reaches a deterministic state within a practical number of look-aheads.
Another key feature of terp is that, instead of choosing between alternatives with traditional $k$-lookahead prediction or backtracking, it matches them by **parallel evaluation** of parsing paths, which suits modern multi-core environments.
In the traditional definition, terp would be a variant of the recursive-descent LL(k) parser, which can interpret context-free grammars (CFG). For more information on using terp, see the [Reference Guide](doc/README.md).
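The idea of evaluating alternatives side by side, rather than predicting one or backtracking, can be illustrated with a small sketch. It is not terp's implementation: the `Alternatives` type and its methods are hypothetical, the alternatives are plain literals instead of grammar rules, and the paths are advanced sequentially in one thread rather than on multiple cores.

```rust
/// Hypothetical helper: tracks every literal alternative that is still
/// consistent with the input seen so far.
struct Alternatives<'a> {
    /// Unmatched remainder of each alternative that is still alive.
    live: Vec<&'a str>,
}

impl<'a> Alternatives<'a> {
    fn new(candidates: &[&'a str]) -> Self {
        Alternatives { live: candidates.to_vec() }
    }

    /// Advances all live alternatives by one character and drops the ones
    /// that no longer match. No lookahead and no backtracking is needed.
    fn push(&mut self, c: char) {
        self.live = self
            .live
            .iter()
            .copied()
            .filter_map(|rest| rest.strip_prefix(c))
            .collect();
    }

    /// Number of alternatives that are still alive.
    fn live_count(&self) -> usize {
        self.live.len()
    }

    /// True if at least one alternative has been consumed completely.
    fn matched(&self) -> bool {
        self.live.iter().any(|rest| rest.is_empty())
    }
}

fn main() {
    let mut alts = Alternatives::new(&["true", "try", "false"]);
    // The input arrives in arbitrary fragments.
    for fragment in ["t", "r", "ue"] {
        for c in fragment.chars() {
            alts.push(c);
        }
        println!("after {:?}: {} alternative(s) alive", fragment, alts.live_count());
    }
    println!("matched: {}", alts.matched());
}
```

Because every surviving path is advanced on each input symbol, no input needs to be re-read, which is what makes this style compatible with fragmented streams.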
## Features
### Easy-to-describe Schema
Instead of using complex function combinations, the schema can be described in a more readable BNF- or PEG-like manner. The following example defines the JSON string of [RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html) in terp, where `A & B` means that `B` appears after `A`, `A | B` means that `A` or `B` appears, and `A * (X..=Y)` means `X` to `Y` repetitions of `A`.
```rust
let schema = Schema::new("JSON String")
    .define("String", id("Quote") & (id("Char") * (0..)) & id("Quote"))
    .define("Quote", ch('\"'))
    .define("Char", id("Unescaped") | id("Escape") & (one_of_chars("\"\\/bfnrt") | (ch('u') & (id("Hex") * 4))))
    .define("Escape", ch('\\'))
    .define("Unescaped", range('\x20'..='\x21') | range('\x23'..='\x5B') | range('\x5D'..='\u{10FFFF}'))
    .define("Hex", range('0'..='9') | range('a'..='f') | range('A'..='F'));
```

The schema is referenced immutably while the parser is parsing.
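For readers curious how a `&`, `|`, `*` notation like this can be expressed in Rust, the following sketch overloads the corresponding operators from `std::ops` to build a syntax tree. The `Syntax` enum and the local `ch` and `id` functions are hypothetical stand-ins written for this illustration; they only mirror the names used above and are not terp's actual types.

```rust
use std::ops::{BitAnd, BitOr, Mul, RangeInclusive};

/// Hypothetical syntax tree, written only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Syntax {
    Ch(char),
    Id(String),
    Seq(Box<Syntax>, Box<Syntax>),
    Or(Box<Syntax>, Box<Syntax>),
    Repeat(Box<Syntax>, usize, usize),
}

fn ch(c: char) -> Syntax {
    Syntax::Ch(c)
}

fn id(name: &str) -> Syntax {
    Syntax::Id(name.to_string())
}

/// `A & B`: `B` appears after `A`.
impl BitAnd for Syntax {
    type Output = Syntax;
    fn bitand(self, rhs: Syntax) -> Syntax {
        Syntax::Seq(Box::new(self), Box::new(rhs))
    }
}

/// `A | B`: either `A` or `B` appears.
impl BitOr for Syntax {
    type Output = Syntax;
    fn bitor(self, rhs: Syntax) -> Syntax {
        Syntax::Or(Box::new(self), Box::new(rhs))
    }
}

/// `A * (X..=Y)`: `X` to `Y` repetitions of `A`.
impl Mul<RangeInclusive<usize>> for Syntax {
    type Output = Syntax;
    fn mul(self, reps: RangeInclusive<usize>) -> Syntax {
        Syntax::Repeat(Box::new(self), *reps.start(), *reps.end())
    }
}

fn main() {
    // Rust's precedence is `*` over `&` over `|`, so this parses as
    // A | (B & (x * (1..=3))).
    let rule = id("A") | id("B") & (ch('x') * (1..=3));
    println!("{:?}", rule);
}
```

Rust's built-in precedence (`*` binds tighter than `&`, which binds tighter than `|`) is why a sequence nested inside a choice, as in the `Char` rule above, needs no extra parentheses.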
### State-Machine Designed Parser
The parser updates its state for each incoming fragment of the data sequence and sequentially outputs the marked-up sequence as events as soon as their meaning is determined (similar to a SAX parser for XML). The parser behaves like a pipeline, which is useful for streaming processes that read and parse fragmented data from sockets or other inputs.
![Parser Input](doc/input-process-output.png)
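The state-machine behaviour described above can be sketched without terp at all. The following hand-written recognizer for a double-quoted string (without escapes) keeps its state between calls and reports events through a callback. The `QuotedString` type, its `Event` enum, and the helper functions are hypothetical and exist only for this illustration.

```rust
/// Hypothetical event type for this sketch (not terp's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Begin(&'static str),
    Fragments(String),
    End(&'static str),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Start,
    InString,
    Done,
}

/// Recognizes `"..."` (no escapes) from input that arrives in pieces.
struct QuotedString<F: FnMut(Event)> {
    state: State,
    handler: F,
}

impl<F: FnMut(Event)> QuotedString<F> {
    fn new(handler: F) -> Self {
        QuotedString { state: State::Start, handler }
    }

    /// Consumes one fragment; the state survives until the next call.
    fn push_str(&mut self, fragment: &str) -> Result<(), String> {
        let mut body = String::new();
        for c in fragment.chars() {
            match (self.state, c) {
                (State::Start, '"') => {
                    (self.handler)(Event::Begin("String"));
                    self.state = State::InString;
                }
                (State::InString, '"') => {
                    if !body.is_empty() {
                        (self.handler)(Event::Fragments(std::mem::take(&mut body)));
                    }
                    (self.handler)(Event::End("String"));
                    self.state = State::Done;
                }
                (State::InString, c) => body.push(c),
                (_, c) => return Err(format!("unexpected character {:?}", c)),
            }
        }
        // Emit what is already known instead of waiting for more input.
        if !body.is_empty() {
            (self.handler)(Event::Fragments(body));
        }
        Ok(())
    }

    /// Signals the end of input; fails if the string was never closed.
    fn finish(self) -> Result<(), String> {
        if self.state == State::Done {
            Ok(())
        } else {
            Err("incomplete input".to_string())
        }
    }
}

/// Feeds all fragments to a fresh recognizer and collects the events.
fn parse_fragments(fragments: &[&str]) -> Result<Vec<Event>, String> {
    let mut events = Vec::new();
    let mut parser = QuotedString::new(|e: Event| events.push(e));
    for fragment in fragments {
        parser.push_str(fragment)?;
    }
    parser.finish()?;
    Ok(events)
}

/// Concatenates the text carried by `Fragments` events.
fn text_of(events: &[Event]) -> String {
    events
        .iter()
        .filter_map(|e| match e {
            Event::Fragments(s) => Some(s.as_str()),
            _ => None,
        })
        .collect()
}

fn main() {
    let events = parse_fragments(&["\"t", "e", "rp\""]).unwrap();
    println!("{:?}", events);
    println!("text: {}", text_of(&events));
}
```

In this sketch the number of `Fragments` events depends on how the input was split, but the concatenated text and the Begin/End structure do not.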
Input data sequences are parsed correctly no matter where they are split into fragments. The resulting output data sequence is passed to event callbacks.
```rust
let mut events = Vec::new();
let mut parser = Context::new(&schema, "String", |e:Event| events.push(e)).unwrap();
parser.push_str("\"t").unwrap();
parser.push_str("e").unwrap();
parser.push_str("rp\"").unwrap();
parser.finish().unwrap();
println!("{:?}", events);
```

The events passed to the callback are the input sequence marked up with BEGIN-END pairs of identifiers. This constitutes a tree structure organized by meaning, similar to the structure of XML.
```
EventKind::Begin("String")
EventKind::Begin("Quote")
EventKind::Fragments("\"")
EventKind::End("Quote")
```

* The supported data sequences are abstracted, allowing parsers to be built for strings, byte arrays, or any other data sequence.
* Multiple routes are matched in parallel using the [`rayon`](https://github.com/rayon-rs/rayon) library.
* Terp is not as fast as dedicated parser implementations optimized for a specific schema. It is suitable for parsing domain-specific data for which no dedicated parser exists, or as a reference to check that a dedicated parser is working properly.
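Returning to the BEGIN-END event sequence shown earlier, the following sketch shows one way such a flat sequence could be folded into a tree with a stack. The `Event` and `Node` types and the `build_tree` function are hypothetical, and the event list in `main` is a shortened, made-up sequence rather than actual terp output.

```rust
/// Hypothetical event type for this sketch (not terp's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Begin(String),
    Fragments(String),
    End(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Folds a flat BEGIN/END-marked sequence into a tree using a stack.
fn build_tree(events: &[Event]) -> Result<Vec<Node>, String> {
    // Each entry is an open element and the children collected so far.
    // The bottom entry has no name and collects the top-level nodes.
    let mut stack: Vec<(Option<String>, Vec<Node>)> = vec![(None, Vec::new())];
    for event in events {
        match event {
            Event::Begin(name) => stack.push((Some(name.clone()), Vec::new())),
            Event::Fragments(text) => {
                stack.last_mut().unwrap().1.push(Node::Text(text.clone()));
            }
            Event::End(name) => {
                if stack.len() < 2 {
                    return Err(format!("End({:?}) without a matching Begin", name));
                }
                let (open, children) = stack.pop().unwrap();
                if open.as_deref() != Some(name.as_str()) {
                    return Err(format!("End({:?}) closes {:?}", name, open));
                }
                let node = Node::Element { name: name.clone(), children };
                stack.last_mut().unwrap().1.push(node);
            }
        }
    }
    if stack.len() != 1 {
        return Err("input ended with unclosed elements".to_string());
    }
    Ok(stack.pop().unwrap().1)
}

fn main() {
    // Shortened, made-up event sequence for illustration only.
    let events = vec![
        Event::Begin("String".to_string()),
        Event::Begin("Quote".to_string()),
        Event::Fragments("\"".to_string()),
        Event::End("Quote".to_string()),
        Event::End("String".to_string()),
    ];
    println!("{:#?}", build_tree(&events).unwrap());
}
```

An application that only needs to react to parts of the input can skip building a tree and handle the events directly, as a SAX-style consumer would.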