Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
https://github.com/ara3d/parakeet

A fast and simple .NET parsing library
https://github.com/ara3d/parakeet
Last synced: 3 months ago
JSON representation
A fast and simple .NET parsing library
Host: GitHub
URL: https://github.com/ara3d/parakeet
Owner: ara3d
License: mit
Created: 2018-11-25T18:34:37.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-03-19T18:35:17.000Z (3 months ago)
Last Synced: 2024-03-19T18:55:31.338Z (3 months ago)
Language: C#
Size: 13.5 MB
Stars: 25
Watchers: 5
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists

awesome-dotnet - Parakeet - A recursive descent parsing library with operator overloading for C#. (Parser Library)
extra-awesome-dotnet - Parakeet
README

        # Parakeet

[![NuGet Version](https://img.shields.io/nuget/v/Ara3D.Parakeet)](https://www.nuget.org/packages/Ara3D.Parakeet)

**Parakeet** is a text parsing library written in C#. Parakeet is the parsing library being used by the 

[Plato programming language project](https://github.com/cdiggins/plato) to Parse both Plato and C# source code. 

![Parakeet1](https://user-images.githubusercontent.com/1759994/222930131-4edeb2ce-757f-4471-8905-8c24ecbc67f8.png)

[Image by Hugo Wai](https://unsplash.com/photos/MEborZA-3Ps)

## Overview 

Parakeet is a [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) (RD) parsing library based on the 

[parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEG) formalization

introduced by [Bryan Ford](https://bford.info/pub/lang/peg.pdf). Parakeet parsers are defined directly in C# using operator overloading. 

Parakeet combines both lexical analysis (aka tokenization) and syntactic analysis in a single pass. 

See this [CodeProject article](https://www.codeproject.com/Articles/5379232/Introduction-to-Text-Parsing-in-Csharp-using-Parak)

for an introduction to the core concepts of Parakeet. 

## More Details and Features

Parakeet was designed primarily for the challenge of parsing programming languages. It can be used in different contexts of course. 

Parakeet supports:

* Parsing error recovery  

* Run-time detection of stuck parsers 

* Line number and column number reporting 

* Immutable data structures 

* Operator overloading

* Fluent API syntax (aka method chaining)

* Automated creation of untyped parse trees

* Code generation for converting untyped parse trees into a strongly typed concrete syntax tree (CST). 

## Steps 

1. Define a grammar in code: a class with a set of properties that map to rules 

1. Convert the input text into a parser input object 

1. Choose the starting rule of the grammar 

1. Call the match function of the starting rule 

1. Examine the resulting `ParserState` object

1. If the result is `null` the parser failed to match, and failed to recover 

	* Consider adding `OnError` to your grammar

1. Convert  

## Primary Classes

* `ParserInput` - Wraps a string with convenience functions to retrieve row/column, and potentially the source file name. 

Can be implicitly converted from a string 

* `ParserState` - Represents a position in the input and a pointer to the most recently created parse node. 

* `ParserRange` - Contains two `ParserState` objects, one representing the beginning of a range of input and the other the end.

* `ParserNode` - A named node in parse node list.  

* `ParserTree` - A tree structure created from a linked list of `ParserNode` objects

* `ParserCache` - Stores parser errors, and successful parse results to accelerated future lookups. 

* `Rule` - Base class of a parser, provides a match function that accepts a ParserState and a ParserCache.

* `Grammar` - Base class of a collection of parsing rules, usually defined as properties. 

* `CstNode` - Base class of typed parse trees generated from `Grammar` objects. 

* `CstClassBuilder` - Static class with functions for generating `AstNode` classes and factory functions from `ParserTree` objects. 

* `ParserError` - An error created when a `Sequence` fails to match a child rule after an `OnError` rule.

* `ParserException` - This represents an internal parser error which usually results from a grammar mistake.  

## Grammar

A `Grammar` is the base class for collections of parsing rules. Usually each parse rule in a grammar is defined 

as a computed property that either returns a `NodeRule` or a `TokenRule`. By using the functions `Node` and `Token`

to wrap rule definitions, names are automatically assigned to each rule. 

The `GrammarExtensions.cs` file contains a number of helper functions for outputting the definitions of grammars 

or to simplify parsing rules. 

When defining rules it is important that any cyclical reference from a rule to itself uses at least one `RecursiveRule`

in the relationship chain. This prevents stack overflow errors from occuring.

 

## Rules

A Parakeet parser is defined by a class deriving from [`Rule`](https://github.com/cdiggins/parakeet/blob/master/Parakeet/Rule.cs). Some rules are defined by combining rules. 

Those combining rules are called "combinators". 

Ever rule has a single function:

```chsarp

public ParserState Match(ParserState state, ParserCache cache)

```

The `Match` will return `null` if the Rule failed to match, or a `ParserState` object if successful.

### Fluent Syntax for Rules

Rules can be combined using a fluent syntax (aka method chaining). 

* `rule.At()` => `new At(rule)`

* `rule.NotAt()` => `new NotAt(rule)`

* `rule1.Then(rule2)` => `new Sequence(rule1, rule2)`

* `rule1.ThenNot(rule2)` => `new Sequence(rule, rule2.NotAt())`

* `rule.Optional()` => `new Optional(rule)`

* `rule1.Or(rule2)` => `new Choice(rule1, rule2)`

* `rule1.Except(rule2)` => `new Sequence(rule2.NotAt(), rule1)`

* `rule.ZeroOrMore()` => `new ZeroOrMore(rule)`

* `rule.OneOrMore()` => `new Sequence(rule, rule.ZeroOrMoree)`

* `char1.To(char2)` => `new CharRangeRule(char1, char2)`

### Overloaded Operators for Rules

Rules can be combined using the following overloaded operators.

* `rule1 + rule2` => `new SequenceRule(rule1, rule2)`

* `rule1 | rule2` => `new ChoiceRule(rule1, rule2)`

* `!rule` => `new NotAt(rule)`

### Implicit Casts

The following implicit casts are defined for Rules: 

* `Rule(string s)` => `new StringMatchRule(s)`

* `Rule(char c)` => `new CharMatchRule(c)`

* `Rule(char[] cs)` => `new CharSetRule(cs)`

* `Rule(string[] xs)` => `new Choice(xs.Select(x => (Rule)x))`

### Primitive Rules

* `StringMatchRule` - Matches a string

* `AnyCharRule` - Matches any character, but fails at end of the file 

* `CharRangeRule` - Matches any character within a range  

* `CharSetRule` - Matches any character within a set

* `CharMatchRule` - Matches a specific character

### Rule Combinators

Rule combinators combine zero or more rules. 

* `ZeroOrMore` - Tries to match a child rule as many times as possible, returning the original `ParserState` if not successful at least once. 

* `Optional` - Tries to match a child rule exactly once, returning the original `ParserState` if the child rule fails. 

* `Sequence` - Matches a sequence of rules one by one, returning `null` if not successful.

* `Choice` - Matches a collection of rules one by one, until one succeeeds, or `null` if not successful.

* `RecursiveRule` - Matches a child rule defined by a lambda, thus allowing Rule definitions to have cycles.

* `TokenRule` - A rule that just matches it child without creating a node. Used for defining grammars. 

* `NodeRule` - Creates a new `ParserNode` and adds it to the `ParserState`. May also eat whitespace (true by default). 

### Assertion Rules

Several rules never advance the parser state:

* `At` - Returns the original parser state if the child rule succeeds, or null otherwise 

* `NotAt` - Returns the original parser state if the child rule fails, or null otherwise 

* `EndOfInput` - Returns the parser state if at the end of input, or null otherwise 

### Error Handling Rules

* `OnFail` - Used only within `Sequence` rules. Contains a child rule called the recovery rule. 

Will always succeed and return the `ParserState` when matched. If a sequence encounters an `OnFail` error and 

one of the subsequent child rules fails, the parser will then use the recovery rule to try to advance 

to a place where it is likely to be able to continue parsing successfully (e.g. the end of a statement).

## Parse Trees 

Parse trees generated from Parakeet are untyped. A parse node is created whenever a `NodeRule` rule 

successfully matches its child rule against the input. After parsing the list of parse nodes 

is converted into a tree structure. 

## Typed Parse Tree (CST)

A set of classes representing a strongly typed parse tree can be created automatically from a Parakeet grammar. This is called the 

Concrete Syntax Tree. Concrete syntax trees are generated from the `Ara3D.Parakeet.Grammars` project using one of the 

functions in the `Ara3d.Parakeet.Tests` project. 

## Examples of Using Parakeet 

The following projects use Parakeet:

* 

* 

## History 

Parakeet evolved from the [Jigsaw parser](https://www.codeproject.com/Articles/272494/Implementing-Programming-Languages-using-Csharp) 

and applies lessons learned when writing the [Myna parsing library in TypeScript](https://cdiggins.github.io/myna-parser/) 

as well as my first parsing library [YARD](https://www.codeproject.com/Articles/9121/Parsing-XML-in-C-using-the-YARD-Parser

Parakeet is designed to be as fast as possible while retaining a clean and elegant grammar description. 

## Related Work

### C# and F# Parsing Libraries 

* https://github.com/benjamin-hodgson/Pidgin

* https://github.com/sprache/Sprache

* https://github.com/plioi/parsley

* https://github.com/datalust/superpower

* https://github.com/IronyProject/Irony

* https://github.com/teo-tsirpanis/Farkle

* https://github.com/takahisa/parseq

* https://github.com/picoe/Eto.Parse

* https://github.com/b3b00/CSLY

* https://github.com/stephan-tolksdorf/fparsec

### Parser Generator Tools

* https://github.com/dbremner/peg-sharp  

* https://github.com/SickheadGames/TinyPG 

* https://github.com/otac0n/Pegasus

* https://github.com/qwertie/LLLPG-Samples

* https://github.com/antlr

## References

* https://en.wikipedia.org/wiki/Parser_combinator

* https://en.wikipedia.org/wiki/Parsing_expression_grammar

* https://pdos.csail.mit.edu/~baford/packrat/icfp02/packrat-icfp02.pdf

## FAQ 

Q: Isn't a parse tree and concrete syntax tree (CST) the same thing? 

> Yes. Parakeet uses the term parse tree to refer to untyped parse tree, and CST to refer to the typed parse tree.  

Q: What is the difference between a CST and an AST? 

> The CST is generated from parsing the input text as is. 

> An AST is the result of transforming the CST into a form that has the same semantic meaning but is presumably 

> simpler and easier. 

Q: Why isn't Parakeet a code generator from the beginning 

> I find it easier to learn, understand, use, and debug libraries that aren't generated. 

> Writing an extension to Parakeet that generates parser code would not be very hard.  

Q: So why is the CST code generated? 

> Having a strongly typed CST is very beneficial when writing analysis and transformation tools 

> especially for non-trivial grammars like those of programming languages. 

> I don't know of any other way in C# to generate strongly typed libraries other than generating code. 

Q: Isn't parsing library X faster? 

> Maybe. I designed Parakeet to be fast enough for my needs, but I prioritized making it robust and (hopefully) easy to use. 

> See the related work section for other parsing libraries to consider. 

Q: Can you provide some benchmarks? Or implement grammar X? 

> I'm kind of busy getting work done. 

> If you are willing to fund this project, then we should talk: email me at .