Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ara3d/parakeet
A fast and simple .NET parsing library
https://github.com/ara3d/parakeet
Last synced: 3 months ago
JSON representation
A fast and simple .NET parsing library
- Host: GitHub
- URL: https://github.com/ara3d/parakeet
- Owner: ara3d
- License: mit
- Created: 2018-11-25T18:34:37.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-03-19T18:35:17.000Z (3 months ago)
- Last Synced: 2024-03-19T18:55:31.338Z (3 months ago)
- Language: C#
- Size: 13.5 MB
- Stars: 25
- Watchers: 5
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-dotnet - Parakeet - A recursive descent parsing library with operator overloading for C#. (Parser Library)
- extra-awesome-dotnet - Parakeet
README
# Parakeet
[![NuGet Version](https://img.shields.io/nuget/v/Ara3D.Parakeet)](https://www.nuget.org/packages/Ara3D.Parakeet)
**Parakeet** is a text parsing library written in C#. Parakeet is the parsing library being used by the
[Plato programming language project](https://github.com/cdiggins/plato) to Parse both Plato and C# source code.![Parakeet1](https://user-images.githubusercontent.com/1759994/222930131-4edeb2ce-757f-4471-8905-8c24ecbc67f8.png)
[Image by Hugo Wai](https://unsplash.com/photos/MEborZA-3Ps)
## Overview
Parakeet is a [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) (RD) parsing library based on the
[parsing expression grammar](https://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEG) formalization
introduced by [Bryan Ford](https://bford.info/pub/lang/peg.pdf). Parakeet parsers are defined directly in C# using operator overloading.
Parakeet combines both lexical analysis (aka tokenization) and syntactic analysis in a single pass.See this [CodeProject article](https://www.codeproject.com/Articles/5379232/Introduction-to-Text-Parsing-in-Csharp-using-Parak)
for an introduction to the core concepts of Parakeet.## More Details and Features
Parakeet was designed primarily for the challenge of parsing programming languages. It can be used in different contexts of course.
Parakeet supports:
* Parsing error recovery
* Run-time detection of stuck parsers
* Line number and column number reporting
* Immutable data structures
* Operator overloading
* Fluent API syntax (aka method chaining)
* Automated creation of untyped parse trees
* Code generation for converting untyped parse trees into a strongly typed concrete syntax tree (CST).## Steps
1. Define a grammar in code: a class with a set of properties that map to rules
1. Convert the input text into a parser input object
1. Choose the starting rule of the grammar
1. Call the match function of the starting rule
1. Examine the resulting `ParserState` object
1. If the result is `null` the parser failed to match, and failed to recover
* Consider adding `OnError` to your grammar
1. Convert## Primary Classes
* `ParserInput` - Wraps a string with convenience functions to retrieve row/column, and potentially the source file name.
Can be implicitly converted from a string
* `ParserState` - Represents a position in the input and a pointer to the most recently created parse node.
* `ParserRange` - Contains two `ParserState` objects, one representing the beginning of a range of input and the other the end.
* `ParserNode` - A named node in parse node list.
* `ParserTree` - A tree structure created from a linked list of `ParserNode` objects
* `ParserCache` - Stores parser errors, and successful parse results to accelerated future lookups.
* `Rule` - Base class of a parser, provides a match function that accepts a ParserState and a ParserCache.
* `Grammar` - Base class of a collection of parsing rules, usually defined as properties.
* `CstNode` - Base class of typed parse trees generated from `Grammar` objects.
* `CstClassBuilder` - Static class with functions for generating `AstNode` classes and factory functions from `ParserTree` objects.
* `ParserError` - An error created when a `Sequence` fails to match a child rule after an `OnError` rule.
* `ParserException` - This represents an internal parser error which usually results from a grammar mistake.## Grammar
A `Grammar` is the base class for collections of parsing rules. Usually each parse rule in a grammar is defined
as a computed property that either returns a `NodeRule` or a `TokenRule`. By using the functions `Node` and `Token`
to wrap rule definitions, names are automatically assigned to each rule.The `GrammarExtensions.cs` file contains a number of helper functions for outputting the definitions of grammars
or to simplify parsing rules.When defining rules it is important that any cyclical reference from a rule to itself uses at least one `RecursiveRule`
in the relationship chain. This prevents stack overflow errors from occuring.
## RulesA Parakeet parser is defined by a class deriving from [`Rule`](https://github.com/cdiggins/parakeet/blob/master/Parakeet/Rule.cs). Some rules are defined by combining rules.
Those combining rules are called "combinators".Ever rule has a single function:
```chsarp
public ParserState Match(ParserState state, ParserCache cache)
```The `Match` will return `null` if the Rule failed to match, or a `ParserState` object if successful.
### Fluent Syntax for Rules
Rules can be combined using a fluent syntax (aka method chaining).
* `rule.At()` => `new At(rule)`
* `rule.NotAt()` => `new NotAt(rule)`
* `rule1.Then(rule2)` => `new Sequence(rule1, rule2)`
* `rule1.ThenNot(rule2)` => `new Sequence(rule, rule2.NotAt())`
* `rule.Optional()` => `new Optional(rule)`
* `rule1.Or(rule2)` => `new Choice(rule1, rule2)`
* `rule1.Except(rule2)` => `new Sequence(rule2.NotAt(), rule1)`
* `rule.ZeroOrMore()` => `new ZeroOrMore(rule)`
* `rule.OneOrMore()` => `new Sequence(rule, rule.ZeroOrMoree)`
* `char1.To(char2)` => `new CharRangeRule(char1, char2)`### Overloaded Operators for Rules
Rules can be combined using the following overloaded operators.
* `rule1 + rule2` => `new SequenceRule(rule1, rule2)`
* `rule1 | rule2` => `new ChoiceRule(rule1, rule2)`
* `!rule` => `new NotAt(rule)`### Implicit Casts
The following implicit casts are defined for Rules:
* `Rule(string s)` => `new StringMatchRule(s)`
* `Rule(char c)` => `new CharMatchRule(c)`
* `Rule(char[] cs)` => `new CharSetRule(cs)`
* `Rule(string[] xs)` => `new Choice(xs.Select(x => (Rule)x))`### Primitive Rules
* `StringMatchRule` - Matches a string
* `AnyCharRule` - Matches any character, but fails at end of the file
* `CharRangeRule` - Matches any character within a range
* `CharSetRule` - Matches any character within a set
* `CharMatchRule` - Matches a specific character### Rule Combinators
Rule combinators combine zero or more rules.
* `ZeroOrMore` - Tries to match a child rule as many times as possible, returning the original `ParserState` if not successful at least once.
* `Optional` - Tries to match a child rule exactly once, returning the original `ParserState` if the child rule fails.
* `Sequence` - Matches a sequence of rules one by one, returning `null` if not successful.
* `Choice` - Matches a collection of rules one by one, until one succeeeds, or `null` if not successful.
* `RecursiveRule` - Matches a child rule defined by a lambda, thus allowing Rule definitions to have cycles.
* `TokenRule` - A rule that just matches it child without creating a node. Used for defining grammars.
* `NodeRule` - Creates a new `ParserNode` and adds it to the `ParserState`. May also eat whitespace (true by default).### Assertion Rules
Several rules never advance the parser state:
* `At` - Returns the original parser state if the child rule succeeds, or null otherwise
* `NotAt` - Returns the original parser state if the child rule fails, or null otherwise
* `EndOfInput` - Returns the parser state if at the end of input, or null otherwise### Error Handling Rules
* `OnFail` - Used only within `Sequence` rules. Contains a child rule called the recovery rule.
Will always succeed and return the `ParserState` when matched. If a sequence encounters an `OnFail` error and
one of the subsequent child rules fails, the parser will then use the recovery rule to try to advance
to a place where it is likely to be able to continue parsing successfully (e.g. the end of a statement).## Parse Trees
Parse trees generated from Parakeet are untyped. A parse node is created whenever a `NodeRule` rule
successfully matches its child rule against the input. After parsing the list of parse nodes
is converted into a tree structure.## Typed Parse Tree (CST)
A set of classes representing a strongly typed parse tree can be created automatically from a Parakeet grammar. This is called the
Concrete Syntax Tree. Concrete syntax trees are generated from the `Ara3D.Parakeet.Grammars` project using one of the
functions in the `Ara3d.Parakeet.Tests` project.## Examples of Using Parakeet
The following projects use Parakeet:
*
*## History
Parakeet evolved from the [Jigsaw parser](https://www.codeproject.com/Articles/272494/Implementing-Programming-Languages-using-Csharp)
and applies lessons learned when writing the [Myna parsing library in TypeScript](https://cdiggins.github.io/myna-parser/)
as well as my first parsing library [YARD](https://www.codeproject.com/Articles/9121/Parsing-XML-in-C-using-the-YARD-Parser
Parakeet is designed to be as fast as possible while retaining a clean and elegant grammar description.## Related Work
### C# and F# Parsing Libraries
* https://github.com/benjamin-hodgson/Pidgin
* https://github.com/sprache/Sprache
* https://github.com/plioi/parsley
* https://github.com/datalust/superpower
* https://github.com/IronyProject/Irony
* https://github.com/teo-tsirpanis/Farkle
* https://github.com/takahisa/parseq
* https://github.com/picoe/Eto.Parse
* https://github.com/b3b00/CSLY
* https://github.com/stephan-tolksdorf/fparsec### Parser Generator Tools
* https://github.com/dbremner/peg-sharp
* https://github.com/SickheadGames/TinyPG
* https://github.com/otac0n/Pegasus
* https://github.com/qwertie/LLLPG-Samples
* https://github.com/antlr## References
* https://en.wikipedia.org/wiki/Parser_combinator
* https://en.wikipedia.org/wiki/Parsing_expression_grammar
* https://pdos.csail.mit.edu/~baford/packrat/icfp02/packrat-icfp02.pdf## FAQ
Q: Isn't a parse tree and concrete syntax tree (CST) the same thing?
> Yes. Parakeet uses the term parse tree to refer to untyped parse tree, and CST to refer to the typed parse tree.
Q: What is the difference between a CST and an AST?
> The CST is generated from parsing the input text as is.
> An AST is the result of transforming the CST into a form that has the same semantic meaning but is presumably
> simpler and easier.Q: Why isn't Parakeet a code generator from the beginning
> I find it easier to learn, understand, use, and debug libraries that aren't generated.
> Writing an extension to Parakeet that generates parser code would not be very hard.Q: So why is the CST code generated?
> Having a strongly typed CST is very beneficial when writing analysis and transformation tools
> especially for non-trivial grammars like those of programming languages.
> I don't know of any other way in C# to generate strongly typed libraries other than generating code.Q: Isn't parsing library X faster?
> Maybe. I designed Parakeet to be fast enough for my needs, but I prioritized making it robust and (hopefully) easy to use.
> See the related work section for other parsing libraries to consider.Q: Can you provide some benchmarks? Or implement grammar X?
> I'm kind of busy getting work done.
> If you are willing to fund this project, then we should talk: email me at .