https://github.com/datalust/superpower
A C# parser construction toolkit with high-quality error reporting
- Host: GitHub
- URL: https://github.com/datalust/superpower
- Owner: datalust
- License: apache-2.0
- Created: 2016-09-01T06:29:59.000Z (about 9 years ago)
- Default Branch: dev
- Last Pushed: 2024-09-30T20:32:19.000Z (about 1 year ago)
- Last Synced: 2025-05-13T14:57:43.764Z (6 months ago)
- Language: C#
- Homepage:
- Size: 437 KB
- Stars: 1,178
- Watchers: 43
- Forks: 104
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dot-dev - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
- awsome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
- awesome-csharp - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
- awesome-dotnet-cn - Superpower - A C# parser construction toolkit with high-quality error reporting. (Parser Library)
- fucking-awesome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library / GUI - other)
- awesome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
README
# Superpower
A [parser combinator](https://en.wikipedia.org/wiki/Parser_combinator) library based on
[Sprache](https://github.com/sprache/Sprache). Superpower generates friendlier error messages through its support for
token-driven parsers.

### What is Superpower?
The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier
for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from `string`
to `T` - where `T` might be anything from a simple number, a list of fields in a data format, or the abstract syntax
tree of some kind of programming language.
Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending
on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).
Superpower is a library for writing parsers in a declarative style that mirrors
the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and
informative errors when invalid input is encountered.
### Usage
Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code
generation tasks.
```shell
dotnet add package Superpower
```
The simplest _text parsers_ consume characters directly from the source text:
```csharp
// Parse any number of capital 'A's in a row
var parseA = Character.EqualTo('A').AtLeastOnce();
```
The `Character.EqualTo()` method is a built-in parser. The `AtLeastOnce()` method is a _combinator_ that builds a more
complex parser for a sequence of `'A'` characters out of the simple parser for a single `'A'`.
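As a minimal sketch of how such a parser might be exercised, the snippet below runs it against two made-up inputs. It assumes the `Parse()` and `TryParse()` extension methods from the `Superpower` namespace; the variable names are illustrative only.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// A single-character parser, and a combinator applied to it
TextParser<char> singleA = Character.EqualTo('A');
TextParser<char[]> parseA = singleA.AtLeastOnce();

// Parse() returns the parsed value, or throws ParseException on failure
char[] matched = parseA.Parse("AAA");
Console.WriteLine(new string(matched)); // AAA

// TryParse() reports failure through the result instead of throwing
var failed = parseA.TryParse("BBB");
Console.WriteLine(failed.HasValue);     // False
Console.WriteLine(failed);              // prints the formatted error message
```

`TryParse()` is usually the more convenient entry point when invalid input is expected and should not be treated as exceptional.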
Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:
```csharp
// An identifier is a letter followed by any number of letters, digits, or underscores
TextParser<string> identifier =
    from first in Character.Letter
    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
    select first + new string(rest);

var id = identifier.Parse("abc123");

Assert.Equal("abc123", id);
```
Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use
them.
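To illustrate that modularity, here is a hedged sketch in which a small parser is checked on its own and then reused inside a larger one. `SampleParsers` and `Assignment` are hypothetical names introduced for this example, and the `name=value` grammar is invented.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// The small parser can be exercised in isolation...
Console.WriteLine(SampleParsers.Identifier.Parse("abc_1"));            // abc_1
Console.WriteLine(SampleParsers.Identifier.TryParse("1abc").HasValue); // False

// ...and so can the larger parser that is built from it
var (name, value) = SampleParsers.Assignment.Parse("x=42");
Console.WriteLine($"{name} -> {value}");                               // x -> 42

static class SampleParsers
{
    // Letter, then letters, digits, or underscores
    public static readonly TextParser<string> Identifier =
        from first in Character.Letter
        from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
        select first + new string(rest);

    // Hypothetical larger grammar: `name=123`, reusing Identifier unchanged
    public static readonly TextParser<(string Name, int Value)> Assignment =
        from name in Identifier
        from eq in Character.EqualTo('=')
        from value in Numerics.IntegerInt32
        select (Name: name, Value: value);
}
```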
### Tokenization
Along with text parsers that consume input character-by-character, Superpower supports _token parsers_.
A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the
kind of item that fragment represents - usually specified using an `enum`:
```csharp
public enum ArithmeticExpressionToken
{
    None,
    Number,
    Plus,
    // ... one member per remaining token kind (Minus, Times, Divide, LParen, RParen)
}
```
A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in
terms of tokens - _unexpected identifier \`frm\`, expected keyword \`from\`_ - instead of the cryptic _unexpected \`m\`_.
Token-driven parsing takes place in two distinct steps:
1. Tokenization, using a class derived from `Tokenizer<TKind>`, then
2. Parsing, using a function of type `TokenListParser<TKind, T>`.
```csharp
var expression = "1 * (2 + 3)";
// 1.
var tokenizer = new ArithmeticExpressionTokenizer();
var tokenList = tokenizer.Tokenize(expression);
// 2.
var parser = ArithmeticExpressionParser.Lambda; // parser built with combinators
var expressionTree = parser.Parse(tokenList);
// Use the result
var eval = expressionTree.Compile();
Console.WriteLine(eval()); // -> 5
```
#### Assembling tokenizers with `TokenizerBuilder<TKind>`
The job of a _tokenizer_ is to split the input into a list of tokens - numbers, keywords, identifiers, operators -
while discarding irrelevant trivia such as whitespace or comments.
Superpower provides the `TokenizerBuilder<TKind>` class to quickly assemble tokenizers from _recognizers_,
text parsers that match the various kinds of tokens required by the grammar.
A simple arithmetic expression tokenizer is shown below:
```csharp
var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
    .Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)
    .Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)
    .Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)
    .Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)
    .Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)
    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)
    .Build();
```
Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers
against the input in top-to-bottom order.
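A short sketch of what the resulting token list looks like follows. It uses a trimmed-down builder with only two recognizers, and the input string is made up; `Kind` and `ToStringValue()` are read from each token to print its tag and source text.

```csharp
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Recognizers are tried in the order they are registered
var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)
    .Build();

var tokens = tokenizer.Tokenize("1 + 23");

// Each token carries its kind and the span of source text it covers
foreach (var token in tokens)
{
    Console.WriteLine($"{token.Kind}: {token.ToStringValue()}");
}
// Number: 1
// Plus: +
// Number: 23

public enum ArithmeticExpressionToken
{
    None, Number, Plus, Minus, Times, Divide, LParen, RParen
}
```

Because matching is attempted in registration order, a more specific recognizer (for example a keyword) generally needs to be registered before a more general one (for example an identifier) that would also match the same text.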
#### Writing tokenizers by hand
Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control,
at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite,
and a more complete example can be found [here](https://github.com/serilog/serilog-filters-expressions/blob/dev/src/Serilog.Filters.Expressions/Filters/Expressions/Parsing/FilterExpressionTokenizer.cs).
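The following is a rough sketch of the handwritten approach, loosely modelled on the pattern used by the test-suite tokenizer rather than copied from it. `SimpleToken` and `SimpleArithmeticTokenizer` are hypothetical names, and the member signatures used here (`Tokenize(TextSpan)`, `SkipWhiteSpace()`, `Result.Value()`, `Result.Empty()`) are written from memory of the library and should be checked against the version in use.

```csharp
using System;
using System.Collections.Generic;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;

var tokens = new SimpleArithmeticTokenizer().Tokenize("12 + 3");
foreach (var token in tokens)
    Console.WriteLine($"{token.Kind}: {token.ToStringValue()}");

enum SimpleToken { None, Number, Plus, Minus }

class SimpleArithmeticTokenizer : Tokenizer<SimpleToken>
{
    static readonly Dictionary<char, SimpleToken> Operators = new()
    {
        ['+'] = SimpleToken.Plus,
        ['-'] = SimpleToken.Minus,
    };

    protected override IEnumerable<Result<SimpleToken>> Tokenize(TextSpan span)
    {
        // `next` holds the first non-whitespace character, if any
        var next = SkipWhiteSpace(span);
        while (next.HasValue)
        {
            if (char.IsDigit(next.Value))
            {
                // Reuse a built-in text parser to find the extent of the number
                var number = Numerics.Natural(next.Location);
                yield return Result.Value(SimpleToken.Number, number.Location, number.Remainder);
                next = SkipWhiteSpace(number.Remainder);
            }
            else if (Operators.TryGetValue(next.Value, out var kind))
            {
                // Single-character operator: the token spans exactly one character
                yield return Result.Value(kind, next.Location, next.Remainder);
                next = SkipWhiteSpace(next.Remainder);
            }
            else
            {
                // Unrecognized input: report what was expected and stop
                yield return Result.Empty<SimpleToken>(next.Location, new[] { "number", "operator" });
                yield break;
            }
        }
    }
}
```

The character-level dispatch avoids trying each recognizer in turn, which is where most of the performance benefit of a handwritten tokenizer would be expected to come from.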
#### Writing token list parsers
Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers
out of simpler ones.
```csharp
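class ArithmeticExpressionParser
{
    // Operators: each token kind maps to the expression node type it produces
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Add =
        Token.EqualTo(ArithmeticExpressionToken.Plus).Value(ExpressionType.AddChecked);

    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Subtract =
        Token.EqualTo(ArithmeticExpressionToken.Minus).Value(ExpressionType.SubtractChecked);

    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Multiply =
        Token.EqualTo(ArithmeticExpressionToken.Times).Value(ExpressionType.MultiplyChecked);

    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Divide =
        Token.EqualTo(ArithmeticExpressionToken.Divide).Value(ExpressionType.Divide);

    // A number token, converted to an int constant by a text parser
    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Constant =
        Token.EqualTo(ArithmeticExpressionToken.Number)
            .Apply(Numerics.IntegerInt32)
            .Select(n => (Expression)Expression.Constant(n));

    // A parenthesized expression or a constant; Parse.Ref defers the recursive reference to Expr
    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Factor =
        (from lparen in Token.EqualTo(ArithmeticExpressionToken.LParen)
         from expr in Parse.Ref(() => Expr)
         from rparen in Token.EqualTo(ArithmeticExpressionToken.RParen)
         select expr)
        .Or(Constant);

    // An optionally negated factor
    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Operand =
        (from sign in Token.EqualTo(ArithmeticExpressionToken.Minus)
         from factor in Factor
         select (Expression)Expression.Negate(factor))
        .Or(Factor).Named("expression");

    // Multiplication and division bind tighter than addition and subtraction
    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Term =
        Parse.Chain(Multiply.Or(Divide), Operand, Expression.MakeBinary);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Expr =
        Parse.Chain(Add.Or(Subtract), Term, Expression.MakeBinary);

    // The entry point: a complete expression, compiled into a lambda returning int
    public static readonly TokenListParser<ArithmeticExpressionToken, Expression<Func<int>>>
        Lambda = Expr.AtEnd().Select(body => Expression.Lambda<Func<int>>(body));
}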
```
### Error messages
The [error scenario tests](https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs)
demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the
tests for some examples.
```csharp
ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));
// -> Syntax error (line 1, column 5): unexpected operator `*`, expected expression.
```
To improve the error reporting for a particular token type, apply the `[Token]` attribute:
```csharp
public enum ArithmeticExpressionToken
{
    None,

    Number,

    [Token(Category = "operator", Example = "+")]
    Plus,

    // ... remaining token kinds omitted
}
```
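To see error reporting end to end without the full arithmetic parser, the sketch below uses a deliberately tiny grammar that accepts only `number + number`. `SumToken`, `number`, and `sum` are hypothetical names introduced for this example. The exact wording of the message is produced by the library, so the comments describe it only loosely.

```csharp
using System;
using Superpower;
using Superpower.Display;
using Superpower.Parsers;
using Superpower.Tokenizers;

var tokenizer = new TokenizerBuilder<SumToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), SumToken.Plus)
    .Match(Numerics.Natural, SumToken.Number)
    .Build();

TokenListParser<SumToken, int> number =
    Token.EqualTo(SumToken.Number).Apply(Numerics.IntegerInt32);

// Accepts exactly `number + number`, followed by end of input
TokenListParser<SumToken, int> sum =
    (from left in number
     from plus in Token.EqualTo(SumToken.Plus)
     from right in number
     select left + right).AtEnd();

var ok = sum.TryParse(tokenizer.Tokenize("1 + 2"));
Console.WriteLine(ok.Value); // 3

// TryParse() returns a result that carries the formatted error
var bad = sum.TryParse(tokenizer.Tokenize("1 + +"));
if (!bad.HasValue)
    Console.WriteLine(bad);  // a syntax error describing the unexpected token

// Parse() surfaces the same information as a ParseException
try
{
    sum.Parse(tokenizer.Tokenize("1 + +"));
}
catch (ParseException ex)
{
    Console.WriteLine(ex.Message);
}

enum SumToken
{
    None,
    Number,
    [Token(Category = "operator", Example = "+")]
    Plus,
}
```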
### Performance
Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of
allocations and indirect dispatch, means that Superpower can be quite a bit faster than Sprache.
Recent benchmark for parsing a long arithmetic expression:
```ini
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Windows
Processor=?, ProcessorCount=8
Frequency=2533306 ticks, Resolution=394.7411 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003121
Type=ArithmeticExpressionBenchmark Mode=Throughput
```
| Method | Median | StdDev | Scaled | Scaled-SD |
|---------------- |----------- |---------- |------ |--------- |
| Sprache | 283.8618 µs | 10.0276 µs | 1.00 | 0.00 |
| Superpower (Token) | 81.1563 µs | 2.8775 µs | 0.29 | 0.01 |
Benchmarks and results are included in the repository.
**Tips:** if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like `Then()` and especially `IgnoreThen()` - these allocate fewer delegates (closures) during parsing.
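As an illustration of the second tip, the sketch below writes the same made-up parser (an integer in parentheses) twice: once as a LINQ comprehension and once with chained combinators. The parser names are hypothetical, and no throughput measurement is claimed here; the two forms are only shown to be equivalent in what they accept.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// LINQ comprehension form: every `from` clause introduces a delegate
TextParser<int> viaLinq =
    from open in Character.EqualTo('(')
    from n in Numerics.IntegerInt32
    from close in Character.EqualTo(')')
    select n;

// Chained form: IgnoreThen() discards the '(' result without a lambda,
// and Then() carries the number forward while the ')' is consumed
TextParser<int> viaChain =
    Character.EqualTo('(')
        .IgnoreThen(Numerics.IntegerInt32)
        .Then(n => Character.EqualTo(')').Value(n));

Console.WriteLine(viaLinq.Parse("(42)"));  // 42
Console.WriteLine(viaChain.Parse("(42)")); // 42
```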
### Examples
Superpower is introduced, with a worked example, in [this blog post](https://nblumhardt.com/2016/09/superpower/).
**Example** parsers to learn from:
* [_JsonParser_](https://github.com/datalust/superpower/tree/dev/sample/JsonParser/Program.cs) is a complete, annotated
example implementing the [JSON spec](https://json.org) with good error reporting
* [_DateTimeTextParser_](https://github.com/datalust/superpower/tree/dev/sample/DateTimeTextParser) shows how Superpower's text parsers work, parsing ISO-8601 date-times
* [_IntCalc_](https://github.com/datalust/superpower/tree/dev/sample/IntCalc) is a simple arithmetic expression parser (`1 + 2 * 3`) included in the repository, demonstrating how Superpower token parsing works
* [_Plotty_](https://github.com/SuperJMN/Plotty) implements an instruction set for a RISC virtual machine
* [_tcalc_](https://github.com/nblumhardt/tcalc) is an example expression language that computes durations (`1d / 12m`)
**Real-world** projects built with Superpower:
* [_Serilog.Expressions_](https://github.com/serilog/serilog-expressions) uses Superpower to implement an expression and templating language for structured log events
* The query language of [Seq](https://datalust.co/seq) is implemented using Superpower
* `seqcli` [extraction patterns](https://github.com/datalust/seqcli#extraction-patterns) use Superpower for plain-text log parsing
* [_PromQL.Parser_](https://github.com/djluck/PromQL.Parser) is a parser for the Prometheus Query Language
_Have an example we can add to this list? [Let us know](https://github.com/datalust/superpower/issues/new)._
### Getting help
Please post issues [to the issue tracker](https://github.com/datalust/superpower/issues), or tag your [question on StackOverflow](http://stackoverflow.com/questions/tagged/superpower) with `superpower`.
_The repository's title arose out of a talk_ "Parsing Text: the Programming Superpower You Need at Your Fingertips" _given at [DDD Brisbane](http://dddbrisbane.com/) 2015._