https://github.com/datalust/superpower

A C# parser construction toolkit with high-quality error reporting
https://github.com/datalust/superpower

Last synced: 6 months ago
JSON representation

A C# parser construction toolkit with high-quality error reporting

Host: GitHub
URL: https://github.com/datalust/superpower
Owner: datalust
License: apache-2.0
Created: 2016-09-01T06:29:59.000Z (about 9 years ago)
Default Branch: dev
Last Pushed: 2024-09-30T20:32:19.000Z (about 1 year ago)
Last Synced: 2025-05-13T14:57:43.764Z (6 months ago)
Language: C#
Homepage:
Size: 437 KB
Stars: 1,178
Watchers: 43
Forks: 104
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-dot-dev - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
awsome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
awesome-csharp - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)
awesome-dotnet-cn - Superpower - 有着高质量错误报告的C#解析器构造工具套件。 (解析器库)
fucking-awesome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library / GUI - other)
awesome-dotnet - Superpower - A C# parser construction toolkit with high-quality error reporting (Parser Library)

README

          # Superpower [![Build status](https://ci.appveyor.com/api/projects/status/7bj6if6tyc68urpy?svg=true)](https://ci.appveyor.com/project/datalust/superpower) [![NuGet Version](https://img.shields.io/nuget/vpre/Superpower.svg?style=flat)](https://www.nuget.org/packages/Superpower/) [![Stack Overflow](https://img.shields.io/badge/stackoverflow-superpower-orange.svg)](http://stackoverflow.com/questions/tagged/superpower)

A [parser combinator](https://en.wikipedia.org/wiki/Parser_combinator) library based on

[Sprache](https://github.com/sprache/Sprache). Superpower generates friendlier error messages through its support for

token-driven parsers.

![Logo](https://raw.githubusercontent.com/datalust/superpower/dev/asset/Superpower-White-200px.png)

### What is Superpower?

The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier

for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from `string`

to `T` - where `T` might be anything from a simple number, a list of fields in a data format, or the abstract syntax

tree of some kind of programming language.

Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending

on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).

Superpower is a library for writing parsers in a declarative style that mirrors

the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and

informative errors when invalid input is encountered.

### Usage

Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code

generation tasks.

```shell

dotnet add package Superpower

```

The simplest _text parsers_ consume characters directly from the source text: 

```csharp

// Parse any number of capital 'A's in a row

var parseA = Character.EqualTo('A').AtLeastOnce();

```

The `Character.EqualTo()` method is a built-in parser. The `AtLeastOnce()` method is a _combinator_, that builds a more

complex parser for a sequence of `'A'` characters out of the simple parser for a single `'A'`.

Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:

```csharp

TextParser identifier =

    from first in Character.Letter

    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()

    select first + new string(rest);

var id = identifier.Parse("abc123");

Assert.Equal("abc123", id);

```

Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use

them.

### Tokenization

Along with text parsers that consume input character-by-character, Superpower supports _token parsers_.

A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the

kind of item that fragment represents - usually specified using an `enum`:

```csharp

public enum ArithmeticExpressionToken

{

    None,

    Number,

    Plus,

```

A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in

terms of tokens - _unexpected identifier \`frm\`, expected keyword \`from\`_ - instead of the cryptic _unexpected \`m\`_.

Token-driven parsing takes place in two distinct steps:

 1. Tokenization, using a class derived from `Tokenizer`, then

 2. Parsing, using a function of type `TokenListParser`.

```csharp

var expression = "1 * (2 + 3)";

// 1.

var tokenizer = new ArithmeticExpressionTokenizer();

var tokenList = tokenizer.Tokenize(expression);

// 2.

var parser = ArithmeticExpressionParser.Lambda; // parser built with combinators

var expressionTree = parser.Parse(tokenList);

// Use the result

var eval = expressionTree.Compile();

Console.WriteLine(eval()); // -> 5

```

#### Assembling tokenizers with `TokenizerBuilder`

The job of a _tokenizer_ is to split the input into a list of tokens - numbers, keywords, identifiers, operators -

while discarding irrelevant trivia such as whitespace or comments.

Superpower provides the `TokenizerBuilder` class to quickly assemble tokenizers from _recognizers_,

text parsers that match the various kinds of tokens required by the grammar.

A simple arithmetic expression tokenizer is shown below:

```csharp

var tokenizer = new TokenizerBuilder()

    .Ignore(Span.WhiteSpace)

    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)

    .Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)

    .Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)

    .Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)

    .Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)

    .Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)

    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)

    .Build();

```

Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers 

against the input in top-to-bottom order.

#### Writing tokenizers by hand

Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control,

at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite,

and a more complete example can be found [here](https://github.com/serilog/serilog-filters-expressions/blob/dev/src/Serilog.Filters.Expressions/Filters/Expressions/Parsing/FilterExpressionTokenizer.cs).

#### Writing token list parsers

Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers

out of simpler ones.

```csharp

class ArithmeticExpressionParser

{

    static readonly TokenListParser Add =

        Token.EqualTo(ArithmeticExpressionToken.Plus).Value(ExpressionType.AddChecked);

        

    static readonly TokenListParser Subtract =

        Token.EqualTo(ArithmeticExpressionToken.Minus).Value(ExpressionType.SubtractChecked);

        

    static readonly TokenListParser Multiply =

        Token.EqualTo(ArithmeticExpressionToken.Times).Value(ExpressionType.MultiplyChecked);

        

    static readonly TokenListParser Divide = 

        Token.EqualTo(ArithmeticExpressionToken.Divide).Value(ExpressionType.Divide);

    static readonly TokenListParser Constant =

            Token.EqualTo(ArithmeticExpressionToken.Number)

            .Apply(Numerics.IntegerInt32)

            .Select(n => (Expression)Expression.Constant(n));

    static readonly TokenListParser Factor =

        (from lparen in Token.EqualTo(ArithmeticExpressionToken.LParen)

            from expr in Parse.Ref(() => Expr)

            from rparen in Token.EqualTo(ArithmeticExpressionToken.RParen)

            select expr)

        .Or(Constant);

    static readonly TokenListParser Operand =

        (from sign in Token.EqualTo(ArithmeticExpressionToken.Minus)

            from factor in Factor

            select (Expression)Expression.Negate(factor))

        .Or(Factor).Named("expression");

    static readonly TokenListParser Term =

        Parse.Chain(Multiply.Or(Divide), Operand, Expression.MakeBinary);

    static readonly TokenListParser Expr =

        Parse.Chain(Add.Or(Subtract), Term, Expression.MakeBinary);

    public static readonly TokenListParser>>

        Lambda = Expr.AtEnd().Select(body => Expression.Lambda>(body));

}

```

### Error messages

The [error scenario tests](https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs)

demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the 

tests for some examples.

```csharp

ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));

     // -> Syntax error (line 1, column 5): unexpected operator `*`, expected expression.

```

To improve the error reporting for a particular token type, apply the `[Token]` attribute:

```csharp

public enum ArithmeticExpressionToken

{

    None,

    Number,

    [Token(Category = "operator", Example = "+")]

    Plus,

```

### Performance

Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of

allocations and indirect dispatch, mean that Superpower can be quite a bit faster than Sprache.

Recent benchmark for parsing a long arithmetic expression:

```ini

Host Process Environment Information:

BenchmarkDotNet.Core=v0.9.9.0

OS=Windows

Processor=?, ProcessorCount=8

Frequency=2533306 ticks, Resolution=394.7411 ns, Timer=TSC

CLR=CORE, Arch=64-bit ? [RyuJIT]

GC=Concurrent Workstation

dotnet cli version: 1.0.0-preview2-003121

Type=ArithmeticExpressionBenchmark  Mode=Throughput  

```

|          Method |      Median |     StdDev | Scaled | Scaled-SD |

|---------------- |----------- |---------- |------ |--------- |

|         Sprache | 283.8618 µs | 10.0276 µs |   1.00 |      0.00 |

| Superpower (Token) |  81.1563 µs |  2.8775 µs |   0.29 |      0.01 |

Benchmarks and results are included in the repository.

**Tips:** if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like `Then()` and especially `IgnoreThen()` - these allocate fewer delegates (closures) during parsing.

### Examples

Superpower is introduced, with a worked example, in [this blog post](https://nblumhardt.com/2016/09/superpower/).

**Example** parsers to learn from:

* [_JsonParser_](https://github.com/datalust/superpower/tree/dev/sample/JsonParser/Program.cs) is a complete, annotated

 example implementing the [JSON spec](https://json.org) with good error reporting

* [_DateTimeTextParser_](https://github.com/datalust/superpower/tree/dev/sample/DateTimeTextParser) shows how Superpower's text parsers work, parsing ISO-8601 date-times

* [_IntCalc_](https://github.com/datalust/superpower/tree/dev/sample/IntCalc) is a simple arithmetic expresion parser (`1 + 2 * 3`) included in the repository, demonstrating how Superpower token parsing works

* [_Plotty_](https://github.com/SuperJMN/Plotty) implements an instruction set for a RISC virtual machine

* [_tcalc_](https://github.com/nblumhardt/tcalc) is an example expression language that computes durations (`1d / 12m`)

**Real-world** projects built with Superpower:

* [_Serilog.Expressions_](https://github.com/serilog/serilog-expressions) uses Superpower to implement an expression and templating language for structured log events

* The query language of [Seq](https://datalust.co/seq) is implemented using Superpower

* `seqcli` [extraction patterns](https://github.com/datalust/seqcli#extraction-patterns) use Superpower for plain-text log parsing

* [_PromQL.Parser_](https://github.com/djluck/PromQL.Parser) is a parser for the Prometheus Query Language

_Have an example we can add to this list? [Let us know](https://github.com/datalust/superpower/issues/new)._

### Getting help

Please post issues [to the issue tracker](https://github.com/datalust/superpower/issues), or tag your [question on StackOverflow](http://stackoverflow.com/questions/tagged/superpower) with `superpower`.

_The repository's title arose out of a talk_ "Parsing Text: the Programming Superpower You Need at Your Fingertips" _given at [DDD Brisbane](http://dddbrisbane.com/) 2015._

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/datalust/superpower

Awesome Lists containing this project

README