{"id":21343907,"url":"https://github.com/benjamin-hodgson/pidgin","last_synced_at":"2025-05-13T20:14:50.897Z","repository":{"id":38859294,"uuid":"87320923","full_name":"benjamin-hodgson/Pidgin","owner":"benjamin-hodgson","description":"A lightweight and fast parsing library for C#.","archived":false,"fork":false,"pushed_at":"2025-04-07T10:12:09.000Z","size":3571,"stargazers_count":996,"open_issues_count":17,"forks_count":72,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-04-28T11:52:29.537Z","etag":null,"topics":["csharp","dotnet","dotnet-core","parse","parser","parser-combinators","parsing"],"latest_commit_sha":null,"homepage":"https://www.benjamin.pizza/Pidgin/","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benjamin-hodgson.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-04-05T14:43:32.000Z","updated_at":"2025-04-26T02:57:29.000Z","dependencies_parsed_at":"2024-01-05T20:52:05.190Z","dependency_job_id":"2d7e155e-f5e2-4c62-8458-0c4160f4553a","html_url":"https://github.com/benjamin-hodgson/Pidgin","commit_stats":{"total_commits":467,"total_committers":12,"mean_commits":"38.916666666666664","dds":0.2762312633832976,"last_synced_commit":"0c8ef16e70445e7a43aeea38ee9b366fc3004bb2"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamin-hodgson%2FPidgin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamin-hodgson%2FPidgin/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamin-hodgson%2FPidgin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benjamin-hodgson%2FPidgin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benjamin-hodgson","download_url":"https://codeload.github.com/benjamin-hodgson/Pidgin/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020634,"owners_count":22000755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","dotnet","dotnet-core","parse","parser","parser-combinators","parsing"],"created_at":"2024-11-22T01:16:16.794Z","updated_at":"2025-05-13T20:14:50.854Z","avatar_url":"https://github.com/benjamin-hodgson.png","language":"C#","readme":"Pidgin\n======\n\nA lightweight, fast, and flexible parsing library for C#.\n\nInstalling\n----------\n\nPidgin is [available on NuGet](https://www.nuget.org/packages/Pidgin/). API docs are hosted [on my website](https://www.benjamin.pizza/Pidgin).\n\nTutorial\n--------\n\nThere's a tutorial on using Pidgin to parse a subset of Prolog [on my website](https://www.benjamin.pizza/posts/2019-12-08-parsing-prolog-with-pidgin.html).\n\n### Getting started\n\nPidgin is a _parser combinator library_, a lightweight, high-level, declarative tool for constructing parsers. Parsers written with parser combinators look like a high-level specification of a language's grammar, but they're expressed within a general-purpose programming language and require no special tools to produce executable code. 
Parser combinators are more powerful than regular expressions (they can parse a larger class of languages), but simpler and easier to use than parser generators like ANTLR.\n\nPidgin's core type, `Parser\u003cTToken, T\u003e`, represents a procedure which consumes an input stream of `TToken`s, and may either fail with a parsing error or produce a `T` as output. You can think of it as:\n\n```csharp\ndelegate T? Parser\u003cTToken, T\u003e(IEnumerator\u003cTToken\u003e input);\n```\n\nTo start building parsers, we need to import two classes which contain factory methods: `Parser` and `Parser\u003cTToken\u003e`.\n\n```csharp\nusing Pidgin;\nusing static Pidgin.Parser;\nusing static Pidgin.Parser\u003cchar\u003e;  // we'll be parsing strings, i.e. sequences of characters. For other applications (e.g. parsing binary file formats) TToken may be some other type (e.g. byte).\n```\n\n### Primitive parsers\n\nNow we can create some simple parsers. `Any` represents a parser which consumes a single character and returns that character.\n\n```csharp\nAssert.AreEqual('a', Any.ParseOrThrow(\"a\"));\nAssert.AreEqual('b', Any.ParseOrThrow(\"b\"));\n```\n\n`Char`, an alias for `Token`, consumes a _particular_ character and returns that character. If it encounters any other character, it fails.\n\n```csharp\nParser\u003cchar, char\u003e parser = Char('a');\nAssert.AreEqual('a', parser.ParseOrThrow(\"a\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e parser.ParseOrThrow(\"b\"));\n```\n\n`Digit` parses and returns a single digit character.\n\n```csharp\nAssert.AreEqual('3', Digit.ParseOrThrow(\"3\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e Digit.ParseOrThrow(\"a\"));\n```\n\n`String` parses and returns a particular string. 
If you give it input other than the string it was expecting, it fails.\n\n```csharp\nParser\u003cchar, string\u003e parser = String(\"foo\");\nAssert.AreEqual(\"foo\", parser.ParseOrThrow(\"foo\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e parser.ParseOrThrow(\"bar\"));\n```\n\n`Return` (and its synonym `FromResult`) never consumes any input; it just returns the given value. Likewise, `Fail` always fails without consuming any input.\n\n```csharp\nParser\u003cchar, int\u003e parser = Return(3);\nAssert.AreEqual(3, parser.ParseOrThrow(\"foo\"));\n\nParser\u003cchar, int\u003e parser2 = Fail\u003cint\u003e();\nAssert.Throws\u003cParseException\u003e(() =\u003e parser2.ParseOrThrow(\"bar\"));\n```\n\n### Sequencing parsers\n\nThe power of parser combinators is that you can build big parsers out of little ones. The simplest way to do this is with `Then`, which builds a new parser representing two parsers applied sequentially (discarding the result of the first).\n\n```csharp\nParser\u003cchar, string\u003e parser1 = String(\"foo\");\nParser\u003cchar, string\u003e parser2 = String(\"bar\");\nParser\u003cchar, string\u003e sequencedParser = parser1.Then(parser2);\nAssert.AreEqual(\"bar\", sequencedParser.ParseOrThrow(\"foobar\"));  // \"foo\" got thrown away\nAssert.Throws\u003cParseException\u003e(() =\u003e sequencedParser.ParseOrThrow(\"food\"));\n```\n\n`Before` throws away the second result, not the first.\n\n```csharp\nParser\u003cchar, string\u003e parser1 = String(\"foo\");\nParser\u003cchar, string\u003e parser2 = String(\"bar\");\nParser\u003cchar, string\u003e sequencedParser = parser1.Before(parser2);\nAssert.AreEqual(\"foo\", sequencedParser.ParseOrThrow(\"foobar\"));  // \"bar\" got thrown away\nAssert.Throws\u003cParseException\u003e(() =\u003e sequencedParser.ParseOrThrow(\"food\"));\n```\n\n`Map` does a similar job, except it keeps both results and applies a transformation function to them. 
This is especially useful when you want your parser to return a custom data structure. (`Map` has overloads which operate on between one and eight parsers; the one-parser version also has a postfix synonym `Select`.)\n\n```csharp\nParser\u003cchar, string\u003e parser1 = String(\"foo\");\nParser\u003cchar, string\u003e parser2 = String(\"bar\");\nParser\u003cchar, string\u003e sequencedParser = Map((foo, bar) =\u003e bar + foo, parser1, parser2);\nAssert.AreEqual(\"barfoo\", sequencedParser.ParseOrThrow(\"foobar\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e sequencedParser.ParseOrThrow(\"food\"));\n```\n\n`Bind` uses the result of a parser to choose the next parser. This enables parsing of context-sensitive languages. For example, here's a parser which parses any character repeated twice:\n\n```csharp\n// parse any character, then parse a character matching the first character\nParser\u003cchar, char\u003e parser = Any.Bind(c =\u003e Char(c));\nAssert.AreEqual('a', parser.ParseOrThrow(\"aa\"));\nAssert.AreEqual('b', parser.ParseOrThrow(\"bb\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e parser.ParseOrThrow(\"ab\"));\n```\n\nPidgin parsers support LINQ query syntax. It may be easier to see what the above example does when it's written out using LINQ:\n\n```csharp\nParser\u003cchar, char\u003e parser =\n    from c in Any\n    from c2 in Char(c)\n    select c2;\n```\n\nParsers written like this look like a simple imperative script. \"Run the `Any` parser and name its result `c`, then run `Char(c)` and name its result `c2`, then return `c2`.\"\n\n### Choosing from alternatives\n\n`Or` represents a parser which can parse one of two alternatives. 
It runs the left parser first and, if that fails, tries the right parser.\n\n```csharp\nParser\u003cchar, string\u003e parser = String(\"foo\").Or(String(\"bar\"));\nAssert.AreEqual(\"foo\", parser.ParseOrThrow(\"foo\"));\nAssert.AreEqual(\"bar\", parser.ParseOrThrow(\"bar\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e parser.ParseOrThrow(\"baz\"));\n```\n\n`OneOf` is equivalent to `Or`, except it takes a variable number of arguments. Here's a parser which is equivalent to the one using `Or` above:\n\n```csharp\nParser\u003cchar, string\u003e parser = OneOf(String(\"foo\"), String(\"bar\"));\n```\n\nIf one of the component parsers of `Or` or `OneOf` fails _after consuming input_, the whole parser will fail.\n\n```csharp\nParser\u003cchar, string\u003e parser = String(\"food\").Or(String(\"foul\"));\nAssert.Throws\u003cParseException\u003e(() =\u003e parser.ParseOrThrow(\"foul\"));  // why didn't it choose the second option?\n```\n\nWhat happened here? When a parser successfully parses a character from the input stream, it advances the input stream to the next character. `Or` only chooses the next alternative if the given parser fails _without consuming any input_; Pidgin does not perform any lookahead or backtracking by default. Backtracking is enabled using the `Try` function.\n\n```csharp\n// apply Try to the first option, so we can return to the beginning if it fails\nParser\u003cchar, string\u003e parser = Try(String(\"food\")).Or(String(\"foul\"));\nAssert.AreEqual(\"foul\", parser.ParseOrThrow(\"foul\"));\n```\n\n### Recursive grammars\n\nAlmost any non-trivial programming language, markup language, or data interchange language will feature some sort of recursive structure. C# doesn't support recursive values: a reference to a variable that is still being initialised will see `null`. So we need a way to defer the construction of recursive parsers, which Pidgin provides with the `Rec` combinator. 
Here's a simple parser which parses arbitrarily nested parentheses with a single digit inside them.\n\n```csharp\nParser\u003cchar, char\u003e expr = null;\nParser\u003cchar, char\u003e parenthesised = Char('(')\n    .Then(Rec(() =\u003e expr))  // using a lambda to (mutually) recursively refer to expr\n    .Before(Char(')'));\nexpr = Digit.Or(parenthesised);\nAssert.AreEqual('1', expr.ParseOrThrow(\"1\"));\nAssert.AreEqual('1', expr.ParseOrThrow(\"(1)\"));\nAssert.AreEqual('1', expr.ParseOrThrow(\"(((1)))\"));\n```\n\nHowever, Pidgin does not support left recursion. A parser must consume some input before making a recursive call. The following example will produce a stack overflow because a recursive call to `arithmetic` occurs before any input can be consumed by `Digit` or `Char('+')`:\n\n```csharp\nParser\u003cchar, int\u003e arithmetic = null;\nParser\u003cchar, int\u003e addExpr = Map(\n    (x, _, y) =\u003e x + y,\n    Rec(() =\u003e arithmetic),\n    Char('+'),\n    Rec(() =\u003e arithmetic)\n);\narithmetic = addExpr.Or(Digit.Select(d =\u003e (int)char.GetNumericValue(d)));\n\narithmetic.Parse(\"2+2\");  // stack overflow!\n```\n\n### Derived combinators\n\nAnother powerful element of this programming model is that you can write your own functions to compose parsers. Pidgin contains a large number of higher-level combinators, built from the primitives outlined above. For example, `Between` runs a parser surrounded by two others, keeping only the result of the central parser. Here's how you could define a combinator like it yourself, as an extension method (named `InBraces` here so that it isn't confused with the built-in `Between`):\n\n```csharp\nstatic class ParserExtensions\n{\n    public static Parser\u003cTToken, T\u003e InBraces\u003cTToken, T, U, V\u003e(this Parser\u003cTToken, T\u003e parser, Parser\u003cTToken, U\u003e before, Parser\u003cTToken, V\u003e after)\n        =\u003e before.Then(parser).Before(after);\n}\n```\n\n### Parsing expressions\n\nPidgin features operator-precedence parsing tools for parsing expression grammars with associative infix operators. 
The `ExpressionParser` class builds a parser from two ingredients: a parser for a single expression term, and a table of operators with rules for combining expressions.\n\n### More examples\n\nExamples, such as parsing (a subset of) JSON and XML into document structures, can be found in the `Pidgin.Examples` project.\n\nTips\n----\n\n### A note on variance\n\nWhy doesn't this code compile?\n\n```csharp\nclass Base {}\nclass Derived : Base {}\n\nParser\u003cchar, Base\u003e p = Return(new Derived());  // Cannot implicitly convert type 'Pidgin.Parser\u003cchar, Derived\u003e' to 'Pidgin.Parser\u003cchar, Base\u003e'\n```\n\nThis would be possible if `Parser` were defined as _covariant_ in its second type parameter (i.e. `interface Parser\u003cTToken, out T\u003e`). For the purposes of efficiency, Pidgin parsers return a struct. Structs and classes aren't allowed to have variant type parameters (only interfaces and delegates are); since a Pidgin parser's return type can't be variant, neither can the parser itself.\n\nIn my experience, this crops up most frequently when returning a node of a syntax tree from a parser using `Select`. The least verbose way of rectifying this is to explicitly set `Select`'s type parameter to the supertype:\n\n```csharp\nParser\u003cchar, Base\u003e p = Any.Select\u003cBase\u003e(_ =\u003e new Derived());\n```\n\n### Speed tips\n\nPidgin is designed to be fast and produce a minimum of garbage. A carefully written Pidgin parser can be competitive with a hand-written recursive descent parser. If you find that parsing is a bottleneck in your code, here are some tips for minimising the runtime of your parser.\n\n* Avoid LINQ query syntax. Query comprehensions are defined by translation into core C# using `SelectMany`; however, for long queries the translation can allocate a large number of anonymous objects. 
This generates a lot of garbage; while those objects often won't survive the nursery, it's still preferable to avoid allocating them!\n* Avoid backtracking where possible. When consuming a streaming input like a `TextReader` or an `IEnumerable`, `Try` _buffers_ its input to enable backtracking, which can be expensive.\n* Use specialised parsers where possible: the provided `Skip*` parsers can be used when the result of parsing is not required. They typically run faster than their counterparts because they don't need to save the values generated.\n* Build your parser statically where possible (for example, once, in a `static readonly` field). Pidgin is designed under the assumption that parsers are run more often than they are built; building a parser can be an expensive operation.\n* Avoid `Bind` and `SelectMany` where possible. Many practical grammars are _context-free_ and can therefore be written purely with `Map`. If you do have a context-sensitive grammar, it may make sense to parse it in a context-free fashion and then run a semantic checker over the result.\n\nComparison to other tools\n-------------------------\n\n### Pidgin vs Sprache\n\n[Sprache](https://github.com/sprache/Sprache) is another parser combinator library for C# and served as one of the sources of inspiration for Pidgin. Pidgin's API is somewhat similar to that of Sprache, but Pidgin aims to improve on Sprache in a number of ways:\n\n* Sprache's input must be a string. This makes it inappropriate for parsing binary protocols or tokenised inputs. Pidgin supports input tokens of arbitrary type.\n* Sprache's input must be a string, i.e. an _in-memory_ array of characters. Pidgin supports streaming inputs.\n* Sprache automatically backtracks on failure. 
Pidgin uses a special combinator to enable backtracking because backtracking can be a costly operation.\n* Pidgin comes bundled with operator-precedence tools for parsing expression languages with associative infix operators.\n* Pidgin is faster and allocates less memory than Sprache.\n* Pidgin has more documentation coverage than Sprache.\n\n### Pidgin vs FParsec\n\n[FParsec](https://github.com/stephan-tolksdorf/fparsec) is a parser combinator library for F# based on [Parsec](https://hackage.haskell.org/package/parsec-3.1.11).\n\n* FParsec is an F# library and consuming it from C# can be awkward. Pidgin is implemented in pure C#, and is designed for C# consumers.\n* FParsec only supports character input streams.\n* FParsec supports stateful parsing - it has an extra type parameter for an arbitrary user-defined state - which can make it easier to parse context-sensitive grammars.\n* FParsec is faster than Pidgin (though we're catching up!)\n\n### Benchmarks\n\nThis is how Pidgin compares to other tools in terms of performance. 
The benches can be found in the `Pidgin.Bench` project.\n\n```ini\nBenchmarkDotNet=v0.11.5, OS=Windows 10.0.14393.3384 (1607/AnniversaryUpdate/Redstone1)\nIntel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores\nFrequency=3125000 Hz, Resolution=320.0000 ns, Timer=TSC\n.NET Core SDK=3.1.100\n  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT\n  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT\n```\n\n#### `ExpressionBench`\n\n|              Method |         Mean |        Error |        StdDev | Ratio | RatioSD |  Gen 0 | Gen 1 | Gen 2 | Allocated |\n|-------------------- |-------------:|-------------:|--------------:|------:|--------:|-------:|------:|------:|----------:|\n|   LongInfixL_Pidgin | 625,148.8 ns | 3,015.040 ns | 2,672.7541 ns |  2.25 |    0.01 |      - |     - |     - |     128 B |\n|   LongInfixR_Pidgin | 625,530.1 ns | 4,104.833 ns | 3,839.6633 ns |  2.25 |    0.02 |      - |     - |     - |     128 B |\n|  LongInfixL_FParsec | 278,035.1 ns | 1,231.538 ns | 1,151.9816 ns |  1.00 |    0.00 |      - |     - |     - |     200 B |\n|  LongInfixR_FParsec | 326,047.3 ns |   931.485 ns |   871.3119 ns |  1.17 |    0.01 |      - |     - |     - |     200 B |\n|                     |              |              |               |       |         |        |       |       |           |\n|  ShortInfixL_Pidgin |   1,506.5 ns |     5.515 ns |     5.1590 ns |  2.67 |    0.01 | 0.0401 |     - |     - |     128 B |\n|  ShortInfixR_Pidgin |   1,636.6 ns |     6.882 ns |     5.7467 ns |  2.90 |    0.02 | 0.0401 |     - |     - |     128 B |\n| ShortInfixL_FParsec |     564.1 ns |     1.894 ns |     1.6788 ns |  1.00 |    0.00 | 0.0629 |     - |     - |     200 B |\n| ShortInfixR_FParsec |     567.7 ns |     1.200 ns |     0.9373 ns |  1.01 |    0.00 | 0.0629 |     - |     - |     200 B |\n\n#### `JsonBench`\n\n|              Method |       Mean |     Error |    
StdDev | Ratio | RatioSD |     Gen 0 |    Gen 1 | Gen 2 |  Allocated |\n|-------------------- |-----------:|----------:|----------:|------:|--------:|----------:|---------:|------:|-----------:|\n|      BigJson_Pidgin |   684.6 us |  2.888 us |  2.701 us |  1.00 |    0.00 |   33.2031 |        - |     - |   101.7 KB |\n|     BigJson_Sprache | 3,597.5 us | 17.595 us | 16.458 us |  5.25 |    0.03 | 1726.5625 |        - |     - | 5291.81 KB |\n|  BigJson_Superpower | 2,884.4 us |  6.504 us |  5.766 us |  4.21 |    0.02 |  296.8750 |        - |     - |  913.43 KB |\n|     BigJson_FParsec |   750.1 us |  3.516 us |  3.289 us |  1.10 |    0.01 |  111.3281 |        - |     - |  343.43 KB |\n|                     |            |           |           |       |         |           |          |       |            |\n|     LongJson_Pidgin |   517.5 us |  2.418 us |  2.261 us |  1.00 |    0.00 |   33.2031 |        - |     - |  104.25 KB |\n|    LongJson_Sprache | 2,858.5 us | 10.491 us |  9.300 us |  5.53 |    0.03 | 1390.6250 |        - |     - | 4269.33 KB |\n| LongJson_Superpower | 2,348.1 us | 14.194 us | 13.277 us |  4.54 |    0.03 |  230.4688 |        - |     - |  706.79 KB |\n|    LongJson_FParsec |   642.5 us |  2.708 us |  2.533 us |  1.24 |    0.01 |  125.0000 |        - |     - |   384.3 KB |\n|                     |            |           |           |       |         |           |          |       |            |\n|     DeepJson_Pidgin |   399.3 us |  1.784 us |  1.582 us |  1.00 |    0.00 |   26.3672 |        - |     - |   82.24 KB |\n|    DeepJson_Sprache | 2,983.0 us | 42.512 us | 39.765 us |  7.46 |    0.09 |  761.7188 | 191.4063 |     - | 2922.46 KB |\n|    DeepJson_FParsec |   701.8 us |  1.665 us |  1.557 us |  1.76 |    0.01 |  112.3047 |        - |     - |  344.43 KB |\n|                     |            |           |           |       |         |           |          |       |            |\n|     WideJson_Pidgin |   427.8 us |  1.619 us |  1.515 us |  1.00 
|    0.00 |   15.6250 |        - |     - |   48.42 KB |\n|    WideJson_Sprache | 1,704.2 us |  9.246 us |  8.196 us |  3.98 |    0.02 |  900.3906 |        - |     - | 2763.22 KB |\n| WideJson_Superpower | 1,494.6 us |  9.581 us |  8.962 us |  3.49 |    0.02 |  148.4375 |        - |     - |  459.74 KB |\n|    WideJson_FParsec |   379.5 us |  1.597 us |  1.494 us |  0.89 |    0.00 |   41.9922 |        - |     - |  129.02 KB |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjamin-hodgson%2Fpidgin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenjamin-hodgson%2Fpidgin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjamin-hodgson%2Fpidgin/lists"}