https://github.com/i-e-b/gool
A fast, robust, and thread-safe parser-combinator library for C#, with a fluent BNF-like interface
https://github.com/i-e-b/gool
csharp lexer lexer-generator parser parser-combinators
Last synced: 3 months ago
JSON representation
A fast, robust, and thread-safe parser-combinator library for C#, with a fluent BNF-like interface
- Host: GitHub
- URL: https://github.com/i-e-b/gool
- Owner: i-e-b
- License: bsd-3-clause
- Created: 2011-09-21T06:02:52.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2025-09-24T12:50:53.000Z (4 months ago)
- Last Synced: 2025-10-10T08:18:50.885Z (3 months ago)
- Topics: csharp, lexer, lexer-generator, parser, parser-combinators
- Language: C#
- Homepage: https://www.nuget.org/packages/Gool/
- Size: 2.35 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Gool is a lexer/parser for C#
==================================
A fast, robust, and thread-safe parser-combinator library for C#, with a fluent BNF-like interface for building parsers.
Use this to read and interpret a wide range of text-based input -- including file formats, data structures, and
programming languages.
By being a run-time library inside the main program, grammars can be built and modified as required,
with even complex structures taking microseconds to build.
### Unique features
- Contextual parsing: can build new patterns at run-time *and* at parse time
- Patterns can be computed using any C# code
- Easily expanded to handle complex patterns
- Use all your existing navigation and refactoring tools
### When should I use this?
If you have a complex and/or fragile set of regular expressions, try using a parser instead.
See [Sample Parsers](https://github.com/i-e-b/Gool/tree/master/SamplesStd) for fully functional examples.
Basic example
-------------
Defining the parser:
```csharp
BNF // Basic infix arithmetic expressions
number = FractionalDecimal(), // Built-in helper for signed numbers
factor = number | ('(' > _expression > ')'), // Number or parenthesised expression
power = factor > !('^' > factor), // Factor, with optional '^' + exponent
term = power % ('*' | '/'), // Powers, optionally joined with '*' or '/'
expression = term % ('+' | '-'); // Terms, optionally joined will '+' or '-'
```
Reading an input:
```csharp
var result = expression.ParseEntireString( // Run the parser, refuse partial matches
"(6.5 + 3) * (5.5 - -0.2e1)"
);
var tree = TreeNode.FromParserMatch(result, true); // Interpret the raw parse tree
var final = TreeNode.TransformTree(tree, ApplyOperation); // Apply functions to reduce tree to value
Console.WriteLine(final); // 71.25
```
(some details removed for clarity -- see bottom of this readme for full implementation)
BNF Syntax
----------
### Terminal parsers:
- `'…'` → *Character* parser that matches a single literal character in the input
- `"…"` → *String* parser that matches a literal string in the input
- `BNF.Regex("…")` → *Regex* parser that matches a string based on a regex pattern.
- `BNF.OneOf(…)` → Match a single character from the set provided
- `BNF.NoneOf(…)` → Match any single character that is **not** in the set provided
- `BNF.AnyChar` → Parser that matches any single character.
- `BNF.Empty` → Parser that matches an empty string (useful in unions)
- `BNF.EndOfInput` → Parser that matches the end of input (parsers will normally accept partial matches)
- `BNF.LineEnd` → Parser that matches a line end (either `\r`, or `\n`, or `\r\n`)
- `BNF.WhiteSpace` → Parser that matches a single character of white-space
### Combining parsers:
- a `|` b → Create a *union* parser that matches the **longest** result from either **a** or **b**. Parser will match if only one of **a** and **b** match, *or* if both **a** and **b** match.
- Example: `"hello" | "world"` matches `hello` or `world`
- Example: `"on" | "one"` matches `on` and `one`. `+( "on" | "one" )` will match `oneone` as {`one`, `one`}
- a `>` b → Create a *sequence* parser that matches **a** then **b**
- Example: `'x' > 'y'` matches `xy` but not `x` or `y`
- a `<` b → Create a *terminated list* parser that matches a list of **a**, each being terminated by **b**. The last item **a** must be terminated.
- Example: `'x' < ';'` matches `x;x;x;` and `x;`, but not `x` or `x;x`
- a `%` b → Create a *delimited list* parser that matches a list of **a**, delimited by **b**. A trailing delimiter is not matched.
- Example: `'x'%','` matches `x` and `x,x`, but not `x,x,`
- `-`a → Create an *optional repeat* parser that matches zero or more **a**
- Example: `-"xy"` matches `xyxy`, `xy`, and *empty*
- `+`a → Create a *repeat* parser that matches one or more **a**
- Example: `+"xy"` matches `xy` and `xyxy`, but not *empty*
- `!`a → Create an *option* parser that matches zero or one **a**
- Example: `!"xy"` matches `xy` and *empty*, but not `xyxy`
- Can also be expressed `BNF.Optional(`a`)`
- a `&` b → Create an *intersection* parser that matches (**a** then **b**) or (**b** then **a**)
- Example: `'x'&'y'` matches `xy` and `yx`, but not `xx` or `yy`
- a `^` b → Create an *exclusion* parser that matches **a** or **b** but not both
- Example: `'x'^'y'` matches `x` and `y`, but not `xy` or `yx`
- a `/` b → Create a *difference* parser that matches **a** but not **b**
- Example: `"on" / "one"` matches `on` but not `one`. `+( 'x' / '.' )` will match `xx.` as {`x`, `x`}
- `~`a → Create a *non-consuming* parser that must match **a**, but does not consume the match
- Example: `'x' > ~'y' > "yz"` matches `xyz` as `x` and `yz`. This is useful for compatibility with PEG grammars.
- a `>=` b → Creates a *preference-union* (also called *preference-alternative* or *ordered choice*) parser from two sub-parsers. This returns the first successful match from left-to-right
- Example: `"a" >= "app" >= "apple"` would only ever match `a`. This is useful for compatibility with PEG grammars.
Parsers generated by BNF can be used repeatedly.
More Details
------------
### What are Parser-Combinators
Parser-Combinators are components that you can build into structures that encode languages (grammars).
The building can be done using a human-readable syntax, building parsers of increasing complexity on top of simpler parts.
Detailed grammars and languages can be processed in an efficient way.
By structuring the parser-combinator library in a particular way, building parsers is the same as writing a grammar itself.
Therefore instead of describing how to parse a language, a user must only specify the language itself.
The result is a working parser.
The resulting parsers are equivalent to recursive descent parsers with contextual curtailment.
The parsing process can handle left-recursive grammars and ambiguous grammars, although these will
result in less efficient parsing.
### Scanners
Parsers operate over a 'scanner', which is an input string plus transforms and contextual data.
For common cases, you won't need to create one directly -- just use `BNF.ParseString` or `BnfPackage.ParseString`.
Scanners handle case-conversion and white-space skipping if you use those options.
Because scanners hold context for a parse, they cannot be reused or shared between parse attempts.
### Tags, scopes, and trees
The basic output from a parser is a `ParserMatch`, which gives a complete tree of all matches, including those from combined
parsers. `ParserMatch` also gives access to the parser that made the match, and the scanner that was used for input.
The `ParserMatch` tree contains all the information from a result, but often this is too much.
Any parser can be tagged with a string value, and this can be used to extract salient information from the tree.
#### TaggedTokensDepthFirst / TaggedTokensBreadthFirst
You can request a sequence of `ParserMatch`es from the result tree, only returning tagged results.
Tags are **not** inherited by parent matches.
#### ScopeNode.FromMatch
The scoped node tree ignores the `ParserMatch` hierarchy, and uses `.OpenScope()` and `.CloseScope()`
to build a different structure. This is useful for structured data (like JSON and XML) and otherwise scoped
data that use open and close markers -- like (`{`,`}`) or (`begin`,`end`) in many programming languages.
#### TreeNode.FromParserMatch
General tree nodes match the `ParserMatch` hierarchy, but only including nodes with a tag or scope set.
The `Pivot` scope has a specific effect on general trees, 'lifting' them to a level above non-pivot peers.
This is useful for chains of operators:
Given the parser `sum` from:
```csharp
BNF number = BNF.Regex("[0-9]+").Tag("num");
BNF addSub = BNF.OneOf('+', '-').Tag("op");
BNF sum = number % addSub;
```
and the input:
```csharp
var result = sum.ParseEntireString(
"1+2",
);
var tree = TreeNode.FromParserMatch(result, false);
```
outputs `tree` as:
```
┌───── 1
│ +
└── 2
```
but changing `addSub` to `BNF.OneOf('+', '-').Tag("op").PivotScope();` results in
```
┌──1
+│
└──2
```
Detailed examples
-----------------
See [Sample Parsers](https://github.com/i-e-b/Gool/tree/master/SamplesStd) for more fully functional examples.
### Basic infix arithmetic calculator
```csharp
using Gool;
using static Gool.BNF; // Include BNF methods without needing 'BNF.' everywhere
public double EvaluateExpression(string expression)
{
var result = Arithmetic().ParseEntireString(expression); // Step 1: parse input
var tree = TreeNode.FromParserMatch(result, prune: true); // Step 2: build expression tree
var final = TreeNode.TransformTree(tree, ApplyOperation); // Step 3: reduce the tree to a value
return final;
}
public static Package Arithmetic()
{
var _expression = Forward();
BNF
add_sub = OneOf('+', '-'),
mul_div = OneOf('*', '/'),
exp = '^';
BNF
number = FractionalDecimal(),
factor = number | ('(' > _expression > ')'),
power = factor > !(exp > factor),
term = power % mul_div,
expression = term % add_sub;
_expression.Is(expression);
add_sub.TagWith(Operation).PivotScope();
mul_div.TagWith(Operation).PivotScope();
exp .TagWith(Operation).PivotScope();
number .TagWith(Value);
return expression.WithOptions(Options.SkipWhitespace);
}
public const string Operation = "operation";
public const string Value = "value";
private static TreeNode ApplyOperation(TreeNode node)
{
if (node.Source.Tag is null) return node.Children[0]; // pull child up through joining nodes
if (node.Source.Tag != Operation) return node; // only look at operation nodes
var operation = node.Source.Value;
if (node.Children.Count < 2) throw new Exception("Invalid expression");
var left = node.Children[0].Source;
var right = node.Children[1].Source;
if (!double.TryParse(left.Value, out var a)
|| !double.TryParse(right.Value, out var b)) return node; // one of our children is not a number
// Both children are values: perform the operation
var result = operation switch
{
"+" => a + b,
"-" => a - b,
"*" => a * b,
"/" => a / b,
"^" => Math.Pow(a, b),
_ => throw new NotImplementedException($"Operation not implemented: '{operation}'")
};
// Return a new node with the calculated value
return TreeNode.FromString(result.ToString(CultureInfo.InvariantCulture), Value);
}
```
### Simplified XML Parser
```csharp
BNF // Fragments
text = Regex("[^<>]+"),
identifier = Regex("[_a-zA-Z][_a-zA-Z0-9]*"),
whitespace = Regex(@"\s+");
BNF // Literals
quoted_string = '"' > identifier > '"',
attribute = whitespace > identifier > '=' > quoted_string;
BNF // tags
tag_id = identifier.Tagged(TagId),
open_tag = '<' > tag_id > -attribute > '>',
close_tag = "" > tag_id > '>';
attribute.TagWith(Attribute);
text.TagWith(Text);
open_tag.TagWith(OpenTag).OpenScope();
close_tag.TagWith(CloseTag).CloseScope();
return Recursive(tree => -(open_tag > -(tree | text) > close_tag)).WithOptions(Options.None);
```
### Full spec JSON parser
From https://www.json.org/json-en.html
```csharp
var value = Forward();
BNF // Basic components
ws = AnyWhiteSpace,
number = FractionalDecimal(groupMark: "", decimalMark: ".", allowLeadingZero: false, allowLeadingPlus: false);
BNF // Strings
unicodeEsc = 'u' > CharacterInRanges(('0', '9'), ('a', 'f'), ('A', 'F')).Repeat(4),
escape = OneOf('"', '\\', '/', 'b', 'f', 'n', 'r', 't') | unicodeEsc,
character = NoneOf('"', '\\') | ('\\' > escape),
characters = -character,
quoted_string = '"' > characters > '"';
BNF // Elements (lone or in arrays)
element = ws > value > ws,
elements = element % ',';
BNF // Members of objects
member_key = quoted_string.Copy(),
member = ws > member_key > ws > ':' > element,
members = member % ',';
BNF // Objects
object_enter = '{',
object_leave = '}',
object_block = object_enter > (ws | members) > object_leave;
BNF // Arrays
array_enter = '[',
array_leave = ']',
array_block = array_enter > elements > array_leave;
BNF // Single values
primitive = quoted_string >= number >= "true" >= "false" >= "null";
value.Is(object_block >= array_block >= primitive);
array_enter.OpenScope().TagWith("array");
array_leave.CloseScope();
object_enter.OpenScope().TagWith("object");
object_leave.CloseScope();
member_key.TagWith("key");
primitive.TagWith("value");
return element.WithOptions(Options.None);
```