https://github.com/rotemdan/grammar-composer
Defines and generates parsers from composable grammar definitions. Includes advanced features like lexer-free parsing, selective packrat memoization and static analysis.
- Host: GitHub
- URL: https://github.com/rotemdan/grammar-composer
- Owner: rotemdan
- License: mit
- Created: 2024-12-03T10:27:15.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-12-11T03:57:38.000Z (7 months ago)
- Last Synced: 2024-12-11T04:28:57.042Z (7 months ago)
- Topics: context-free-grammar, grammar, lexer-free-parsing, parser-generator, parsing-expression-grammar, peg
- Language: TypeScript
- Homepage:
- Size: 34.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Grammar composer
A library to define, build and efficiently parse context-free grammars.
* Grammars are defined using TypeScript class declarations
* No separate tokenization step is needed. Tokenization is defined as part of the grammar via embedded `Pattern` objects, which are processed internally by the [`regexp-composer`](https://github.com/rotemdan/regexp-composer) regular expression library
* The generated parser accepts raw characters as input, making it a lexer-free (or hybrid) parser that supports contextual tokenization: low-level character patterns can be specialized to different high-level parser contexts, and sub-patterns captured by the low-level regular expressions are embedded directly in the resulting parse tree
* Top-down parsing (roughly equivalent to PEG parsing), with optional "packrat" caching that can be enabled or disabled for individual productions
* Supports right-recursion, but will currently error when left-recursion is detected
* Uses sophisticated static analysis to automatically identify and annotate optional productions
* Provides useful parse-time error reporting, identifying the exact production involved and the most likely alternatives at the failed position

## Installation
```
npm install grammar-composer
```

And also the related regular expression builder package:
```
npm install regexp-composer
```

## Example: XML grammar
The grammar is defined within a container class `XmlGrammar`. It contains a mixture of higher-level, context-free productions and lower-level, regular expression productions.
* Context-free grammar productions are defined by anonymous functions `() => ...`
* Regular expression productions are defined by `pattern(...)`

In this example, context-free operators are prefixed with `G`, and regular expression operators are prefixed with `R`, to avoid confusion between similarly named operators:
```ts
import * as G from 'grammar-composer'
import * as R from 'regexp-composer'

export class XmlGrammar {
    document = () => [
        G.zeroOrMore(
            G.anyOf(
                this.textFragment,
                this.openingTag,
                this.closingTag,
                this.comment,
                this.declarationTag,
            )
        )
    ]

    textFragment = G.pattern([
        R.oneOrMore(R.notAnyOfChars('<'))
    ])

    openingTag = () => [
        this.openingTagStart,
        G.zeroOrMore(this.attribute),
        this.tagEnd
    ]

    openingTagStart = G.pattern([
        '<',
        R.possibly('?'),
        R.captureAs('tagName',
            R.oneOrMore(R.notAnyOfChars(R.whitespace, '"', "'", '?', '!', '/', '>'))
        ),
        R.zeroOrMore(R.whitespace),
    ])

    tagEnd = G.pattern([
        R.zeroOrMore(R.whitespace),
        R.possibly(R.anyOf('/', '?')),
        '>'
    ])

    attribute = G.pattern([
        R.zeroOrMore(R.whitespace),
        R.captureAs('attributeName',
            R.oneOrMore(R.notAnyOfChars(R.whitespace, '=', '"', "'", '?', '/', '>'))
        ),
        R.zeroOrMore(R.whitespace),
        R.possibly([
            '=',
            R.zeroOrMore(R.whitespace),
            quotedString,
            R.zeroOrMore(R.whitespace),
        ])
    ])

    closingTag = G.pattern([
        '</',
        R.zeroOrMore(R.whitespace),
        R.captureAs('tagName',
            R.oneOrMore(R.notAnyOfChars(R.whitespace, '/', '>'))
        ),
        R.zeroOrMore(R.whitespace),
        '>'
    ])

    declarationTag = () => [
        this.declarationTagOpening,
        G.zeroOrMore(this.declarationTagAttribute),
        this.tagEnd
    ]

    declarationTagOpening = G.pattern([
        ''))
        ),
        R.zeroOrMore(R.whitespace)
    ])

    declarationTagAttribute = G.pattern([
        R.zeroOrMore(R.whitespace),
        R.anyOf(
            R.captureAs('attributeName',
                R.oneOrMore(R.notAnyOfChars(R.whitespace, '"', "'", '/', '!', '?', '>'))
            ),
            quotedString,
        ),
        R.zeroOrMore(R.whitespace),
    ])

    comment = G.pattern([
        ''
    ])
}

const quotedString = R.anyOf(
    [
        '"',
        R.captureAs('doubleQuotedStringContent',
            R.zeroOrMore(R.notAnyOfChars('"'))
        ),
        '"'
    ],
    [
        "'",
        R.captureAs('singleQuotedStringContent',
            R.zeroOrMore(R.notAnyOfChars("'"))
        ),
        "'"
    ],
)
```

Building and parsing using the XML grammar:
```ts
import { buildGrammar } from 'grammar-composer'

const xmlString = `
Adobe SVG Viewer
Open
Open New
Zoom In
Zoom Out
Quality
Pause
Mute
Find...
Find Again
Copy`
// Build the grammar. 'document' is the starting production.
//
// Although `XmlGrammar` is defined as a class, there's no need to instantiate it,
// just pass it as it is.
const grammar = buildGrammar(XmlGrammar, 'document')

// Parse the XML string with the built grammar
const parseTree = grammar.parse(xmlString)
```

The resulting parse tree looks like:
```ts
[
    {
        "name": "document",
        "startOffset": 0,
        "endOffset": 644,
        "sourceText": "\n\n\n\n Adobe SVG Viewer\n Open\n Open New\n \n Zoom In\n Zoom Out\n \n Quality\n Pause\n Mute\n \n Find...\n Find Again\n Copy\n\n\n",
        "children": [
            {
                "name": "textFragment",
                "startOffset": 0,
                "endOffset": 1,
                "sourceText": "\n",
                "children": []
            },
            {
                "name": "declarationTag",
                "startOffset": 1,
                "endOffset": 19,
                "sourceText": "",
                "children": [
                    {
                        "name": "declarationTagOpening",
                        "startOffset": 1,
                        "endOffset": 11,
                        "sourceText": "",
                        "children": []
                    }
                ]
            },
            {
                "name": "textFragment",
                "startOffset": 19,
                "endOffset": 21,
                "sourceText": "\n\n",
                "children": []
            },
            {
                "name": "openingTag",
                "startOffset": 21,
                "endOffset": 27,
                "sourceText": "",
                "children": [
                    {
                        "name": "openingTagStart",
                        "startOffset": 21,
                        "endOffset": 26,
                        "sourceText": "",
                        "children": []
                    }
                ]
            },
            {
                "name": "textFragment",
                "startOffset": 27,
                "endOffset": 32,
                "sourceText": "\n ",
                "children": []
            },
            {
                "name": "openingTag",
                "startOffset": 32,
                "endOffset": 40,
                "sourceText": "",
                "children": [
                    {
                        "name": "openingTagStart",
                        "startOffset": 32,
                        "endOffset": 39,
                        "sourceText": "",
                        "children": []
                    }
                ]
            },
...
```

## Operators
Context-free operators are mostly named similarly to the ones in [`regexp-composer`](https://github.com/rotemdan/regexp-composer).
### `zeroOrMore(grammarElement)`
Match the grammar element zero or more times.
### `oneOrMore(grammarElement)`
Match the grammar element one or more times.
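
The difference between the two repetition operators can be illustrated with a small standalone sketch. This is not grammar-composer's implementation: the `Matcher` type and the `literal`, `repeat`, `zeroOrMoreSketch` and `oneOrMoreSketch` helpers are hypothetical names introduced only for illustration, and the sketch assumes repetition is greedy.

```typescript
// Hypothetical standalone sketch; it does not use grammar-composer itself.
// A matcher returns the offset just past a successful match, or -1 on failure.
type Matcher = (text: string, offset: number) => number

// Match an exact string at the given offset.
const literal = (expected: string): Matcher =>
    (text, offset) => text.startsWith(expected, offset) ? offset + expected.length : -1

// Repeat a matcher greedily, requiring at least `minCount` successful matches.
function repeat(matcher: Matcher, minCount: number): Matcher {
    return (text, offset) => {
        let count = 0
        let current = offset

        while (true) {
            const next = matcher(text, current)

            if (next === -1) {
                break
            }

            count += 1

            // An empty match cannot make progress, so stop instead of looping forever.
            if (next === current) {
                break
            }

            current = next
        }

        return count >= minCount ? current : -1
    }
}

const zeroOrMoreSketch = (matcher: Matcher) => repeat(matcher, 0)
const oneOrMoreSketch = (matcher: Matcher) => repeat(matcher, 1)

console.log(zeroOrMoreSketch(literal('ab'))('ababx', 0)) // 4
console.log(zeroOrMoreSketch(literal('ab'))('xyz', 0))   // 0 (nothing matched, but still a success)
console.log(oneOrMoreSketch(literal('ab'))('xyz', 0))    // -1 (at least one match is required)
```

In this sketch, the only difference between the two is whether an empty result counts as a successful match.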
### `anyOf(grammarElement1, grammarElement2, grammarElement3, ...)`
Match any of the grammar elements. Alternatives are tried in order: the first successful match is accepted and the remaining ones are not tried.
### `bestOf(grammarElement1, grammarElement2, grammarElement3, ...)`
Match the best grammar element. All alternatives are tried, and the longest match (in terms of character count) is chosen.
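
To make the contrast between first-match and longest-match choice concrete, here is a minimal standalone sketch. It does not call grammar-composer: `Matcher`, `literal`, `firstOf` and `longestOf` are hypothetical names, with `firstOf` standing in for the behavior described for `anyOf` and `longestOf` for `bestOf`. How the library resolves ties between equally long matches is not covered here.

```typescript
// Hypothetical standalone sketch; it does not use grammar-composer itself.
// A matcher returns the offset just past a successful match, or -1 on failure.
type Matcher = (text: string, offset: number) => number

// Match an exact string at the given offset.
const literal = (expected: string): Matcher =>
    (text, offset) => text.startsWith(expected, offset) ? offset + expected.length : -1

// Ordered choice: accept the first alternative that succeeds and stop there.
function firstOf(...alternatives: Matcher[]): Matcher {
    return (text, offset) => {
        for (const alternative of alternatives) {
            const end = alternative(text, offset)

            if (end !== -1) {
                return end
            }
        }

        return -1
    }
}

// Longest choice: try every alternative and keep the furthest end offset.
function longestOf(...alternatives: Matcher[]): Matcher {
    return (text, offset) => {
        let best = -1

        for (const alternative of alternatives) {
            best = Math.max(best, alternative(text, offset))
        }

        return best
    }
}

const input = 'interface Foo'

console.log(firstOf(literal('in'), literal('interface'))(input, 0))   // 2
console.log(longestOf(literal('in'), literal('interface'))(input, 0)) // 9
```

With ordered choice, the order of alternatives matters: listing `'interface'` before `'in'` would make both calls return 9. Longest-match choice is insensitive to order but always evaluates every alternative.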
### `possibly(grammarElement)`
Optionally accept the grammar element, or skip if it doesn't match.
### `pattern(regexpPattern)`
Accept a regular expression pattern compatible with `regexp-composer` `Pattern` type (either a simple string, pattern object, or array of pattern objects).
### `cached(grammarElement)`
Store the result of parsing with this grammar element and reuse it when the element is subsequently evaluated **at the same text position**.
### `uncached(grammarElement)`
Don't cache this grammar element.
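
The idea behind caching a production by text position (packrat memoization) can be sketched as follows. This is an illustration under stated assumptions, not the library's internals: `Matcher`, `memoized`, `digits` and `evaluationCount` are hypothetical names, and the cache here is simply reset whenever a different input string is seen.

```typescript
// Hypothetical standalone sketch; it does not use grammar-composer itself.
// A matcher returns the offset just past a successful match, or -1 on failure.
type Matcher = (text: string, offset: number) => number

// Wrap a matcher so that it is evaluated at most once per offset of a given text.
function memoized(matcher: Matcher): Matcher {
    let cachedText: string | undefined
    let results = new Map<number, number>()

    return (text, offset) => {
        // Results are only valid for the text they were computed on.
        if (text !== cachedText) {
            cachedText = text
            results = new Map()
        }

        const known = results.get(offset)

        if (known !== undefined) {
            return known
        }

        const end = matcher(text, offset)
        results.set(offset, end)

        return end
    }
}

// Count how many times the underlying matcher actually runs.
let evaluationCount = 0

// Match one or more ASCII digits.
const digits: Matcher = (text, offset) => {
    evaluationCount += 1

    let end = offset

    while (end < text.length && text.charCodeAt(end) >= 48 && text.charCodeAt(end) <= 57) {
        end += 1
    }

    return end > offset ? end : -1
}

const cachedDigits = memoized(digits)

cachedDigits('123+456', 0) // evaluated
cachedDigits('123+456', 0) // served from the cache
cachedDigits('123+456', 4) // evaluated: different offset

console.log(evaluationCount) // 2
```

Caching trades memory for time, which is presumably why it is useful to enable it only for productions that are likely to be re-evaluated at the same position after backtracking.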
## Future
* Allow raw regular expressions as part of the grammar
* Allow including user-provided parser functions

## License
MIT