https://github.com/neu-rah/paco
JavaScript monadic parser combinators
https://github.com/neu-rah/paco
combinators functional-js grammar meta-parser monadic parser
Last synced: 4 months ago
JSON representation
JavaScript monadic parser combinators
- Host: GitHub
- URL: https://github.com/neu-rah/paco
- Owner: neu-rah
- Created: 2020-11-26T04:35:38.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-07-11T18:13:42.000Z (12 months ago)
- Last Synced: 2025-10-28T22:33:29.242Z (8 months ago)
- Topics: combinators, functional-js, grammar, meta-parser, monadic, parser
- Language: JavaScript
- Homepage:
- Size: 200 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
Awesome Lists containing this project
README
# PaCo
**javascript monadic parser combinators**
-----------------------------------------
This is a tool for building parsers and parse, so that you do not have to be a parser expert to do it.
```javascript
const myParser=
skip(char('#'))
.then(many(letter).join())
.skip(char('-'))
.then(digits.join().as(parseInt))
parse(">")(myParser)("#AN-123")
```
outputs:
```javascript
Right { value: [ 'AN', 123 ] }
```
All parsers can chain up or group to form other parsers that still can chain up and group.
Some available metaparsers like `many()`, `some()`, `skip()` can accept other parsers or metaparsers.
Some parsers are already a composition with metaparsers, that is the case of `digits`, it will perform `many(digit)`.
**abbreviations**
a single string can now be used in place of a non-chaining parser and it will translate either to a `char` or `string` parser.
> `digits.then('.')` is valid as `digits.then(char('.'))`
## Building objects
`.to(tag)` extender will grab the current parsing group result and store it on object key `tag`
if an object does not exist yet it is created, if there is already an object on the results tail it will be used.
```javascript
const kchk=
string("temp: ")
.then(
option("",oneOf("-+"))
.then(digits)
.join().as(parseInt).to("temp")
.then(char('K').to("unit"))
.verify(o=>o[0].unit==='K'&&o[0].temp>=0)
.failMsg("positive Kelvin!")
)
```
```javascript
#>res(">")(kchk.parse("temp: 12K")).value
[ 'temp: ', { temp: 12, unit: 'K' } ]
```
or failing:
```javascript
#>res(">")(kchk.parse("temp: -12K")).value
'>error, expecting positive Kelvin! but found `-` here->-12K...'
```
this, along `.verify`, `.post` and `.as` allow event callbacks and all sort of automation during the parsing, if not then let me know.
**It's now possible to parse this:** _enable/disable by config_
Enable with `config.backtrackExclusions=true`
```javascript
#>config.optimize=true//turn on optimizations on construction
#>config.backtrackExclusions=true//track exclusions on optimization
#>digits.join().as(parseInt).then(count(2,digit).join()).parse("12345")
Right { value: Pair { a: [ 123, '45' ], b: '' } }
```
`.then`, `.skip` and others can inject exclusion checks on the chain at construction time.
We allow the parser base to be re-writen at construction time, keeping away all checking from parse time.
`many` will peek this injected parameters and possibly exclude them from the sequence match
> one can still call `.optim` even with optimizations turned off
however backtrack will still respect its flag
>optimization chain is not very populated yet, there are many things to fit in...
## Config
**module exported variable **
__now on by default (>=1.2)__
```javascript
var config={
optimize:false,//all optimizations
backtrackExclusions: false//exclude next selector root from current loop match
}
```
- **optimize** disable all optimizations when false
- **backtrackExclusions** exclude next parser root from the current selection
> backtrack can be dismissed for well writen parsers
(there is still a ling way to go here)
---
## .then | .skip
The chaining is done with `.then` or `.skip`, the first combines the output, while the second will drop it.
## .else
Provide an alternative value on parser fail, can modify any parser to have a default and do not fail.
## .optional
Make the target parser optional, silently fail.
## .or
Parsers can alternate with `.or`
## .seq(separator,terminator,min,max)
Take the target parser and collect a sequence of it, with optional separator and/or terminator. Also checks min and max (0->Infinity by default)
>`digit.seq()` => `many(digit)`
>`digit.seq(null,null,1)` => `some(digit)`
## .qt(min,max)
A shortcut to .seq, to parse a specific quantity of target parser
## .notFollowedBy(p)
parser succeeds only if `p` fails
## .lookAhead(p)
predicated `p` with no consume before parsing, if `p` fails the parsing will fail
## .excluding(p)
predicated `p` before parsing, if `p` succeedes the parsing will fail
**must apply to same level parser**. using `.excluding(char(..))` at character level on a string level parser will have no effect
```javascript
digits.excluding(oneOf("89"))//this will have no effect
many(digit.excluding(oneOf("89")))//but this will
```
> if optimizatumizing with exclusion back-track, the the first will have effect
as PaCo will re-write the base to be exactly the second
## .as
Parse output can be formated with `.as`, it will apply to the parser or group where inserted. `.as` will accept an output transformer function.
Output transformations can stack up.
## .join
`.join()` and `.join(«sep»)` are shortcuts for `.as(mappend)` and `.as(o=>o.join(«sep»))`
Parsers can group by nesting ex: `x.then( y.then(z).join() )`, here the `join` will only apply to the (y.z) results.
TODO: this (grouping) is not fully generalized yet
## .group
Put the result inside a list (same as `as(o=>[o])`)
## .verify
`.verify(func,msg)` function `func` will receive the parse group result (list) and should return `true` if approved or `false` to resume in error with message `msg`.
## .post
`.post(f)` post-processing the result, this is still a static parser definition. Function `f` return will replace the previous result.
## .failMsg
`.onFailMsg(msg)` provides a message for a failing parser
## .parse
`.parse("...")` can be used to quick feed a string to any parser.
The result will include both input and output state.
>ex: `digits.parse("123a")`
_use `parse` function to get only output_
all transformation definitions should be applyed to the parser and not to the result, so `.parse` should be the last item of the group.
a parser can be stored, combined, passed around and perform parsing on many contents many times, all transitory state is kept outside.
### -- failing --
this parse will fail as it expects at least one digit
```javascript
#>parse(">")(some(digit))("#123")
Left { value: 'error, expecting digit but found `#` here->#123' }
```
## Composition examples
```javascript
parse(">")(
many(
some(digit.or(letter)).join()
.skip(spaces)
).join("-")
)("As armas e os baroes")
```
expected result
```javascript
Right { value: [ 'As-armas-e-os-baroes' ] }
```
```javascript
const nr=
skip(spaces)
.then(digits).join().as(parseInt)//get first digits as number
.then(many(//then seek many separated by `,` or '|'
skip(spaces)
.skip(char(',').or(char('|')))//drop the separators (not included in output)
.skip(spaces)
.then(digits.join().as(parseInt))
)).as(foldr1(a=>b=>a+b))//transform output by adding all values
parse(">")(nr)(" 12 , 2 | 1")
```
expected result
```javascript
Right { value: [ 15 ] }
```
_above parser could be writen using `sepBy`, we were just emphasizing the combinatory_
## Parsers
- **satisfy(f)** uses a function `char->bool` to evaluate a character
- **char(c)** matches charater `c`
- **cases(c)** case insensitive character `c` match
- **oneOf("...")** matches any given string character
- **noneOf("...")** matches any character not included in string
- **range(a,z)** matches characters between the given ones (inclusive)
- **digit** any digit `0-9`
- **lower** lower case letters `a-z`
- **upper** upper case letters `A-Z`
- **letter** any letter `a-z` or `A-Z`
- **alphaNum** letter or digit
- **hexDigit** hexadecimal digit
- **octDigit** octal digit
- **space** single space
- **tab** single tab
- **nl** newline
- **cr** carriage return
- **blank** tab or space
- **spaces** optional many space
- **blanks** optional many white space
- ~~**spaces1**~~ one or more spaces -> use `some(space)`
- ~~**blanks1**~~ one or more white spaces -> use `some(blank)`
- **digits** optional many digits
- **eof** end of file
- **string("...")** match with given string
- **cis("...")** non case-sensitive string match
- **regex(expr)** match with regex expression
```javascript
#>parse(">")(regex("#([a-zA-Z]+)[ -]([0-9]+)"))("#an-123...")
Right { value: [ 'an', '123' ] }
```
- **skip(...)** ignore the group/parser output
- **many(p)** optional many ocourences or parser `p` targets. This parser never fails as it can return an empty list.
- **some(p)** one or more ocourences of parser `p` targets
- **manyTill(p,end)** one or more ocourences of parser `p` terminating with parser `end`
- **optional(p)** parse `p` if present, otherwise ignore and continue parsing
- **choice[ps]** parse from a list of alternative parsers, this is just an abbreviation of `.or` sequence.
- **count(n)(p)** parses `n` ocourences of `p`
- **between(open)(close)(p)** parses `p` surounded by `open` and `close`, dropping the delimiters.
Be sure to exclude the delimiters from the content or provide any other meaning of content end
```javascript
#>parse(">")(between(space,space,some(noneOf(" "))).join())(" ab.12 ")
Right { value: [ 'ab.12' ] }
```
- **option(x)(p)** parses `p` or returns `x` if it fails, this parser never fails.
```javascript
#>parse(">")(option(["0"])(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(option(["0"])(digit))("")
Right { value: [ '0' ] }
#>parse(">")(option(["0"])(digit))("#")
Right { value: [ '0' ] }
```
- **optionMaybe(p)** parse `p` and returns `Just` the result or `Nothing` if it fails, this parser never fails
- **sepBy(p)(sep)** parses zero or more ocourences of `p` separated by `sep` and droping the separators, this parser never fails.
- **sepBy1(p)(sep)** parses one or more ocourences of `p` separated by `sep` and droping the separators, this parser never fails.
- **endBy(p)(sep)(end)** parses zero or more ocourences of `p` separated by `sep` droping the separators and terminating with `end`
- **endBy1(p)(sep)(end)** parses one or more ocourences of `p` separated by `sep` droping the separators and terminating with `end`
- **none** non-consume happy parser.
> none is an identity parser, will just output the given input as a successful parse. So it never fails or consumes.
We use it to turn binary combinators into unary metaparsers. That is the case of `.skip(...)`, it uses the `none` parser to be available as a unary modifier `skip()`.
`none` can do so for any binary combinator and can apear where you want to disable a part.
> using `none` as `sep` with `endBy(p,sep,end)` whill silentrly skip the `sep` need.
## try and consume
Untill now, all failing parsers do not consume... lets see... while so, no need to inplement **try*
> to be more accurate, failing parsers do consume, we need the failing point on the reports, however the upper parser might pick the starting point to move on, ignoring the consume (as **try** do).
## Parsers basic IO
For now parsers accept a state pair of (input,output) and will return `Either`:
- on error: a pair of an error and the input state.
- on success: a pair of parsed content and the input state.
_*expect changes on this arguments format (changed on v1.1)_
testing a simple parser
```javascript
#>digits.run(Pair([],"123"))
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
```
This is the basic form of parsing (feeding a parser).
However a `parse` function is available, it will perform as the former but gives only output state or a fancy error message.
```javascript
#>parse(">")(digits)("123")
Right { value: [ '1', '2', '3' ] }
```
Same with
```javascript
#>digits.parse("123")
Right { value: Pair { a: [ '1', '2', '3' ], b: '' } }
```
the only difference is that this last one, as the first will give full output, including the input state.
## utility
### **parse**
`parse(filename)(parser)(input string or stream)`
the filename is merelly a decoration here, to be used on error report
```javascript
#>parse(">")(letter.or(digit))("1")
Right { value: [ '1' ] }
#>parse(">")(letter.or(digit))("a")
Right { value: [ 'a' ] }
#>parse(">")(letter.or(digit))("#123")
Left {
value: 'error, expecting letter or digit but found `#` here->#123' }
```
direct parse
```javascript
#>letter.or(digit).parse("1")
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).parse("a")
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).parse("#123")
Left { value: Pair { a: '#123', b: 'letter or digit' } }
```
desugared parse
```javascript
#>letter.or(digit).run(Pair("1",[]))
Right { value: Pair { a: '', b: [ '1' ] } }
#>letter.or(digit).run(Pair("a",[]))
Right { value: Pair { a: '', b: [ 'a' ] } }
#>letter.or(digit).run(Pair("#123",[]))
Left { value: Pair { a: '#123', b: 'letter or digit' } }
```
### **res(r)**
process a parser return to produce a result or error message, discarding input state description.
```javascript
#>res(">")(letter.then(digits).parse("123"))
Left { value: '>error, expecting letter but found `1` here->1...' }
```
without `res()` procesing
```javascript
#>letter.then(digits).parse("123")
Left { value: Pair { a: '123', b: 'letter' } }
```
### **.expect**
as a consequence of the error report system we got a parser description for free, no great effort was put to it thou
```javascript
const p=
optional(skip(char('#')))
.then(some(letter).join())
.skip(char('-').or(spaces1))
.then(digits.join().as(parseInt))
```
description:
```javascript
#>console.log(p.expect)
```
```text
optional skip character `#`
then (at least one letter)->join()
skip character `-` or at least one space
then ((digits)->join())->as(parseInt)
```
using:
```javascript
#>console.log(parse(">")(p)("#AN-123"))
Right { value: [ 'AN', 123 ] }
```
## Chronology
### 1.2
**added:**
- `.seq(...)` super parser
- `.qt(min[,max])` parser modifyer (quantification), based on .seq()
- `.group()` as o=>[o]
- `.else()` provide alternativfe values for failing parsers (they will never fail then)
- activated optimizations and backtrack analitics
### 1.1
Using character domain analysis to detect parser overlap
```text
[0-9] ∩ ([0-9] ∪ [a-z])
<=> ([0-9] ∩ [0-9]) ∪ ([0-9] ∩ [a-z])
<=> ((∅)) ∪ (([0-9]))
<=> [0-9] ∪ ∅
<=> [0-9]
```
version 1.1 is a full re-write with focus on speed
- output pair content swaped
- `many1` replaced by `some`
- `onFailMsg` replaced by `failMsg`
- Parsers are no longuer functions (they are classes and do not derive from Function anymore) so they must be called with `.run` instead of direct function call.
### < 1.1
some experiments with composition and parser analysis, coding was easy with no performance care.
_this parser is inspired but not following "parsec"_