https://github.com/mepy/nihil
A tiny but elegant parser combinator library written by Mepy
https://github.com/mepy/nihil
parser
Last synced: 10 months ago
JSON representation
A tiny but elegant parser combinator library written by Mepy
- Host: GitHub
- URL: https://github.com/mepy/nihil
- Owner: Mepy
- License: mit
- Created: 2021-04-24T04:05:28.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-02-06T16:54:19.000Z (almost 4 years ago)
- Last Synced: 2025-04-23T13:16:50.726Z (10 months ago)
- Topics: parser
- Language: JavaScript
- Homepage:
- Size: 33.2 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# nihil
## Info
Author = Mepy
[Github](https://github.com/Mepy) : Language = Engilish, mainly
[Blog](https://mepy.net/) : Language = Simplified Chinese(简体中文)
## Intro
WARNING : a TOY project current, do NOT use in production.
```nihil``` is a parser combinator library for Javascript(ES2015+).
nihil is null, nothingness, etc, while ```nihil``` is useful and elegant
[NPM](https://www.npmjs.com/package/nihil).
```nihil``` is a lot inspired by [bnb](https://github.com/wavebeem/bread-n-butter)
But written in the style of FP(Functional Programming), mainly, I thought
details in ## Reference.
```nihil``` has zero dependencies and is about 5.2KB(exactly 5291 Bytes) raw Javascript.
As for minifying and zipping, I haven't tried it.
```nihil``` source code uses some features of ES2015(or ES6),
such as destructuring assignment, arrow function.
So you'd better use modern browser like Chrome, Firefox, etc.
```nihil``` library contains only, maybe you can consider, one object ```nihil```.
So it of course supports both Browser and NodeJS.
```nihil``` is elegant and easy to use, try it? -> Read ## Tutorial
## Reference
```nihil``` is a lot inspired by [bnb](https://github.com/wavebeem/bread-n-butter)
At first, I used ```bnb``` for another toy project
However, I met some troubles :
1. Creating a recursive parser is a little hard
2. Assign bnb.parser.parse to a variable will throw an error
Because ```bnb``` implement parser with class
Assign instance.parse to a variable will lose ```this``` pointer
I decide to implement my own one, and I read the source code of ```bnb```
But ```nihil``` is totally new written, I thought it should be NOT a derivative of ```bnb```
The similarity between ```nihil``` and ```bnb``` might only be
```bnb.match(RegExp)``` <=> ```nihil(RegExp)```
and chain method ```.and```, ```.or```, etc.
## Tutorial
### ```nihil(RegExp)``` : Regular Expression Parser
You can use ```nihil(RegExp)``` to create parsers:
```js
nihil(/a+/) // a parser
```
### ```.try``` & ```.parse``` : Parsing Raw String
With a parser, you can parse raw string with method ```.try``` or ```.parse```:
```js
const a = nihil(/a+/)
a.try("aaaa") // "aaaa"
a.try("bbbb") // throw { expected: '/a+/', location: 0 }
a.parse("aaaa") // { right: true, value: 'aaaa' }
a.parse("bbbb") // { right: false, error: { expected: '/a+/', location: 0 } }
```
The difference between them is that ```.try``` would throw error while ```.parse``` would return.
For convenience, we will use ```.try``` in the following text.
### ```.and``` : Sequential Parser
You can use ```nihil.and(RegExp,...)``` to create a sequential parser:
```js
nihil.and(/a*/,/b*/,/c*/)
.try("bbbbcccc") // [ '', 'bbbb', 'cccc' ]
```
If exists a parser ```k```, you can use ```k.and(parser,...)```:
```js
const k = nihil(/k+/)
const g = nihil(/g+/)
k.and(g,k)
.try("kkkggggkkkk") // [ 'kkk', 'gggg', 'kkkk' ]
```
But It would trouble you if the second parser is a simple parser generated by ```nihil```
Well, you can simply enter RegExps in place of simple parsers:
```js
k.and(/g+/,k)
.try("kkkggggkkkk") // [ 'kkk', 'gggg', 'kkkk' ]
```
### ```.or``` : Choose Parsers
If you want to parse "a" or "b", You can of course use RegExp ```/a|b/```
But if you want to parse [one complex language] or [another complex language],
You can also do it with a complex RegExp which might bring errors,
but using ```nihil.or```, you can implement the same function as ```/a|b/```
with well-understanding code style:
```js
const a_b = nihil.or(/a/,/b/)
a_b.try("a") // [ 'a' ]
a_b.try("b") // [ 'b' ]
```
```.or``` is similar to ```.and```, so you can also do like these:
```js
const a = nihil(/a/)
const b = nihil(/b/)
const c = nihil(/c/)
a.or(b,c).try("c") // [ 'c' ]
a.or(b,/c/).try("c") // [ 'c' ]
```
ATTENTION : The order of parser makes difference, see ### Larger Precedes.
The combined parser will try to parse using from left parser to right parser.
```js
nihil.or(/a/,/a+/).parse("aaa") // { right: false, error: { expected: '', location: 1 } }
nihil.or(/a+/,/a/).parse("aaa") // { right: true, value: [ 'aaa' ] }
```
Well, maybe you are considering it is just another style to write RegExp
and doubting whether it is necessary to learn ```.or``` method.
What if the complex language is not a RE language but a LL(1) language?
You can not implement it with RegExp!!!
Read the following tutorial, you can create a parser of LL(m) .
(m as big as you want and if your computer can compute it in time.)
### ```.keep``` : Select Parsers
```.keep``` method is used for selecting parsers according former value.
With this, you can implement a LL(m) parser.
For example, maybe improper, we implement a LL(1) parser
which accepts a string like 'aA', 'bB', ..., 'zZ'
```js
const lowUp = nihil(/[a-z]/)
.keep($1=>nihil(RegExp($1.toUpperCase())))
.parse
lowUp("nN") // { right: true, value: [ 'n', 'N' ] }
lowUp("iJ") // { right: false, error: { expected: '/I/', location: 1 } }
```
ATTENTION : in ```.keep``` method, you must give a function
which returns a parser but **NOT** a RegExp,
different from ```.and``` and ```.or```.
As for LL(m), the format is like this, simply m = 2
```js
const char4 = nihil.and(/./,/./,/./,/./)
// result of char4 is an array with a length of 4 > m = 2
const parser = char4.keep(([$1,$2,$3,$4])=>{
if($4=='a'){return nihil(/r/)}
else if($3=='b'){return nihil(/s/)}
else{return nihil(/t/)}
}).try
// parser accepts /...ar/, /..b[^a]s/ or /..[^b][^a]t/
parser("wxyar"), // [ 'w', 'x', 'y', 'a', 'r' ]
parser("wxbzs"), // [ 'w', 'x', 'b', 'z', 's' ]
parser("wxyzt"), // [ 'w', 'x', 'y', 'z', 't' ]
```
REASON : Why name this method after 'keep'?
Because this method **KEEP** the former value (e.g. result of ```char4```)
Can we choose not to keep it?
Of course, use ```.drop``` instead of ```.keep```
But it is little used in my view.
### ```.map``` : Transform Value
Until now, value of result of parsers are all strings.
Sometimes, we need to transform them, e.g. transform ```'3'``` to ```3```
```js
nihil(/0|[1-9][0-9]*/) // String of number, decimal
.map(Number) // from String to Number
.try("3") // 3
```
```.map``` could do other things, like verifying the value:
For example, we decide to reject 114514
```js
const num = nihil(/0|[1-9][0-9]*/)
.map(Number)
.map($1=>{
if($1==114514)
return undefined
else
return $1
})
num.and(/\s+/,num)
.try("114514 3") // [ undefined, ' ', 3 ]
```
However, you might realize that the parser,
in fact, replace 114514 with undefined.
Can we DROP 114514, make the result ```[ ' ', 3 ]```?
Use ```.drop```.
### ```.drop``` : Intercept Value (Optional)
Except that ```.drop``` drops the former parser's result,
it is nearly the same as ```.keep```
What needs to be emphasized is
that ```fn``` in ```.drop(fn)``` MUST return a parser
NEITHER RegExp, NOR array of values in ```.map```
```nihil.box(value)``` will return a parser
which does nothing but return value,
so called a box (a parser containing value).
You might feels it useful, right?
```js
const num = nihil(/0|[1-9][0-9]*/)
.map(Number)
.drop($1=>{
if($1==114514)
return nihil
else
return nihil.box($1)
})
const foo = num.and(/\s+/,num).try
foo("114514 3") // [ ' ', 3]
foo("10492 3") // [ 10492, ' ', 3 ]
```
But wait, what is ```nihil``` in ```return nihil```?
### ```nihil``` : NULL Parser
As is seen, ```nihil``` acts as a **parser**, doing nothing.
Formly, ```nihil```, as its name, is a nihil (NULL parser, PSEUDO parser).
It can accept a RegExp and return a parser, which we have been using.
```nihil``` is a parser, so you can use its ```.and```, ```.or```.
But ```nihil```'s ```.keep``` , ```.drop``` and ```.map``` must accept a parser first,
because it doesn't return value when parsing. See API.
Until now, we have nearly been empowered to write LL(∞) parsers.
But without knowing the flaw of parser combinator, i.e. the top down parser,
The following methods would be hard to make by the above methods.
## Higher Order Tutorial
For a quick-look, You can only read built-in method.
But reading completely is useful for understanding ```nihil```'s feature.
### ```.sep``` : Seperator Between Parser
Recall the result of example in ```.drop``` tutorial, ```[ ' ' , 3 ]```
We might find that the ' ' is so ugly, we want to drop it!
Of course, we can use ```.map``` to map the array ```[ ' ' , 3 ]``` to ```[ 3 ]```.
But the annoying thing is we need to consider more situations :
```js
const number = nihil(/0|[1-9][0-9]*/).map(Number)
const space = nihil(/\s*/) // accept 0~∞ space(s)
const left = " 3"
const right = "3 "
const leftright = " 3 "
```
#### using built-in ```.sep```
```nihil``` implements a built-in method ```.sep``` for parsers:
```js
var parse = number.sep(space).try
parse(left) // 3
parse(right) // 3
parse(leftright) // 3
```
But it hides some interesting features, let's implement our ```.sep```.
#### using simple ```.and```
```js
// first method
var parse = nihil.and(space, number, space).parse
parse(left) // { right: false, error: { expected: '/\\s*/', location: 3 } }
parse(right) // { right: true, value: [ 3, ' ' ] }
parse(leftright) // { right: true, value: [ ' ', 3, ' ' ] } }
```
Obviously, sometimes it would err.
#### using simple ```.and``` and partial ```.or(nihil)```
```js
// first method
var parse = nihil.and(space, number, space.or(nihil)).parse
parse(left) // { right: true, value: [ ' ', 3 ] }
parse(right) // { right: true, value: [ 3, ' ' ] }
parse(leftright) // { right: true, value: [ ' ', 3, ' ' ] } }
```
We have solved the problem it errs, but how to map three different cases to the same result ```3```?
It is a bit difficult, especially differentiating the first and the second.
#### using ```.drop``` ahead of combining parsers
When meeting \s*, we drop it.
```js
var space_drop = space.drop(_=>nihil);
var parse = nihil.and(space_drop, number, space_drop.or(nihil))
.map($=>$[0])
.parse
parse(left) // { right: true, value: 3 }
parse(right) // { right: true, value: 3 }
parse(leftright) // { right: true, value: 3 }
```
Finally, we solve it.
#### feature
- ```parser.drop(_=>nihil)``` can drop the value of the parser
- ```parser.or(nihil)``` can skip the parser if it couldn't match, similar to the symbol ```?``` in the RegExp.
### ```.loop``` : Recursive Parser
If we want to parse ```" 3 4 5 "```,
we can use ```number.and(number, number)``` with ```.sep```.
But if we don't know how many numbers occur?
```js
const array4 = "3 4 6 4 "
const array2 = " 5 9"
```
#### using built-in ```.loop```
Of course, there is a built-in ```.loop```:
```js
var array = number.sep(space)
.loop()
array.try(array4) // [ 3, 4, 6, 4 ]
array.try(array2) // [ 5, 9 ]
```
but it also hides some interesting features, let's implement our ```.loop```.
#### loop = recursion
As programmers all know, loop is equal to recusion.
So we can implement a recursive parser, the following is a BNF(Backus-Naur Form):
```BNF
array ::= number array | nihil
```
But how can ```array``` call itself?
#### straightly calling itself
```js
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(array)
.or(nihil)
// Thrown:
// TypeError: Cannot read property 'raw' of undefined
```
It failed, because array is undefined when we define array.
#### using wrapper and closure
In Javascript, closure is useful.
In this way, an object can be captured by a function.
We can use the object in the function when called.
In other words, we wrap the object with a function.
We usually call this trick "late-bound" (especially in λ-calculus)
```js
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(_=>array(_))
.or(nihil)
array.try(array4) // [ 3, [ 4, [ 6, [ 4 ] ] ] ]
array.try(array2) // [ 5, [ 9 ] ]
```
ATTENTION : We defer the use time of ```array```!
It was called when the arrow function ```_=>array(_)``` called.
Wait, is parser a function, now that called with argument ```_```?
Yes, a parser is a function, it accepts ```source``` and return ```value```.
We use ```.try``` or ```.parse```,
because ```source``` is a wrapper of ```String```,
and we also need to wrap ```value```s into results or throw errors.
See [nihil.parser](###nihil.parser)
#### unfolding the result
Noticing that the result is a recursive array,
we need to unfold it into a simple array.
```js
const unfold = a=>(a.length == 1)?a:([a[0],...a[1]]) // a means array
var array = undefined // ensure array undefined
var array = number.sep(space)
.and(_=>array(_))
.map(unfold)
.or(nihil)
array.try(array4) // [ 3, 4, 6, 4 ]
array.try(array2) // [ 5, 9 ]
```
#### feature
- ```parser``` is indeed a function, we can call it with [source](###nihil.source)
- ```_=>parser(_)``` can defer the time of using parser.
- ```unfold``` is needed for maping recursive array to simple array.
### ```.wrap``` : Wrapper Between Parser
Now, we could parse string such as ```"3 4 6 4 "```, ```" 5 9"```, etc.
We want to add ```"["``` and ```"]"``` on the left and right of array.
#### using built-in ```.wrap```
```js
array
.wrap(/\[/,/\]/)
.try("[3 4 6 4 ]") // [ 3, 4, 6, 4 ]
```
#### using ```.and``` and ```.map```
```js
nihil
.and(/\[/,array,/\]/)
.map(([l,v,r])=>v)
.try("[3 4 6 4 ]") // [ 3, 4, 6, 4 ]
```
```nihil``` indeed uses such a method to offer a handy tool ```.wrap```.
#### feature
- ```parser.wrap``` is a useful tool.
### Larger Precedes
Let's run 2 code segments with a slight difference:
```js
var number = nihil(/0|[1-9][0-9]*/).map(Number).sep(/\s*/)
number.loop().or(number)
.parse("3 4") // { right: true, value: [ 3, 4 ] }
```
```js
var number = nihil(/0|[1-9][0-9]*/).map(Number).sep(/\s*/)
number.or(number.loop())
.parse("3 4") // { right: false, error: { expected: '', location: 2 } }
```
The second one is incorrect.
The ```number``` parsed the string first and **succueeded**,
so ```number.loop()``` had no opportunity to parse the raw string.
However, ```number``` could only parse one number,
so it expected \ after parsing ```"3 "```, but given ```"4"```,
finally it returned an error.
Notation :
- P, Q, R, ... are Parsers.
- P | Q means ```P.or(Q)```
- L(P) is the language described by P.
- p ∈ L(P) is a string of the language L(P).
- p < q means p is a prefix substring of q.
Definition : P < Q iff L(P) < L(Q) iff ∀q ∈ L(Q), ∃p ∈ L(P), p < q
Theorem : P < Q ⇒ L(P | Q) = L(P)
Proof : Omitted.
Therefore, if P < Q, ```P.or(Q)``` acts as ```P```.
#### feature
- when ```.or``` combining parsers, the larger should precedes.
### Avoid Left Recursion
Recall the BNF in the loop recursion
```BNF
array ::= number array | nihil
```
As a matter of fact, the following BNF is also correct.
```BNF
array ::= array number | nihil
```
However, the corresponding Javascript code is incorrect:
```js
var number = nihil(/0|[1-9][0-9]*/).map(Number).sep(/\s*/)
var array = nihil.and(_=>array(_),number).or(nihil)
array.try("3 4 ")
// Thrown:
// RangeError: Maximum call stack size exceeded
```
This is because ```array``` always calling itself,
making the function calling stack overflow.
We call the first BNF "right recursion", the second "left recursion".
#### feature
- when ```.and``` creating recursive parsers, transform left recursion to right recursion.
### Example : Tree Recursion
The recursive parser made in the ```.loop``` is linked-list-like.
Now we implement a tree-like one.
```js
var number = nihil(/0|[1-9][0-9]*/).map(Number)
var array = number.or(_=>array(_)).sep(/\s*/).loop().wrap(/\[/,/\]/)
array.try("[ 3 [5[9]] 6 [4 5 ] ]") // [ 3, [ 5, [ 9 ] ], 6, [ 4, 5 ] ]
```
## Useful Tools
### ```.lable``` : Label Node
Sometimes we want to add a label to the parser result.
```js
var decimal = nihil(/0|[1-9][0-9]*/).map(Number).label("decimal")
var hexadecimal = nihil(/0x(0|[0-9a-fA-F][1-9a-fA-F]*)/).map(Number).label("hexadecimal")
var number = decimal.or(hexadecimal)
number.try("33") // { label: 'decimal', value: 33 }
```
In fact, you can use ```.map``` to implement it.
```js
parser.label = label=>parser.map(value=>({label,value}))
```
### ```.locate``` : Located Parser
Sometimes we want to locate where parser works.
```js
var a = nihil(/a+/)
var b = nihil(/b+/).locate()
a.or(b).loop()
.try("aaabbbbaa") // [ 'aaa', { beg: 3, value: 'bbbb', end: 7 }, 'aa' ]
```
## Current Bugs
```js
var a = nihil(/a+/).drop(_=>nihil)
var b = nihil(/b+/)
b.or(a).sep(/\s+/).loop().parse("aaabbbbaa")
// { right: false, error: { expected: '', location: 3 } }
```
The reason might be that ```.loop``` uses ```nihil.nihil``` to break the looping, while ```.drop``` offers ```nihil``` generating ```nihil.nihil```.
## API
### ```nihil```
nihil(parser A)=>parser A
nihil(source)=>nihil.nihil
nihil(RegExp)=>parser(RE)
### result format
- ```{right:true, value : ... }``` is the result when parsing succeeded.
- ```{right:false, error : ... }``` is the result when parsing failed.
- ```nihil.nihil = { right:true, nihil:true }``` is the result of the **nihil** parser.
### nihil.parser
It is the helper function of nihil
If you are a new hand of Javascript but familiar with OOP(Object Oriented Programming),
you can consider it as ```class parser```
As a matter of fact, its real member function are ```.try``` and ```parse```
for easily parse string and return proper result.
Because a parser is a function indeed, you can call it with argument ```source```,
see [nihil.source](###nihil.source)
As for other functions, they are all for convenience of creating parsers
in the grammar of chain calling.
You can intuitively feels it in Tutorial.
Also, the source code of ```nihil.parser```
promotes your understanding of the whole ```nihil```.
### nihil.source
Label the raw string with an ima(a Japanese word, means "current") location.
```js
const source = nihil.source("abc") // source = { raw: "abc", ima: 0 }
const ab = nihil(/ab/)
ab(source) // { right: true, value: "ab" }
// source = { raw: "abc", ima: 2 }
```
### nihil.location
A parser returning the ima location of parsing source as value.
```js
const source = nihil.source("abc") // source = { raw: "3", ima: 0 }
const ab = nihil(/ab/)
ab(source)
nihil.location(source) // {right: true, value: 2}
// source = { raw: "abc", ima: 2 }
```
### nihil.and
input array of RegExp or parser, return a parser
which parses source sequentially in the order of array
### nihil.or
input array of RegExp or parser, return a parser
which tries to parse source from array[0] to array[array.length-1]:=end.
If array[k] succeeds, array[k+1],...,end wouldn't be used to parse source.
### nihil.keep
It helps implement the parser of LL(∞).
```nihil.keep``` accepts ```parser``` and
selecting function ```fn```
with value return by a ```parser``` as argument
and a parser as return value, and returns a new parser.
```js
var fn = value=>parser
var LLm = nihil.keep(parser)(fn)
```
### nihil.drop
It is similar to ```nihil.keep``` with difference
that it would DROP the values parsed out by ```parse```
### nihil.map
Used to map the value of ```parse``` to the format you like
```js
nihil.map(parse)(mapping)
```
### nihil.sep
Used to deal with the ```seperator``` on the left and the right of ```parser```.
```js
nihil.sep(parser)(seperator)
```
### nihil.loop
Used to looping parse string with ```parser```.
```js
nihil.loop(parser)
```
### nihil.label
Used to label the value of ```parser```.
```js
nihil.label(parser)(label)
```
### nihil.locate
Used to locate the value of ```parser```
```js
nihil.locate(parser)
```