https://github.com/pyrocat101/opal
Self-contained monadic parser combinators for OCaml
https://github.com/pyrocat101/opal
parser-combinators parser-monad
Last synced: 8 months ago
JSON representation
Self-contained monadic parser combinators for OCaml
- Host: GitHub
- URL: https://github.com/pyrocat101/opal
- Owner: pyrocat101
- License: mit
- Created: 2015-01-17T01:36:47.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2023-07-24T22:23:49.000Z (over 2 years ago)
- Last Synced: 2025-02-27T19:34:57.175Z (8 months ago)
- Topics: parser-combinators, parser-monad
- Language: OCaml
- Size: 90.8 KB
- Stars: 147
- Watchers: 3
- Forks: 14
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-combinator-parsers - opal
README
# Opal: Monadic Parser Combinators for OCaml
Opal is a minimum collection of useful parsers and combinators (~150 loc of
OCaml) that makes writing parsers easier. It is designed to be small,
self-contained, pure-functional, and only includes most essential parsers, so
that one could include single file in the project or just embed it in other
OCaml source code files.
I find myself writing lots of recursive-descent parsers from scratch in OCaml
when I was solving Hackerrank FP challenges. That's why I wrote opal: to include
it in the source code and build parsers on top of parser combinators easily.
## Example
Trivial arithmetic calculator:
~~~ocaml
open Opal
let parens = between (exactly '(') (exactly ')')
let integer = many1 digit => implode % int_of_string
let add = exactly '+' >> return ( + )
let sub = exactly '-' >> return ( - )
let mul = exactly '*' >> return ( * )
let div = exactly '/' >> return ( / )
let rec expr input = chainl1 term (add <|> sub) input
and term input = chainl1 factor (mul <|> div) input
and factor input = (parens expr <|> integer) input
let () =
let input = LazyStream.of_channel stdin in
match parse expr input with
| Some ans -> Printf.printf "%d\n" ans
| None -> print_endline "ERROR!"
~~~
For non-trivial examples, see Hackerrank challenge solutions using opal in
`examples/`.
## Documentation
The expressiveness of parser combinators are attributed to higher-order
functions and the extensive use of currying. However, due to lack of `do`
syntax, the bind operation of monad would not be as succinct as that in Haskell.
A parser monad is either `None` (indicates failure), or `Some` pair of result
and unconsumed input, where the result is a user-defined value. The input is a
lazy stream of arbitrary token type. A parser is a function that accepts an
input and returns a parser monad. Although most parsers in opal is polymorphic
over token type and result type, some useful parsers only accepts `char` as
input token type.
Since combinators in opal are roughly based on Haskell's Parsec. The following
documentation is somehow a rip-off of Parsec's doc.
### Lazy Stream
**`type 'a LazyStream t`**
Polymorphic lazy stream type.
**`val LazyStream.of_stream : 'a Stream.t -> 'a LazyStream.t`**
Build a lazy stream from stream.
**`val LazyStream.of_function : (unit -> 'a) -> 'a LazyStream.t`**
Build a lazy stream from a function `f`. The elements in the stream is populated
by calling `f ()`.
**`val LazyStream.of_string : string -> char LazyStream.t`**
Build a char lazy stream from string.
**`val LazyStream.of_channel : in_channel -> char LazyStream.t`**
Build a char lazy stream from a input channel.
### Utilities
**`val implode : char list -> bytes`**
Implode character list into a string. Useful when used with `many`.
**`val explode : bytes -> char list`**
Explode a string into a character list.
**`val ( % ) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c`**
Infix operator for left-to-right function composition. `(f % g % h) x` is
equivalent to `h (g (f x))`.
**`val parse : ('token, 'result) parser -> 'token LazyStream -> 'result option`**
`parse parser input` parses `input` with `parser`, and returns `Some result` if
succeed, or `None` on failure.
### Primitives
**`type 'token input = 'token LazyStream.t`**
**`type ('token, 'result) monad = ('result * 'token input) option`**
**`type ('token, 'result) parser = 'token input -> ('token, 'result) monad`**
A parser is a function that accepts an input and returns either `None` on
failure, or `Some (result, input')`, where `result` is user-defined value and
`input'` is unconsumed input after parsing.
**`val return : 'result -> 'token input -> ('token, 'result) monad`**
Accepts a value and an input, and returns a monad.
**`val ( >>= ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`**
`x >>= f` returns a new parser that if parser `x` succeeds, applies function `f`
on monad produced by `x`, and produces a new monad (a.k.a. `bind`).
**`val ( let* ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser`**
This operator is the same as `>>=` but using the `let` notation.
It is usefull to avoid ugly sequences of bindings. For exemple, `p >>= fun x -> f x` can
be rewritten `let* x = p in f x`. Combined with the `return` function, we can define complex parsers :
```ocaml
let tuple_parser =
let* x = digit in
let* _ = exactly ',' in
let* y = digit in
return (x, y)
```
**`val ( <|> ) : ('t, 'r) parser -> ('t, 'r) parser -> ('t, 'r) parser`**
Choice combinator. The parser `p <|> q` first applies `p`. If it succeeds, the
value of `p` is returned. If `p` fails, parser `q` is tried.
**`val mzero : 'a -> ('t, 'r) monad`**
A parser that always fails.
**`val any : ('t, 'r) parser`**
The parser succeeds for any token in the input. Consumes a token and returns it.
**`val satisfy : ('t -> bool) -> ('t, 'r) parser`**
The parser `satisfy test` succeeds for any token for which the supplied function
`test` returns `true`. Returns the token that is actually parsed.
**`val eof : 'a -> ('t, 'a) parser`**
The parser `eof x` succeeds if the input is exhausted. Returns value `x`.
### Derived
**`val ( => ) : ('t, 'r) parser -> ('r -> 'a) -> ('t, 'a) parser`**
Map combinator. `x => f` parses `x`. If it succeeds, returns the value of `x`
applied with `f`.
**`val ( >> ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'b) parser`**
Ignore-left combinator. `x >> y` parses `x` and then `y`. Returns the value
returned by `y`.
**`val ( << ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'a) parser`**
Ignore-right combinator. `x >> y` parses `x` and then `y`. Returns the value
returned by `x`.
**`val ( <~> ) : ('t, 'r) parser -> ('t, 'r list) parser -> 't input -> ('t, 'r list) monad`**
Cons combinator. `x <~> y` parses `x` and then `y`. Returns the value of `x`
prepended to the value of `y` (a list).
~~~ocaml
let ident = letter <~> many alpha_num
~~~
**`val choice : ('t, 'r) parser list -> ('t, 'r) parser`**
`choice ps` tries to apply the parsers in the list `ps` in order, until one of
them succeeds. Returns the value of the succeeding parser.
**`val count : int -> ('t, 'r) parser -> 't input -> ('t, 'r list) monad`**
`count n` parses `n` occurrences of `p`. If `n` is smaller or equal to zero, the
parser equals to `return []`. Returns a list of `n` values returned by `p`.
**`between : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'r) parser -> ('t, 'r) parser`**
`between open close p` parses `open`, followed by `p` and `close`. Returns the
value returned by `p`.
~~~ocaml
let braces = between (exactly '{') (exactly '}')
~~~
**`val option : 'r -> ('t, 'r) parser -> ('t, 'r) parser`**
`option default p` tries to apply parser `p`. If `p` fails, it returns the
value `default`, otherwise the value returned by `p`.
~~~ocaml
let priority = option 0 (digit => String.make 1 % int_of_string)
~~~
**`val optional : 'r -> ('t, 'r) parser -> ('t, unit) parser`**
`optional p` tries to apply parser `p`. It will parse `p` or nothing. It only
fails if `p` fails. Discard the result of `p`.
**`val skip_many : ('t, 'r) parser -> ('t, unit) parser`**
`skip_many p` applies `p` *zero or more* times, skipping its result.
~~~ocaml
let spaces = skip_many space
~~~
**`val skip_many1 : ('t, 'r) parser -> ('t, unit) parser`**
`skip_many1 p` applies `p` *one or more* times, skipping its result.
**`val many : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`**
`many p` applies the parser `p` *zero or more* times. Returns a list of returned
values of `p`.
**`val many1 : ('t, 'r) parser -> 't input -> ('t, 'r list) monad`**
`many1 p` applies the parser `p` *one or more* times. Returns a list of returned
values of `p`.
**`val sep_by : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`**
`sep_by p sep` parses *zero or more* occurrences of `p`, separated by `sep`.
Returns a list of values returned by `p`.
~~~ocaml
let comma_sep p = sep_by p (token ",")
~~~
**`val sep_by1 : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad`**
`sep_by1 p sep` parses *one or more* occurrences of `p`, separated by `sep`.
Returns a list of values returned by `p`.
**`val end_by: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`**
`end_by p sep` parses *zero or more* ocurrences of `p`, separated and ended by
`sep`. Returns a list of values returned by `p`.
~~~ocaml
let statements = end_by statement (token ";")
~~~
**`val end_by1: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser`**
`end_by1 p sep` parses *one or more* ocurrences of `p`, separated and ended by
`sep`. Returns a list of values returned by `p`.
**`val chainl : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`**
`chainl p op default` parses *zero or more* occurrences of `p`, separated by
`op`. Returns a value obtained by a *left* associative application of all
functions by `op` to the values returned by `p`. If there are zero occurences
of `p`, the value `default` is returned.
**`val chainl1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`**
`chainl1 p op` parses *one or more* occurrences of `p`, separate by `op`.
Returns a value obtained by a *left* associative application of all functions
returned by `op` to the values returned by `p`. This parser can be used to
eliminate left recursion which typically occurs in expression grammars. See
the arithmetic caculator example above.
**`val chainr : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser`**
`chainr p op default` parses *zero or more* occurrences of `p`, separated by
`op`. Returns a value obtained by *right* associative application of all
functions returned by `op` to the values returned by `p`. If there are no
occurrences of `p`, the value `x` is returned.
**`val chainr1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser`**
`chainr p op` parses *one or more* occurrences of `p`, separated by `op`.
Returns a value obtained by *right* associative application of all functions
returned by `op` to the values returned by `p`.
### Singletons
**`val exactly : 'r -> ('t, 'r) parser`**
`exactly x` parses a single token `x`. Returns the parsed token (i.e. `x`).
~~~ocaml
let semi_colon = exactly ';'
~~~
**`val one_of : 'r list -> ('t, 'r) parser`**
`one_of xs` succeeds if the current token is in the supplied list of tokens
`xs`. Returns the parsed token.
~~~ocaml
let vowel = one_of ['a'; 'e'; 'i'; 'o'; 'u']
~~~
**`val none_of : 'r list -> ('t, 'r) parser`**
As the dual of `one_of`, `none_of xs` succeeds if the current token *not* in
the supplied list of tokens `xs`. Returns the parsed token.
~~~ocaml
let consonant = none_of ['a'; 'e'; 'i'; 'o'; 'u']
~~~
**`val range : 'r -> 'r -> ('t, 'r) parser`**
`range low high` succeeds if the current token is in the range between `low`
and `high` (inclusive). Returns the parsed token.
### Char Parsers
**`val space = (char, char) parser`**
Parses a white space character (`'\s\t\r\n'`). Returns the parsed character.
**`val spaces = (char, unit) parser`**
Skip *zero or more* white spaces characters.
**`val newline = (char, char) parser`**
Parses a newline character (`'\n'`). Returns a newline character.
**`val tab = (char, char) parser`**
Parses a tab character (`'\t'`). Returns a tab character.
**`val upper = (char, char) parser`**
Parses an upper case letter (a character between 'A' and 'Z'). Returns the
parsed character.
**`val lower = (char, char) parser`**
Parses a lower case letter (a character between 'a' and 'z'). Returns the parsed
character.
**`val digit : (char, char) parser`**
Parses a digit. Returns the parsed character.
**`val letter = (char, char) parser`**
Parses a letter (an upper case or lower case letter). Returns the parsed
character.
**`val alpha_num = (char, char) parser`**
Parses a letter or digit. Returns the parser character.
**`val hex_digit = (char, char) parser`**
Parses a hexadecimal digit (a digit or a letter between 'a' and 'f' or 'A' and
'F'). Returns the parsed character.
**`val oct_digit = (char, char) parser`**
Parses an octal digit (a character between '0' and '7'). Returns the parsed
character.
### Lex Helper
**`val lexeme : (char, 'r) parser -> (char, 'r) parser`**
`lexeme p` first applies `skip_many space` and then parser `p`. Returns the
value returned by `p`.
**`val token : string -> char input -> (char, char list) monad`**
`token s` skips leading white spaces and parses a sequence of characters given
by string `s`. Returns the parsed character sequence as a list.
~~~ocaml
div_or_mod = token "div" <|> token "mod"
~~~