https://github.com/wimvanderbauwhede/Perl-Parser-Combinators
Parsec-style parser combinator library in Perl
- Host: GitHub
- URL: https://github.com/wimvanderbauwhede/Perl-Parser-Combinators
- Owner: wimvanderbauwhede
- License: other
- Created: 2013-08-23T14:01:07.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-09-29T13:42:14.000Z (over 9 years ago)
- Last Synced: 2024-07-31T21:54:50.043Z (6 months ago)
- Language: Perl
- Size: 153 KB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: Changes
- License: LICENSE
# NAME
Parser::Combinators - A library of building blocks for parsing text
# SYNOPSIS

    use Parser::Combinators;
    my $parser = < a combination of the parser building blocks from Parser::Combinators >
    (my $status, my $rest, my $matches) = $parser->($str);
    my $parse_tree = getParseTree($matches);

# DESCRIPTION
Parser::Combinators is a library of parser building blocks ('parser combinators'), inspired by the Parsec parser combinator library in Haskell
(http://legacy.cs.uu.nl/daan/download/parsec/parsec.html).
The idea is that you build a parser not by specifying a grammar (as in yacc/lex or Parse::RecDescent), but by combining a set of small parsers that parse
well-defined items.

## Usage
Each parser in this library, e.g. `word` or `symbol`, is a function that returns a function (actually, a closure) that parses a string. You can combine these parsers by using special
parsers like `sequence` and `choice`. For example, a JavaScript variable declaration

    var res = 42;

could be parsed as:

    my $p =
        sequence [
            symbol('var'),
            word,
            symbol('='),
            natural,
            semi
        ]

If you want to express that the assignment is optional, i.e. `var res;` is also valid, you can use `maybe()`:

    my $p =
        sequence [
            symbol('var'),
            word,
            maybe(
                sequence [
                    symbol('='),
                    natural
                ]
            ),
            semi
        ]

If you want to parse alternatives you can use `choice()`. For example, to express that either of the next two lines is valid:

    42
    return(42)

you can write

    my $p = choice( number, sequence [ symbol('return'), parens( number ) ] )

This example also illustrates the `parens()` parser, which parses anything enclosed in parentheses.
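
To make the "function that returns a closure" idea concrete, here is a minimal self-contained sketch. It is not the library's implementation: the helper names `lit`, `digits`, `seq_of` and `first_of` are hypothetical stand-ins for `symbol`, `natural`, `sequence` and `choice`, and the only thing borrowed from the library is the `(status, rest, matches)` return convention.

```perl
use strict;
use warnings;

# Hypothetical lexeme parser for a fixed string (compare symbol()).
sub lit {
    my ($text) = @_;
    return sub {
        my ($str) = @_;    # private copy, safe to modify
        return ( 0, $str, undef ) unless $str =~ s/^\Q$text\E\s*//;
        return ( 1, $str, $text );
    };
}

# Hypothetical lexeme parser for a run of digits (compare natural).
sub digits {
    return sub {
        my ($str) = @_;
        return ( 0, $str, undef ) unless $str =~ s/^(\d+)\s*//;
        my $match = $1;
        return ( 1, $str, $match );
    };
}

# Apply the parsers in order; fail without consuming if any of them fails.
sub seq_of {
    my @parsers = @_;
    return sub {
        my ($str) = @_;
        my $rest = $str;
        my @matches;
        for my $p (@parsers) {
            my ( $ok, $new_rest, $m ) = $p->($rest);
            return ( 0, $str, undef ) unless $ok;
            $rest = $new_rest;
            push @matches, $m;
        }
        return ( 1, $rest, \@matches );
    };
}

# Try each parser on the same input; the first success wins.
sub first_of {
    my @parsers = @_;
    return sub {
        my ($str) = @_;
        for my $p (@parsers) {
            my ( $ok, $rest, $m ) = $p->($str);
            return ( 1, $rest, $m ) if $ok;
        }
        return ( 0, $str, undef );
    };
}

my $p = first_of(
    digits(),
    seq_of( lit('return'), lit('('), digits(), lit(')') ),
);
my ( $status, $rest, $matches ) = $p->('return(42) ;');
printf "%d '%s' %s\n", $status, $rest, join( ' ', @$matches );
```

Every combinator only builds and returns a closure; no input is touched until the final closure is called with a string.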
## Provided Parsers
The library is not complete in the sense that not all Parsec combinators have been implemented. Currently, it contains:

    whiteSpace : parses any white space, always returns success.

* Lexeme parsers (they remove trailing whitespace):

      word : (\w+)
      natural : (\d+)
      symbol : parses a given symbol, e.g. symbol('int')
      comma : parses a comma
      semi : parses a semicolon
      char : parses a given character

* Combinators:

      sequence( [ $parser1, $parser2, ... ], $optional_sub_ref )
      choice( $parser1, $parser2, ...) : tries the specified parsers in order
      try : normally, the parser consumes matching input. try() stops a parser from consuming the string
      maybe : is like try() but always reports success
      parens( $parser ) : parses '(', then applies $parser, then ')'
      many( $parser) : applies $parser zero or more times
      many1( $parser) : applies $parser one or more times
      sepBy( $separator, $parser) : parses a list of $parser separated by $separator
      oneOf( [$patt1, $patt2,...]): like symbol() but parses the patterns in order

* Dangerous: the following parsers take a regular expression, so you can mix regexes and other combinators ...

      upto( $patt )
      greedyUpto( $patt)
      regex( $patt)

## Labeling
You can label any parser in a sequence using an anonymous hash, for example:

    sub type_parser {
        sequence [
            {Type => word},
            maybe parens choice(
                {Kind => natural},
                sequence [
                    symbol('kind'),
                    symbol('='),
                    {Kind => natural}
                ]
            )
        ]
    }

Applying this parser returns a tuple as follows:

    my $str = 'integer(kind=8), ';
    (my $status, my $rest, my $matches) = type_parser($str);

Here, `$status` is 0 if the match failed, 1 if it succeeded. `$rest` contains the rest of the string.
The actual matches are stored in the array `$matches`. As every parser returns its results as an array ref,
`$matches` contains the concrete parsed syntax, i.e. a nested array of arrays of strings.

    show($matches) ==> [{'Type' => 'integer'},['kind','\\=',{'Kind' => '8'}]]

You can remove the unlabeled matches and convert the raw tree into nested hashes using `getParseTree`:

    my $parse_tree = getParseTree($matches);
    show($parse_tree) ==> {'Type' => 'integer','Kind' => '8'}
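
The conversion from raw matches to nested hashes can be pictured with a short recursive walk. The sketch below is an assumption about the general idea, not the source of `getParseTree`; the function name `collect_labels` is hypothetical. It keeps labeled entries, drops unlabeled strings, and recurses into labeled values that themselves contain labels.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the library's getParseTree): walk a raw match
# tree and merge every labeled entry into a single hash.
sub collect_labels {
    my ($node) = @_;
    my %tree;
    if ( ref($node) eq 'ARRAY' ) {
        for my $child (@$node) {
            my $sub = collect_labels($child);
            @tree{ keys %$sub } = values %$sub;
        }
    }
    elsif ( ref($node) eq 'HASH' ) {
        for my $label ( keys %$node ) {
            my $value = $node->{$label};
            # A labeled value may itself contain labeled matches.
            my $nested = ref($value) ? collect_labels($value) : {};
            $tree{$label} = %$nested ? $nested : $value;
        }
    }
    # Plain strings are unlabeled matches and are dropped.
    return \%tree;
}

# Raw matches as shown by show($matches) in the text.
my $matches = [ { Type => 'integer' }, [ 'kind', '\\=', { Kind => '8' } ] ];
my $tree = collect_labels($matches);
print join( ', ', map { "$_ => $tree->{$_}" } sort keys %$tree ), "\n";
```

With this input the unlabeled `kind` and `=` matches disappear, leaving only the `Type` and `Kind` entries.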
## A more complete example
I wrote this library because I needed to parse argument declarations of Fortran-95 code. Some examples of valid declarations are:

    integer(kind=8), dimension(0:ip, -1:jp+1, kp) , intent( In ) :: u, v,w
    real, dimension(0:7) :: f
    real(8), dimension(0:7,kp) :: f,g

I want to extract the type and kind, the dimension and the list of variable names. For completeness I'm parsing the `intent` attribute as well.
The parser is a sequence of four separate parsers `type_parser`, `dim_parser`, `intent_parser` and `arglist_parser`.
All the optional fields are wrapped in a `maybe()`.

    my $F95_arg_decl_parser =
        sequence [
            whiteSpace,
            {TypeTup => &type_parser},
            maybe(
                sequence [
                    comma,
                    &dim_parser
                ],
            ),
            maybe(
                sequence [
                    comma,
                    &intent_parser
                ],
            ),
            &arglist_parser
        ];

    # where

    sub type_parser {
        sequence [
            {Type => word},
            maybe parens choice(
                {Kind => natural},
                sequence [
                    symbol('kind'),
                    symbol('='),
                    {Kind => natural}
                ]
            )
        ]
    }

    sub dim_parser {
        sequence [
            symbol('dimension'),
            {Dim => parens sepBy(',', regex('[^,\)]+')) }
        ]
    }

    sub intent_parser {
        sequence [
            symbol('intent'),
            {Intent => parens word}
        ]
    }

    sub arglist_parser {
        sequence [
            symbol('::'),
            {Vars => sepBy(',',&word)}
        ]
    }

Running the parser and calling `getParseTree()` on the first string results in

    {
        'TypeTup' => {
            'Type' => 'integer',
            'Kind' => '8'
        },
        'Dim' => ['0:ip','-1:jp+1','kp'],
        'Intent' => 'In',
        'Vars' => ['u','v','w']
    }

See the test fortran95\_argument\_declarations.t for the source code.
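
As an illustration of what can be done with such a parse tree, the sketch below turns it into one record per declared variable. The tree is copied from the result shown above; the record layout (`type`, `kind`, `rank`, `intent`) and the fallback values for missing optional fields are assumptions made for this example, not something the library provides.

```perl
use strict;
use warnings;

# Parse tree copied from the getParseTree() result shown above.
my $decl = {
    TypeTup => { Type => 'integer', Kind => '8' },
    Dim     => [ '0:ip', '-1:jp+1', 'kp' ],
    Intent  => 'In',
    Vars    => [ 'u', 'v', 'w' ],
};

# Hypothetical post-processing step: one record per declared variable.
# Dim and Intent come from maybe() parsers, so this sketch assumes they
# can be absent and falls back to an empty list / 'unknown'.
my %args;
for my $var ( @{ $decl->{Vars} } ) {
    $args{$var} = {
        type   => $decl->{TypeTup}{Type},
        kind   => $decl->{TypeTup}{Kind},
        rank   => scalar @{ $decl->{Dim} // [] },
        intent => lc( $decl->{Intent} // 'unknown' ),
    };
}

for my $var ( sort keys %args ) {
    my $info = $args{$var};
    printf "%s: %s(kind=%s), rank %d, intent %s\n",
        $var, @{$info}{qw(type kind rank intent)};
}
```

The `//` operator requires Perl 5.10 or later.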
### No Monads?!
As this library is inspired by a monadic parser combinator library from Haskell, I have also implemented `bindP()` and `returnP()` for those who like monads ^\_^
So instead of saying

    my $pp = sequence [ $p1, $p2, $p3 ]

you can say

    my $pp = bindP(
        $p1,
        sub { (my $x) =@_;
            bindP(
                $p2,
                sub {(my $y) =@_;
                    bindP(
                        $p3,
                        sub { (my $z) = @_;
                            returnP->($z);
                        }
                    )->($y)
                }
            )->($x);
        }
    );

which is obviously so much better :-)
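
For readers unfamiliar with the monadic style, here is a self-contained sketch of how bind and return are conventionally defined for closure-based parsers. The names `bind_p`, `return_p` and `token` are hypothetical, and the calling convention follows the textbook Parsec definition; the library's own `bindP()` and `returnP()` are invoked differently in the example above, so treat this as an illustration of the concept only.

```perl
use strict;
use warnings;

# return_p: a parser that consumes nothing and succeeds with $value.
sub return_p {
    my ($value) = @_;
    return sub {
        my ($str) = @_;
        return ( 1, $str, $value );
    };
}

# bind_p: run $p; on success pass its match to $f, which must return the
# next parser, and run that parser on the remaining input.
sub bind_p {
    my ( $p, $f ) = @_;
    return sub {
        my ($str) = @_;
        my ( $ok, $rest, $m ) = $p->($str);
        return ( 0, $str, undef ) unless $ok;
        return $f->($m)->($rest);
    };
}

# Hypothetical lexeme parser built from a regex, for the demo only.
sub token {
    my ($re) = @_;
    return sub {
        my ($str) = @_;
        return ( 0, $str, undef ) unless $str =~ s/^($re)\s*//;
        my $match = $1;
        return ( 1, $str, $match );
    };
}

# Parse "name = number" and build the result from the bound values.
my $assign = bind_p( token(qr/\w+/), sub {
    my ($name) = @_;
    bind_p( token(qr/=/), sub {
        bind_p( token(qr/\d+/), sub {
            my ($value) = @_;
            return_p( { $name => $value } );
        } );
    } );
} );

my ( $status, $rest, $match ) = $assign->('res = 42;');
print "$status: res => $match->{res}\n";
```

The point of the style is that each later step can see the values matched by earlier ones (`$name` is still in scope when the result is built), which a plain sequence does not offer.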
# AUTHOR
Wim Vanderbauwhede
# COPYRIGHT
Copyright 2013- Wim Vanderbauwhede
# LICENSE
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

# SEE ALSO
- The original Parsec library: [http://legacy.cs.uu.nl/daan/download/parsec/parsec.html](http://legacy.cs.uu.nl/daan/download/parsec/parsec.html) and [http://hackage.haskell.org/package/parsec](http://hackage.haskell.org/package/parsec)