Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/weltling/parle
Parser and lexer for PHP
https://github.com/weltling/parle
lexer parser php
Last synced: 3 months ago
JSON representation
Parser and lexer for PHP
- Host: GitHub
- URL: https://github.com/weltling/parle
- Owner: weltling
- License: other
- Created: 2017-09-02T19:22:28.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2023-08-02T14:37:06.000Z (over 1 year ago)
- Last Synced: 2023-08-02T15:50:05.141Z (over 1 year ago)
- Topics: lexer, parser, php
- Language: C++
- Homepage:
- Size: 590 KB
- Stars: 77
- Watchers: 7
- Forks: 10
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Parle provides lexing and parsing facilities for PHP
=============================================
Lexing and parsing is used widely in the PHP core and extensions. Usually such a functionality is packed into a piece of C/C++ and depends on tools like [flex](http://flex.sourceforge.net/), [re2c](http://re2c.org/), [Bison](http://www.gnu.org/software/bison/), [LEMON](http://www.hwaci.com/sw/lemon/) or similar. With Parle, it is possible to implement lexing and parsing in PHP while relying on features and principles of the parser/lexer generator tools for C/C++. The Lexer and Parser classes are there in the Parle namespace.
The implementation bases on the work of [Ben Hanson](http://www.benhanson.net/)- https://github.com/BenHanson/lexertl14
- https://github.com/BenHanson/parsertl14The lexer is based on the pattern matching similar to flex. The parser is LALR(1).
Supported is PHP 7.4 and above. A [C++14](http://en.cppreference.com/w/cpp/compiler_support) capable compiler is required. As of version 0.7.3 parle can optionally be compiled with internal UTF-32 support, making it possible to use Unicode character classes in patterns.
The full extension documentation is available in the [PHP Manual](http://php.net/parle).
Installation
============Read the [INSTALL.md](./INSTALL.md) documentation.
Example tokenizing comma separated integer list
============================================
```phpuse Parle\Token;
use Parle\Lexer;
use Parle\LexerException;/* name => id */
$token = array(
"COMMA" => 1,
"CRLF" => 2,
"DECIMAL" => 3,
);
/* id => name */
$tokenIdToName = array_flip($token);$lex = new Lexer;
$lex->push("[\x2c]", $token["COMMA"]);
$lex->push("[\r][\n]", $token["CRLF"]);
$lex->push("[\d]+", $token["DECIMAL"]);
$lex->build();$in = "0,1,2\r\n3,42,5\r\n6,77,8\r\n";
$lex->consume($in);
do {
$lex->advance();
$tok = $lex->getToken();if (Token::UNKNOWN == $tok->id) {
throw new LexerException("Unknown token '{$tok->value}' at offset {$lex->marker}.");
}echo "TOKEN: ", $tokenIdToName[$tok->id], PHP_EOL;
} while (Token::EOI != $tok->id);```
Example parsing comma separated number list
===========================
```phpuse Parle\Lexer;
use Parle\Parser;
use Parle\ParserException;$p = new Parser;
$p->token("CRLF");
$p->token("COMMA");
$p->token("INTEGER");
$p->token("'\"'");
$p->push("START", "RECORDS");
$prod_record_0 = $p->push("RECORDS", "RECORD CRLF");
$prod_record_1 = $p->push("RECORDS", "RECORDS RECORD CRLF");
$prod_int_0 = $p->push("RECORD", "INTEGER");
$prod_int_1 = $p->push("RECORD", "RECORD COMMA INTEGER");
$p->push("DECIMAL", "INTEGER COMMA INTEGER"); /* Production index unused. */
$prod_dec_0 = $p->push("RECORD", "'\"' DECIMAL '\"'");
$prod_dec_1 = $p->push("RECORD", "RECORD COMMA '\"' DECIMAL '\"'");
$p->build();$lex = new Lexer;
$lex->push("[\x2c]", $p->tokenId("COMMA"));
$lex->push("[\r][\n]", $p->tokenId("CRLF"));
$lex->push("[\d]+", $p->tokenId("INTEGER"));
$lex->push("[\x22]", $p->tokenId("'\"'"));
$lex->build();/* Specifically using comma as both list separator and as a decimal mark. */
$in = "000,111,222\r\n\"333,3\",444,555\r\n666,777,\"888,8\"\r\n";$p->consume($in, $lex);
do {
switch ($p->action) {
case Parser::ACTION_ERROR:
$err = $p->errorInfo();
if (Parser::ERROR_UNKNOWN_TOKEN == $err->id) {
$tok = $err->token;
$msg = "Unknown token '{$tok->value}' at offset {$err->position}";
} else if (Parser::ERROR_NON_ASSOCIATIVE == $err->id) {
$tok = $err->token;
$msg = "Token '{$tok->id}' at offset {$lex->marker} is not associative";
} else if (Parser::ERROR_SYNTAX == $err->id) {
$tok = $err->token;
$msg = "Syntax error at offset {$lex->marker}";
} else {
$msg = "Parse error";
}
throw new ParserException($msg);
break;
case Parser::ACTION_SHIFT:
case Parser::ACTION_GOTO:
case Parser::ACTION_ACCEPT:
continue;
break;
case Parser::ACTION_REDUCE:
switch ($p->reduceId) {
case $prod_int_0:
/* INTEGER */
echo $p->sigil(), PHP_EOL;
break;
case $prod_int_1:
/* RECORD COMMA INTEGER */
echo $p->sigil(2), PHP_EOL;
break;
case $prod_dec_0:
/* '\"' DECIMAL '\"' */
echo $p->sigil(1), PHP_EOL;
break;
case $prod_dec_1:
/* RECORD COMMA '\"' DECIMAL '\"' */
echo $p->sigil(3), PHP_EOL;
break;
case $prod_record_0:
case $prod_record_1:
echo "=====", PHP_EOL;
break;
}
break;
}
$p->advance();
} while (Parser::ACTION_ACCEPT != $p->action);```