An open API service indexing awesome lists of open source software.

https://github.com/kpiorno/mean

A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. Zero dependencies.
https://github.com/kpiorno/mean

bytecode cpp11 grammar grammar-parser parser python python3

Last synced: 2 months ago
JSON representation

A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. Zero dependencies.

Awesome Lists containing this project

README

          

# Mean
A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. No dependencies.

### Motivation
"Mean" is the result of a personal journey to parse and make custom changes to the Python 3 grammar.
The library uses an EBNF-like syntax, automaticaly generates the node tree and has a funny way to travel it. Support a
Python-like bytecode generator and a mechanism for traveling it for Virtual Machines, Generators, etc... More details at "examples" folder.

### Quickstart
Please check the full example ```simple.cpp ``` at "examples" folder.

Create a grammar file: ```expr_grammar.mng ```
```
expr: term ( ('+' | '-') term )*
term: factor ( ('*' | '/') factor )*
factor: base [ '^' term ]
base: '(' expr ')' | ['-'] NUMBER
```
Create a source code file: ```expr_grammar.mn ```

```2 ^ 6 -- 1```

So we need a ```MN_Lexer``` which will read the source code file. We could use built-in lexer "MN_Simple" which generates tokens: ```NAME, NUMBER, HEX, +, -, ^```:

```cpp
#include
#include

...

int main(int /*argc*/, char **/*argv[]*/)
{
//Class with shortcuts methods
auto root = new mn::MNRoot();
//Class to store the encountered errors
mn::MNErrorMsg* err = new mn::MNErrorMsg();

//Built-in Lexer
auto parse_lexer = new mn::lexer::MN_Simple("path/to/expr_grammar.mn", err);
//Output the tree node
root->test_parser("expr_grammar.mng", "expr", parse_lexer)
}
```
The above code will output the next tree:

```
|expr -- 100
|factor -- 100
|base -- 100
|(?) -- 17
|2 -- 26
|^ -- 2
|base -- 100
|(?) -- 17
|6 -- 26
|[.] -- 30
|- -- 2
|base -- 100
|- -- 2
|1 -- 26
```

The nodes with the number ```100``` are non-terminals.

The nodes ```[.] - 30``` are lists.

The nodes ```[?] - 17``` are optionals nodes.

The rest of the nodes are terminals.

So lets do the right stuff to travel it.
It's possible to use a custom way to do that, but luckily we have the template class ```MN_Traveler``` which help us.

Now lets declare a class which inherit from ```MN_Traveler```:

```cpp
#include
#include

//Class to travel the tree
class Expr: public mn::MNTraveler
{
public:
Expr(){
//Bind the "expr" method with the "expr" grammar's non-terminal
bind("expr", &Expr::expr, this);
}
float expr(mn::MNNode* node)
{
return 1.1;
}
//Entry point to start the travel
float walk(mn::MNNode* node)
{
return call(node);
}
};

int main(int /*argc*/, char **/*argv[]*/)
{
//Class with shortcuts methods
auto root = new mn::MNRoot();
//Class to store the encountered errors
mn::MNErrorMsg* err = new mn::MNErrorMsg();
auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

auto traveler = new Expr();
//The run_parser method generate the tree and use the traveler class to travel it.
std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
parse_lexer, traveler) << std::endl;
delete traveler;
delete err;
delete root;
}
```

The second parameter of the template class define "float" as the type of all methods bounds to the grammar's non-terminals. Thus we can define specific custom types.

```bind("expr", &Expr::expr, this);``` bind the "expr" method with the "expr" grammar's non-terminal.

The ```float expr(mn::MNNode* node) ...``` method will handle the output generated by the non-terminal "expr".

The ```float walk(mn::MNNode* node) ...``` is the first method which is invoked. The method ```call(node)``` in ```return call(node)``` will invoke the
right method bound to the non-terminal ("expr" non-terminal according to the tree generated by the previous example). Use "call" method when you need to travel a non-terminal.

So, executing the last code we got the next output:

```Result is: 1.1```

Which is the value that returns "expr" method.

So, adding the proper funcionalities we got a simple expression evaluator with the next code:

```cpp
#include
#include
#include
#include

class Expr: public mn::MNTraveler
{
public:
Expr(){
bind("expr", &Expr::expr, this);
bind("term", &Expr::term, this);
bind("factor", &Expr::factor, this);
bind("base", &Expr::base, this);
}
float expr(mn::MNNode* node)
{
//expr: term ( ('+' | '-') term )*
return arith_op(node);
}

float term(mn::MNNode* node)
{
//term: factor ( ('*' | '/') factor )*
return arith_op(node);
}

float factor(mn::MNNode* node)
{
//factor: base [ '^' term ]
auto base = node->at(0);
auto exp = node->at(2);
float base_value = call(&base);
float exp_value = call(&exp);
return pow(base_value, exp_value);
}

float base(mn::MNNode* node)
{
//base: '(' expr ')' | ['-'] NUMBER
auto n = node->at(0);
if (n.get_meta() == 2 && n.get_lexeme() == "(")
{
auto expr = node->at(1);
return call(&expr);
}
else
{
auto number = node->at(1);
float res = atof(number.get_lexeme().c_str());
if (n.is_found())
res *= -1;
return res;
}
}

float arith_op(mn::MNNode* node)
{
auto term = node->at(0);
float res = call(&term);
auto term_list = node->at(1);
for (unsigned int i=0; i < term_list.get_count(); ++i)
{
if ((i+1) % 2 == 0)
{
auto sign_node = term_list.at(i-1);
char op = sign_node.get_lexeme()[0];
auto res_node = term_list.at(i);
switch (op) {
case '+':
res += call(&res_node);
break;
case '-':
res -= call(&res_node);
break;
case '*':
res *= call(&res_node);
break;
case '/':
res /= call(&res_node);
break;
default:
res -= call(&res_node);
break;
}
}
}
return res;
}

float walk(mn::MNNode* node)
{
return call(node);
}
};
int main(int /*argc*/, char **/*argv[]*/)
{
//Class with shortcuts methods
auto root = new mn::MNRoot();
//Class to store the encountered errors
mn::MNErrorMsg* err = new mn::MNErrorMsg();
auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

auto traveler = new Expr();
//The run_parser method generate the tree and use the traveler class to travel it.
std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
parse_lexer, traveler) << std::endl;
delete traveler;
delete err;
delete root;
}
};
```
### Designs
In order to reduce the amount of nodes from generated trees, for the rules like this ```A: B [C]```:

```If [C] not encountered then A is replaced by B```. According to the last example if you change the context of file ```expr_grammar.mn ``` to 1 and execute the example, you will got
the next output:

```
|base -- 100
|(?) -- 17
|1 -- 26
```

And the tree will be right traveled because the method "walk" invoke the bound method "base" via ```return call(node);```.

### Extending Lexers
Extending lexers adding new tokens is straighfordward, e.g:

```cpp
enum MN_PY_TOKENS
{
//Please use tokens index from MN_CUSTOM_TOKEN_INDEX, values below are reserved by the library.
HEX = MN_CUSTOM_TOKEN_INDEX,
STAR
};

namespace mn
{
namespace lexer
{
MN_Simple::MN_Simple(const std::string& file_name, MNErrorMsg* error_msg)
:MNLexer(file_name, error_msg)
{
//Register the token HEX. Could be referenced at grammar definition with the alias "HEX"
register_custom_token("HEX", HEX);
//Register the token STAR. Could be referenced at grammar definition with the alias "STAR"
register_custom_token("STAR", STAR);
}

MNToken* MN_Simple::next_token()
{
...
//Somewhere at your "next_token" func
if (current_char == '★')
//return the STAR token
return ret_token(new MNToken(STAR, "★", row, col));
...
}
}
}
```

So, you could use the new tokens at your grammar like this:

```
SKY: (STAR)*
```

### Limitations
Despite that it's possible to declare terminals tokens like identifiers or numbers via the EBNF, the "mean" library is designed to generate tokens using ```MN_Lexer``` classes.
So regular expression to declare tokens is not supported at now. The goal is to use another tokens sources beyond those generated via regular expressions.

The ```* +``` operators could be used only inmediate next to ```)```. So expression like ```NAME*``` is not supported. Use ```(NAME)*``` instead.

if you have a rule like: ```A: B | D | [C]``` please consider change it to the form: ```A: [B | D | [C]]``` to avoid unexpected behaviours.