https://github.com/kpiorno/mean

A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. Zero dependencies.
https://github.com/kpiorno/mean
bytecode cpp11 grammar grammar-parser parser python python3
Last synced: 2 months ago
JSON representation
A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. Zero dependencies.
Host: GitHub
URL: https://github.com/kpiorno/mean
Owner: kpiorno
License: mit
Created: 2017-11-13T05:10:01.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-11-15T00:50:11.000Z (over 7 years ago)
Last Synced: 2025-06-23T16:40:04.345Z (about 1 year ago)
Topics: bytecode, cpp11, grammar, grammar-parser, parser, python, python3
Language: C++
Homepage:
Size: 120 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project

README

          # Mean

A LL-N Grammar Parser for C++11. Fun to travel. Python-like bytecode generator and controller. No dependencies. 

### Motivation

"Mean" is the result of a personal journey to parse and make custom changes to the Python 3 grammar. 

The library uses an EBNF-like syntax, automaticaly generates the node tree and has a funny way to travel it. Support a

Python-like bytecode generator and a mechanism for traveling it for Virtual Machines, Generators, etc...  More details at "examples" folder. 

### Quickstart

Please check the full example ```simple.cpp ``` at "examples" folder.

Create a grammar file: ```expr_grammar.mng ```

```

expr: term ( ('+' | '-') term )*

term: factor ( ('*' | '/') factor )*

factor: base [ '^' term ]

base: '(' expr ')' | ['-'] NUMBER

```

Create a source code file: ```expr_grammar.mn ```

```2 ^ 6 -- 1```

So we need a ```MN_Lexer``` which will read the source code file. We could use built-in lexer "MN_Simple" which generates tokens: ```NAME, NUMBER, HEX, +, -, ^```:

```cpp

#include 

#include 

...

int main(int /*argc*/, char **/*argv[]*/)

{

    //Class with shortcuts methods

    auto root = new mn::MNRoot();

    //Class to store the encountered errors

    mn::MNErrorMsg* err = new mn::MNErrorMsg();

    

    //Built-in Lexer

    auto parse_lexer = new mn::lexer::MN_Simple("path/to/expr_grammar.mn", err);  

    //Output the tree node

    root->test_parser("expr_grammar.mng", "expr", parse_lexer)   

}    

```

The above code will output the next tree:

```

|expr -- 100

  |factor -- 100

    |base -- 100

      |(?) -- 17

      |2 -- 26

    |^ -- 2

    |base -- 100

      |(?) -- 17

      |6 -- 26

  |[.] -- 30

    |- -- 2

    |base -- 100

      |- -- 2

      |1 -- 26

```

The nodes with the number ```100``` are non-terminals.  

The nodes ```[.] - 30``` are lists.

The nodes ```[?] - 17``` are optionals nodes. 

The rest of the nodes are terminals.

So lets do the right stuff to travel it.

It's possible to use a custom way to do that, but luckily we have the template class ```MN_Traveler``` which help us.

Now lets declare a class which inherit from ```MN_Traveler```:

```cpp

#include 

#include 

//Class to travel the tree

class Expr: public mn::MNTraveler

{

public:

    Expr(){

        //Bind the "expr" method with the "expr" grammar's non-terminal

        bind("expr", &Expr::expr, this);    

    }

    float expr(mn::MNNode* node)

    {

        return 1.1; 

    }

    //Entry point to start the travel   

    float walk(mn::MNNode* node)

    {

        return call(node);

    }    

};    

int main(int /*argc*/, char **/*argv[]*/) 

{

    //Class with shortcuts methods

    auto root = new mn::MNRoot();

    //Class to store the encountered errors

    mn::MNErrorMsg* err = new mn::MNErrorMsg();

    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();

    //The run_parser method generate the tree and use the traveler class to travel it.

    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",

                                                   parse_lexer, traveler) << std::endl;

    delete traveler;

    delete err;

    delete root;

}

```

The second parameter of the template class define "float" as the type of all methods bounds to the grammar's non-terminals. Thus we can define specific custom types. 

```bind("expr", &Expr::expr, this);``` bind the "expr" method with the "expr" grammar's non-terminal.

The ```float expr(mn::MNNode* node) ...``` method will handle the output generated by the non-terminal "expr".

The ```float walk(mn::MNNode* node) ...``` is the first method which is invoked. The method ```call(node)``` in ```return call(node)``` will invoke the 

right method bound to the non-terminal ("expr" non-terminal according to the tree generated by the previous example). Use "call" method when you need to travel a non-terminal.

So, executing the last code we got the next output:

```Result is: 1.1```

Which is the value that returns "expr" method.

So, adding the proper funcionalities we got a simple expression evaluator with the next code: 

```cpp

#include 

#include 

#include 

#include 

class Expr: public mn::MNTraveler

{

public:

    Expr(){

        bind("expr", &Expr::expr, this);

        bind("term", &Expr::term, this);

        bind("factor", &Expr::factor, this);

        bind("base", &Expr::base, this);

    }

    float expr(mn::MNNode* node)

    {

        //expr: term ( ('+' | '-') term )*

        return arith_op(node);

    }

    float term(mn::MNNode* node)

    {

        //term: factor ( ('*' | '/') factor )*    

        return arith_op(node);

    }

    float factor(mn::MNNode* node)

    {

        //factor: base [ '^' term ]

        auto base = node->at(0);

        auto exp = node->at(2);

        float base_value = call(&base);

        float exp_value = call(&exp);

        return pow(base_value, exp_value);

    }

    float base(mn::MNNode* node)

    {

        //base: '(' expr ')' |  ['-'] NUMBER        

        auto n = node->at(0);

        if (n.get_meta() == 2 && n.get_lexeme() == "(")

        {

            auto expr = node->at(1);

            return call(&expr);

        }

        else

        {

            auto number = node->at(1);

            float res = atof(number.get_lexeme().c_str());

            if (n.is_found())

                res *= -1;

            return res;

        }

    }

    float arith_op(mn::MNNode* node)

    {

        auto term = node->at(0);

        float res = call(&term);

        auto term_list = node->at(1);

        for (unsigned int i=0; i < term_list.get_count(); ++i)

        {

            if ((i+1) % 2 == 0)

            {

                auto sign_node = term_list.at(i-1);

                char op = sign_node.get_lexeme()[0];

                auto res_node = term_list.at(i);

                switch (op) {

                case '+':

                    res += call(&res_node);

                    break;

                case '-':

                    res -= call(&res_node);

                    break;

                case '*':

                    res *= call(&res_node);

                    break;

                case '/':

                    res /= call(&res_node);

                    break;

                default:

                    res -= call(&res_node);

                    break;

                }

            }

        }

        return res;

    }

    float walk(mn::MNNode* node)

    {

        return call(node);

    }

};

int main(int /*argc*/, char **/*argv[]*/)

{

    //Class with shortcuts methods

    auto root = new mn::MNRoot();

    //Class to store the encountered errors

    mn::MNErrorMsg* err = new mn::MNErrorMsg();

    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();

    //The run_parser method generate the tree and use the traveler class to travel it.

    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",

                                                   parse_lexer, traveler) << std::endl;

    delete traveler;

    delete err;

    delete root;

}

};

```

### Designs

In order to reduce the amount of nodes from generated trees, for the rules like this ```A: B [C]```:

```If [C] not encountered then A is replaced by B```. According to the last example if you change the context of file ```expr_grammar.mn ``` to 1 and execute the example, you will got

the next output:

```

|base -- 100

  |(?) -- 17

  |1 -- 26

```

And the tree will be right traveled because the method "walk" invoke the bound method "base" via ```return call(node);```.

 

### Extending Lexers

Extending lexers adding new tokens is straighfordward, e.g:

```cpp

enum MN_PY_TOKENS

{

    //Please use tokens index from MN_CUSTOM_TOKEN_INDEX, values below are reserved by the library.

    HEX = MN_CUSTOM_TOKEN_INDEX,

    STAR

};

namespace mn

{

    namespace lexer

    {

        MN_Simple::MN_Simple(const std::string& file_name, MNErrorMsg* error_msg)

            :MNLexer(file_name, error_msg)

        {

            //Register the token HEX. Could be referenced at grammar definition with the alias "HEX"

            register_custom_token("HEX", HEX);

            //Register the token STAR. Could be referenced at grammar definition with the alias "STAR"

            register_custom_token("STAR", STAR);

        }

        MNToken* MN_Simple::next_token()

        {

            ...

            //Somewhere at your "next_token" func

            if (current_char == '★')

                //return the STAR token 

                return ret_token(new MNToken(STAR, "★", row, col));

            ...    

        }    

    }

}    

```

So, you could use the new tokens at your grammar like this:

```

SKY: (STAR)*

```

### Limitations 

Despite that it's possible to declare terminals tokens like identifiers or numbers via the EBNF, the "mean" library is designed to generate tokens using ```MN_Lexer``` classes. 

So regular expression to declare tokens is not supported at now. The goal is to use another tokens sources beyond those generated via regular expressions.

The ```* +``` operators could be used only inmediate next to ```)```. So expression like ```NAME*``` is not supported. Use ```(NAME)*``` instead.

if you have a rule like: ```A: B | D | [C]``` please consider change it to the form: ```A: [B | D | [C]]``` to avoid unexpected behaviours.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kpiorno/mean

Awesome Lists containing this project

README