Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jafarlihi/cparse

cparse is an LR(1) and LALR(1) parser generator
https://github.com/jafarlihi/cparse

c compiler compiler-construction compiler-frontend compilers lalr lalr-parser lalr-parser-generator lalr1 lr1 lr1-parser parser parser-combinator parser-combinators parser-framework parser-generator parser-library parsing

Last synced: 2 months ago
JSON representation

cparse is an LR(1) and LALR(1) parser generator

Host: GitHub
URL: https://github.com/jafarlihi/cparse
Owner: jafarlihi
License: mit
Created: 2022-11-09T20:10:43.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2022-11-20T17:40:44.000Z (about 2 years ago)
Last Synced: 2023-03-07T21:15:00.380Z (almost 2 years ago)
Topics: c, compiler, compiler-construction, compiler-frontend, compilers, lalr, lalr-parser, lalr-parser-generator, lalr1, lr1, lr1-parser, parser, parser-combinator, parser-combinators, parser-framework, parser-generator, parser-library, parsing
Language: C
Homepage:
Size: 69.3 KB
Stars: 40
Watchers: 1
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        

## TOC

* [Overview](#overview)

* [Example](#example)

## Overview

> :warning: cparse is still under development and code is unpolished.

cparse is an LR(1) and LALR(1) parser generator for C.

cparse uses [`clex`](https://github.com/jafarlihi/clex) lexer generator for lexical analysis purposes.

Note that LR(1) parsers are memory-intensive (and there are memory leaks...), so something like full C grammar can require up to 4GBs of free RAM. If you don't have RAM to spare then use the LALR(1) parser, which also happens to get constructed faster but might not support some of the grammars that LR(1) does (though in practice most programming language grammars are LALR(1)-compatible).

## Example

```c

#include "cparse.h"

#include "clex/clex.h"

#include 

#include 

#include 

// Create an enum for token types

typedef enum TokenKind {

  RETURN,

  SEMICOL,

  IDENTIFIER,

} TokenKind;

// Provide a string representation for each token type to be used in the grammar

const char * const tokenKindStr[] = {

  [RETURN] = "RETURN",

  [SEMICOL] = "SEMICOL",

  [IDENTIFIER] = "IDENTIFIER",

};

int main(int argc, char *argv[]) {

  // Register your token types with `clex`

  clexRegisterKind("return", RETURN);

  clexRegisterKind(";", SEMICOL);

  clexRegisterKind("[a-zA-Z_]([a-zA-Z_]|[0-9])*", IDENTIFIER);

  // Write your grammar

  char grammarString[] =

    "S -> A IDENTIFIER SEMICOL\n"

    "A -> RETURN";

  // Parse your grammar

  Grammar *grammar = cparseGrammar(grammarString);

  // Construct an LALR(1) parser

  LALR1Parser *parser = cparseCreateLALR1Parser(grammar, tokenKindStr);

  // Or construct an LR(1) parser

  //LR1Parser *parser = cparseCreateLR1Parser(grammar, tokenKindStr);

  // Test if given input is valid within the grammar

  assert(cparseAccept(parser, "return id1;") == true);

  // Parse a given input and get a parse tree

  ParseTreeNode *node = cparse(parser, "return id1;");

  // Use your parse tree

  assert(strcmp(node->value, "S") == 0);

  assert(strcmp(node->children[0]->value, "SEMICOL") == 0);

  assert(node->children[0]->token.kind == SEMICOL);

  assert(strcmp(node->children[0]->token.lexeme, ";") == 0);

  assert(strcmp(node->children[1]->value, "IDENTIFIER") == 0);

  assert(node->children[1]->token.kind == IDENTIFIER);

  assert(strcmp(node->children[1]->token.lexeme, "id1") == 0);

  assert(strcmp(node->children[2]->value, "A") == 0);

  assert(strcmp(node->children[2]->children[0]->value, "RETURN") == 0);

  assert(node->children[2]->children[0]->token.kind == RETURN);

  assert(strcmp(node->children[2]->children[0]->token.lexeme, "return") == 0);

}

```