{"id":17789552,"url":"https://github.com/ehwan/string-parser","last_synced_at":"2025-04-02T00:45:55.961Z","repository":{"id":163826343,"uuid":"606985590","full_name":"ehwan/String-Parser","owner":"ehwan","description":"Template LL parser generator","archived":false,"fork":false,"pushed_at":"2024-01-13T09:33:30.000Z","size":266,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-25T15:58:25.863Z","etag":null,"topics":["cpp","header-only","parse","parser"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ehwan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-27T04:10:09.000Z","updated_at":"2023-12-30T16:41:27.000Z","dependencies_parsed_at":"2023-12-25T16:27:08.406Z","dependency_job_id":"d2a2d5d7-55ee-4d4c-8c58-bcc36a52e303","html_url":"https://github.com/ehwan/String-Parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ehwan%2FString-Parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ehwan%2FString-Parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ehwan%2FString-Parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ehwan%2FString-Parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ehwan","download_url":"https://codeload.github.com/ehwan/String-Parser/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246735364,"owners_count":20825224,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","header-only","parse","parser"],"created_at":"2024-10-27T10:34:31.894Z","updated_at":"2025-04-02T00:45:55.942Z","avatar_url":"https://github.com/ehwan.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# String Parser\r\nHeader only template-based LL parser generator.\r\n\r\nfully supported with clang / c++14\r\n\r\n## Install\r\nAdd `include` directory to `INCLUDE_PATH` and\r\n```cpp\r\n#include \u003cparser.hpp\u003e\r\n```\r\nwill include every features to your application\r\n\r\n## Overview\r\n\r\nNote that in all examples,\r\n```cpp\r\n#include \u003cparser.hpp\u003e\r\nnamespace ep = eh::parser;\r\n```\r\n\r\n```cpp\r\n// @FileName : examples/calculator.cpp\r\n\r\nint main()\r\n{\r\n  // REGEX : [0-9]\r\n  // Pattern Match and return its digit as integer type\r\n  auto digit_parser = \r\n    ep::action(\r\n      ep::range( '0', '9' ),\r\n\r\n      // captured Attribute of range() is decltype(*iterator) = char\r\n      // this functor will be called every range() pattern matched\r\n      []( char captured_character )\r\n      {\r\n        // return char-\u003edigit\r\n        // this will be new Attribute of this parser\r\n        return (int)(captured_character - '0');\r\n      }\r\n    );\r\n\r\n  // make action() using [] operator\r\n  auto digit_parser2 =\r\n    ep::range('0', '9')[ digit_parser.functor ];\r\n\r\n  // REGEX : [0-9]+\r\n  auto integer_parser 
=\r\n    ep::action(\r\n      ep::repeat( digit_parser, 1, 12 ),\r\n\r\n      // captured Attribute of repeat() is vector\u003c child Attribute \u003e = vector\u003cint\u003e\r\n      []( std::vector\u003cint\u003e \u0026captured_digits )\r\n      {\r\n        int result = 0;\r\n        for( int d : captured_digits )\r\n        {\r\n          result = result * 10 + d;\r\n        }\r\n        // return accumulated integer;\r\n        // this will be new Attribute of this parser\r\n        return result;\r\n      }\r\n    );\r\n\r\n  // REGEX : integer_parser ( ('+'|'-') integer_parser )*\r\n  auto plus_minus_expression_noaction =\r\n    ep::seq( // Attribute = tuple\u003c Attr of child \u003e = tuple\u003c int, vector\u003c tuple\u003cchar,int\u003e \u003e \u003e\r\n      integer_parser, // Attribute = int\r\n      ep::repeat( // Attribute = vector\u003c Attr of child \u003e = vector\u003c tuple\u003cchar,int\u003e \u003e\r\n        ep::seq( // Attr = tuple\u003c Attrs of child \u003e = tuple\u003c char, int \u003e\r\n          ep::or_(  // char ( only if all child's Attr are same )\r\n            ep::one('-'), // char\r\n            ep::one('+') // char\r\n          ),\r\n          integer_parser // int\r\n        ),\r\n        0, 10000000\r\n      )\r\n    );\r\n\r\n  // or using literals and operator\r\n  using namespace ep::literals;\r\n  auto plus_minus_expression_noaction2 =\r\n    integer_parser\r\n    \u003e\u003e\r\n    *(\r\n       ('-'_p | '+'_p) \u003e\u003e integer_parser\r\n    );\r\n\r\n  auto plus_minus_expression =\r\n    ep::action(\r\n      plus_minus_expression_noaction,\r\n      // captured tuple will be automatically unpacked\r\n      []( int first_captured_integer, std::vector\u003c std::tuple\u003cchar,int\u003e \u003e const\u0026 rhs_captured )\r\n      {\r\n        int result = first_captured_integer;\r\n\r\n        // process right-hand-side operations\r\n        for( auto tup : rhs_captured )\r\n        {\r\n          char operation = 
std::get\u003c0\u003e( tup );\r\n          int rhs_integer = std::get\u003c1\u003e( tup );\r\n\r\n          if( operation == '+' ){ result += rhs_integer; }\r\n          else if( operation == '-' ){ result -= rhs_integer; }\r\n        }\r\n\r\n        // return calculated integer;\r\n        // this will be new Attribute of this parser\r\n        return result;\r\n      }\r\n    );\r\n\r\n    std::string input_string = \"1+22-333+4444-55555+666666 unknown string\";\r\n\r\n    auto begin = input_string.begin();\r\n    auto parse_result = plus_minus_expression.parse( begin, input_string.end() );\r\n\r\n    std::cout \u003c\u003c \"Input String : \" \u003c\u003c input_string \u003c\u003c \"\\n\";\r\n    std::cout \u003c\u003c \"Parse Result : \" \u003c\u003c std::boolalpha \u003c\u003c parse_result.is_valid() \u003c\u003c \"\\n\";\r\n    std::cout \u003c\u003c \"Parsed Data : \" \u003c\u003c parse_result.get() \u003c\u003c \" = \" \u003c\u003c 1+22-333+4444-55555+666666 \u003c\u003c \"\\n\";\r\n\r\n    std::cout \u003c\u003c \"Iterator 'begin' point at : \" \u003c\u003c std::string( begin, input_string.end() ) \u003c\u003c \"\\n\";\r\n}\r\n```\r\n\r\n## Examples\r\nSee 'examples/basic.cpp' for a basic tutorial.\r\n\r\n### Parser Objects\r\nEvery Parser Object has a `parse( begin\u0026:iterator, end:iterator )` function\r\nthat performs pattern matching.\r\n\r\nFor example, `range( min_, max_ )` returns a Parser Object.\r\nIts parse() function consumes one iterator position\r\nand returns whether the character is in the range [min_, max_].\r\n\r\nIf the pattern doesn't match, the iterator does not move.\r\n\r\n```cpp\r\n  auto one_small_alphabet = ep::range( 'a', 'z' );\r\n  auto one_big_alphabet = ep::range( 'A', 'Z' );\r\n  auto one_digit = ep::range( '0', '9' );\r\n\r\n  std::string my_string = \"abcdefg 123456\";\r\n\r\n  auto begin = my_string.begin();\r\n  /*\r\n    pattern matched, 'begin' will now point my_string[1]\r\n  */\r\n  one_small_alphabet.parse( begin, 
my_string.end() );\r\n\r\n  /*\r\n    pattern doesn't match, 'begin' will not move\r\n  */\r\n  one_big_alphabet.parse( begin, my_string.end() );\r\n```\r\n\r\n### Attribute of Parser Objects\r\n\r\nThe `parse()` function returns a `parse_result_t\u003cAttr\u003e` value\r\nwhich contains the result of pattern matching\r\nand the parsed data from the input stream.\r\n\r\nFor example, the Attr of the range() Parser (and of any other Parser Object that consumes a single iterator position) is `char` or `decltype(*iterator)`, i.e. the character it consumed.\r\n\r\n```cpp\r\n  auto parse_result = one_small_alphabet.parse( begin, my_string.end() );\r\n  std::cout \u003c\u003c \"Parse Result : \" \u003c\u003c std::boolalpha \u003c\u003c parse_result.is_valid() \u003c\u003c \"\\n\";\r\n  std::cout \u003c\u003c \"Parsed Character : \" \u003c\u003c parse_result.get() \u003c\u003c \"\\n\";\r\n  \r\n  // at this point, 'begin' points my_string[2]\r\n```\r\n\r\n`eh::parser::unused_t` is a special class used as a Parser Object's Attribute. Any parser that does not extract data should use `unused_t` as its Attribute.\r\n\r\n### Special Parser Objects and its Attribute\r\n\r\nThere are several *Parser Object Wrappers* that take another Parser Object and perform a modified action on it.\r\n\r\n\r\n`or_( parser1, parser2, ... 
, parserN )`\r\ntries the N parsers until one of them matches.\r\n`Attribute` of `or_( p1, p2, ..., pn )` is `Attribute` of `p1` if every parser has the same `Attribute`, else `unused_t`.\r\n\r\n| Parser1 | Parser2 | Merged |\r\n|---------|---------|--------|\r\n| `unused_t` | `unused_t` | `unused_t` |\r\n| `unused_t` | `T` | `T` |\r\n| `T` | `unused_t` | `T` |\r\n| `T1` | `T2` | `unused_t` |\r\n\r\nTable: Merged Attribute of `or_` Parser\r\n\r\n`seq( parser1, parser2, ..., parserN )`\r\nruns the N parsers sequentially.\r\n`Attribute` of `seq( p1, p2, ..., pn )` is `tuple\u003c Attribute of p1, Attribute of p2, ..., Attribute of pn \u003e`; `unused_t` is not captured into the tuple.\r\n\r\n| Parser1 | Parser2 | Merged |\r\n|---------|---------|--------|\r\n| `unused_t` | `unused_t` | `unused_t` |\r\n| `unused_t` | `T` | `T` |\r\n| `T` | `unused_t` | `T` |\r\n| `T1` | `T2` | `tuple\u003cT1,T2\u003e` |\r\n| `T1` | `tuple\u003cTs...\u003e` | `tuple\u003cT1,Ts...\u003e` |\r\n| `tuple\u003cTs...\u003e` | `T2` | `tuple\u003cTs...,T2\u003e` |\r\n\r\nTable: Merged Attribute of `seq` Parser\r\n\r\n`repeat( parser, min_, max_ )`\r\nmatches 'parser' X times, where X is in the range [min_,max_].\r\n`Attribute` of `repeat( p )` is `vector\u003c Attribute of p \u003e` if `Attribute of p` is not `unused_t`, else `unused_t`.\r\n\r\n| Child Parser | `Attribute` of `repeat` |\r\n|---------|---------|\r\n| `unused_t` | `unused_t` |\r\n| `T` | `vector\u003cT\u003e` |\r\n\r\nTable: New Attribute of `repeat` Parser\r\n\r\n```cpp\r\n  auto one_alphabet_parser = ep::or_( one_small_alphabet, one_big_alphabet );\r\n\r\n  // this is same as one_alphabet_parser\r\n  auto one_alphabet_parser2 = one_small_alphabet | one_big_alphabet;\r\n\r\n  auto one_alphabet_and_digit = ep::seq( one_alphabet_parser, one_digit );\r\n  auto one_alphabet_and_digit2 = one_alphabet_parser \u003e\u003e one_digit;\r\n\r\n  // [a-zA-Z]*\r\n  auto alphabet_star = ep::repeat( one_alphabet_parser, 0, 
9999999 );\r\n\r\n  // [a-zA-Z]+\r\n  auto alphabet_plus = ep::repeat( one_alphabet_parser, 1, 9999999 );\r\n```\r\n\r\n### Action Wrapper\r\n`action( Parser, Functor )` is a special parser wrapper.\r\nIt performs the same pattern matching as its child parser,\r\nbut calls functor() every time the pattern matches successfully.\r\n\r\nfunctor() takes the child parser's `Attribute` as its argument,\r\nand the returned value becomes the new `Attribute`. If the child's `Attribute` is `unused_t`, the functor takes no arguments.\r\nIf the functor does not return a value (a void function), the `Attribute` of the Action Wrapper is `unused_t`.\r\n\r\n```cpp\r\n  auto action_parser = ep::action(\r\n    one_small_alphabet,\r\n    []( int ch )\r\n    {\r\n      std::cout \u003c\u003c \"Small Alphabet Parsing Successfully Done!\\n\";\r\n\r\n      // now 'ch*2' is its new Attr\r\n      return ch*2;\r\n    } );\r\n\r\n  // pattern matched, functor() will be called\r\n  // and its Attr would be 'c'*2\r\n  auto action_result = action_parser.parse( begin, my_string.end() );\r\n  std::cout \u003c\u003c \"New Attr : \" \u003c\u003c action_result.get() \u003c\u003c \"\\n\";\r\n```\r\n\r\n| Child Parser | Argument must be captured as |\r\n|--------------|--------|\r\n| `unused_t` | `functor()` |\r\n| `T attr` | `functor(attr)` |\r\n| `tuple\u003cTs...\u003e attrs` | `functor( attrs... )` |\r\n\r\nTable: Functor and its argument\r\n\r\n| Returned Type | Attribute of `action` |\r\n|------|--------|\r\n| `void` | `unused_t` |\r\n| `T` | `T` |\r\n\r\nTable: functor's returned value and its Attribute\r\n\r\nThere are more special Parser Wrappers for advanced Parser creation.\r\n\r\n### Virtual Parser\r\nSince every Parser Object's implementation is based on template idioms, e.g. 
CRTP or SFINAE,\r\nit can get benefit from compiler's smart optimization.\r\nBut, because Parser Objects are being deep-copied and must be defined prior to its actual invoking,\r\nwe can't make cyclic, or recursive patterns.\r\n\r\n`ep::rule\u003cAttribute,Iterator\u003e` is virtual class based Parser Object that can be assigned as any parser objects.\r\n\r\n```cpp\r\nauto compile_time_pattern1 = ep::range('a', 'z');\r\nauto compile_time_pattern2 = ep::range('0', '9');\r\n\r\nep::rule\u003cint,std::string::iterator\u003e virtual_pattern, virtual_reference_pattern;\r\nstd::string str = \"123123 abcabc\";\r\nauto begin = str.begin();\r\n\r\n// this takes the reference of 'virtual_pattern'\r\nvirtual_reference_pattern = ep::ref( virtual_pattern );\r\n\r\n// assign as small-alphabet parser\r\nvirtual_pattern = compile_time_pattern1;\r\n// 'virtual_reference_pattern' will be assigned too\r\n\r\n\r\n// match fail\r\nvirtual_pattern.parse( begin, str.end() );\r\n\r\n// assign as digit parser\r\nvirtual_pattern = compile_time_pattern2;\r\n\r\n// match success\r\nvirtual_pattern.parse( begin, str.end() );\r\nvirtual_reference_pattern.parse( begin, str.end() );\r\n```\r\n\r\n\r\n### Simple Compiler\r\nexamples/compiler/\r\n\r\n```bash\r\n\u003e cat examples/compiler/testsource.txt\r\nfunc1()\r\n{\r\n  var1 = 20;\r\n  // this is comment\r\n  print var1 * 10 + 10;\r\n  return;\r\n  print 20;\r\n}\r\nmain()\r\n{\r\n  func1();\r\n\r\n  /*\r\n  this\r\n  is also\r\n  a comment\r\n  */\r\n\r\n  func2();\r\n}\r\nfunc2()\r\n{\r\n  print 30;\r\n  return;\r\n  print 40;\r\n}\r\n\r\n\r\n\u003e ./compiler ../examples/compiler/testsource.txt\r\nStart Tokenizing...\r\nCPP Comment: // this is comment\r\nC Comment: /*\r\n  this\r\n  is also\r\n  a comment\r\n  */\r\nTokenizing Result: false\r\nfunc1: 1002\r\n(: 40\r\n): 41\r\n{: 123\r\nvar1: 1002\r\n=: 61\r\n20: 1001\r\n;: 59\r\nprint: 1004\r\nvar1: 1002\r\n*: 42\r\n10: 1001\r\n+: 43\r\n10: 1001\r\n;: 59\r\nreturn: 1013\r\n;: 59\r\nprint: 
1004\r\n20: 1001\r\n;: 59\r\n}: 125\r\nmain: 1002\r\n(: 40\r\n): 41\r\n{: 123\r\nfunc1: 1002\r\n(: 40\r\n): 41\r\n;: 59\r\nfunc2: 1002\r\n(: 40\r\n): 41\r\n;: 59\r\n}: 125\r\nfunc2: 1002\r\n(: 40\r\n): 41\r\n{: 123\r\nprint: 1004\r\n30: 1001\r\n;: 59\r\nreturn: 1013\r\n;: 59\r\nprint: 1004\r\n40: 1001\r\n;: 59\r\n}: 125\r\nTokenizing End...\r\nCompiling Start...\r\nCompiling End... : true\r\n~~~~~~~~~~~~~~~~~~~~~ Program Result ~~~~~~~~~~~~~~~~~~~~~\r\n210\r\n30\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fehwan%2Fstring-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fehwan%2Fstring-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fehwan%2Fstring-parser/lists"}