{"id":13605239,"url":"https://github.com/aappleby/matcheroni","last_synced_at":"2025-04-24T04:16:42.057Z","repository":{"id":166250923,"uuid":"641643316","full_name":"aappleby/matcheroni","owner":"aappleby","description":"A minimalist single-header library for building pattern-matchers, lexers, and parsers.","archived":false,"fork":false,"pushed_at":"2025-02-23T03:42:48.000Z","size":7663,"stargazers_count":200,"open_issues_count":2,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-24T04:16:38.413Z","etag":null,"topics":["c","cplusplus-20","lexer","lexing","parser","parsing","parsing-expression-grammar","parsing-expression-grammars","pattern-matching","regex","regular-expression","regular-expression-engine","regular-expressions","text-processing"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aappleby.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-16T21:40:05.000Z","updated_at":"2025-04-14T21:33:06.000Z","dependencies_parsed_at":"2024-05-20T04:44:42.708Z","dependency_job_id":null,"html_url":"https://github.com/aappleby/matcheroni","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappleby%2Fmatcheroni","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappleby%2Fmatcheroni/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappleby%2Fmatcheroni/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappleby%2Fmatcheroni/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aappleby","download_url":"https://codeload.github.com/aappleby/matcheroni/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250560057,"owners_count":21450173,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cplusplus-20","lexer","lexing","parser","parsing","parsing-expression-grammar","parsing-expression-grammars","pattern-matching","regex","regular-expression","regular-expression-engine","regular-expressions","text-processing"],"created_at":"2024-08-01T19:00:56.298Z","updated_at":"2025-04-24T04:16:42.031Z","avatar_url":"https://github.com/aappleby.png","language":"C++","readme":"# Matcheroni \u0026 Parseroni\n\n[Matcheroni](https://github.com/aappleby/Matcheroni/blob/main/matcheroni/Matcheroni.hpp) is a minimalist, zero-dependency, single-header C++20 library for doing pattern matching using [Parsing Expression Grammars](https://en.wikipedia.org/wiki/Parsing_expression_grammar) (PEGs). 
PEGs are similar to regular expressions, but both more and less powerful.\n\n[Parseroni](https://github.com/aappleby/Matcheroni/blob/main/matcheroni/Parseroni.hpp) is a companion single-header library that can capture the content of Matcheroni patterns and assemble them into concrete [parse trees](https://en.wikipedia.org/wiki/Parse_tree).\n\nTogether, Matcheroni and Parseroni generate tiny and fast parsers that are easy to customize and integrate into your existing codebase.\n\nA tutorial for building a JSON parser in Matcheroni+Parseroni can be found [here](https://aappleby.github.io/Matcheroni/tutorial).\n\nDocumentation for Matcheroni can be found [here](https://aappleby.github.io/Matcheroni).\n\n# Building the Matcheroni examples and tests\n\nInstall [Ninja](https://ninja-build.org/) if you haven't already, then run ninja in the repo root. The test suite will run as part of the build process.\n\nTo rebuild the emscriptened tutorial, run \"ninja -f build_docs.ninja\" in the repo root.\n\nSee build.ninja for configuration options.\n\n# Performance\n\nMatcheroni performance is going to depend heavily on the grammar you're parsing, but for comparisons against other libraries we can use the JSON parser from the tutorial. 
It's a straightforward translation of the JSON.org spec in about 100 lines of code and compiles down into ~10k of machine code in a few hundred milliseconds.\n\nThe benchmark in [examples/json/json_benchmark.cpp](examples/json/json_benchmark.cpp) parses the test files from [nativejson-benchmark](https://github.com/miloyip/nativejson-benchmark) and [rapidjson](https://github.com/Tencent/rapidjson) 100 times and reports the median parse time for each file and the sum of the medians for all test files.\n\nWhen built with \"-O3 -flto\", the benchmark can parse the three test files from nativejson-benchmark in about 4.3 milliseconds on my Ryzen 5900x - competitive with most other native JSON parsers, though not quite apples-to-apples as Parseroni does not automatically convert numeric fields from text to doubles and it keeps parse state on the stack (meaning it can overflow if given malicious documents). Adding a quick-and-dirty atof() implementation slows things down to ~5.2 milliseconds.\n\nWhen measuring parse rate in gigs/sec, the two leading libraries seem to be RapidJSON and simdjson. Matcheroni can't really match their performance, but it doesn't do too bad:\n\n| Library                                           | rapidjson_sample.json parse rate |\n| ------------------------------------------------- | ----------- |\n| Matcheroni+Parseroni json_benchmark -O3 -flto     | 1.51 gb/sec |\n| RapidJSON DocumentParse_MemoryPoolAllocator_SSE42 | 2.85 gb/sec |\n| simdjson \"quickstart.cpp\" -O3 -flto               | 10.01 gb/sec |\n\nSo if you're parsing JSON and _really_ need performance, use simdjson. Matcheroni is fast enough for most use cases, but it's never going to beat SIMD.\n\n# Caveats\n\nMatcheroni requires C++20, which is a non-starter for some projects. 
There's not a lot I can do about that, as I'm heavily leveraging some newish template stuff that doesn't have any backwards-compatible equivalents.\n\nLike parsing expression grammars, matchers are greedy - ```Seq\u003cSome\u003cAtom\u003c'a'\u003e\u003e, Atom\u003c'a'\u003e\u003e``` will _always_ fail as ```Some\u003cAtom\u003c'a'\u003e\u003e``` leaves no 'a's behind for the second ```Atom\u003c'a'\u003e``` to match.\n\nMatcheroni does not implement any form of [packrat parsing](https://pdos.csail.mit.edu/~baford/packrat/icfp02/), though it could be added on top. Trying to do [operator-precedence parsing](https://en.wikipedia.org/wiki/Operator-precedence_parser) using the precedence-climbing method will be unbearably slow due to the huge number of recursive calls that don't end up matching anything.\n\nRecursive matchers create recursive code that can explode your call stack.\n\nLeft-recursive matchers can get stuck in an infinite loop - this is true with most versions of Parsing Expression Grammars, it's a fundamental limitation of the algorithm.\n\n# A Particularly Large Matcheroni Pattern\n\nHere's the code I use to match C99 integers, plus a few additions from the C++ spec and the GCC extensions.\n\nNote that it consists of 20 ```using``` declarations and the only actual \"code\" is ```return integer_constant::match(ctx, body);```\n\nIf you follow along in Appendix A of the [C99 spec](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), you'll see it lines up quite closely.\n\n```cpp\nTextSpan match_int(TextMatchContext\u0026 ctx, TextSpan body) {\n  // clang-format off\n  using digit                = Range\u003c'0', '9'\u003e;\n  using nonzero_digit        = Range\u003c'1', '9'\u003e;\n\n  using decimal_constant     = Seq\u003cnonzero_digit, Any\u003cticked\u003cdigit\u003e\u003e\u003e;\n\n  using hexadecimal_prefix         = Oneof\u003cLit\u003c\"0x\"\u003e, Lit\u003c\"0X\"\u003e\u003e;\n  using hexadecimal_digit          = 
Ranges\u003c'0','9','a','f','A','F'\u003e;\n  using hexadecimal_digit_sequence = Seq\u003chexadecimal_digit, Any\u003cticked\u003chexadecimal_digit\u003e\u003e\u003e;\n  using hexadecimal_constant       = Seq\u003chexadecimal_prefix, hexadecimal_digit_sequence\u003e;\n\n  using binary_prefix         = Oneof\u003cLit\u003c\"0b\"\u003e, Lit\u003c\"0B\"\u003e\u003e;\n  using binary_digit          = Atom\u003c'0','1'\u003e;\n  using binary_digit_sequence = Seq\u003cbinary_digit, Any\u003cticked\u003cbinary_digit\u003e\u003e\u003e;\n  using binary_constant       = Seq\u003cbinary_prefix, binary_digit_sequence\u003e;\n\n  using octal_digit        = Range\u003c'0', '7'\u003e;\n  using octal_constant     = Seq\u003cAtom\u003c'0'\u003e, Any\u003cticked\u003coctal_digit\u003e\u003e\u003e;\n\n  using unsigned_suffix        = Atom\u003c'u', 'U'\u003e;\n  using long_suffix            = Atom\u003c'l', 'L'\u003e;\n  using long_long_suffix       = Oneof\u003cLit\u003c\"ll\"\u003e, Lit\u003c\"LL\"\u003e\u003e;\n  using bit_precise_int_suffix = Oneof\u003cLit\u003c\"wb\"\u003e, Lit\u003c\"WB\"\u003e\u003e;\n\n  // This is a little odd because we have to match in longest-suffix-first order\n  // to ensure we capture the entire suffix\n  using integer_suffix = Oneof\u003c\n    Seq\u003cunsigned_suffix,  long_long_suffix\u003e,\n    Seq\u003cunsigned_suffix,  long_suffix\u003e,\n    Seq\u003cunsigned_suffix,  bit_precise_int_suffix\u003e,\n    Seq\u003cunsigned_suffix\u003e,\n\n    Seq\u003clong_long_suffix,       Opt\u003cunsigned_suffix\u003e\u003e,\n    Seq\u003clong_suffix,            Opt\u003cunsigned_suffix\u003e\u003e,\n    Seq\u003cbit_precise_int_suffix, Opt\u003cunsigned_suffix\u003e\u003e\n  \u003e;\n\n  // GCC allows i or j in addition to the normal suffixes for complex-ified types :/...\n  using complex_suffix = Atom\u003c'i', 'j'\u003e;\n\n  // Octal has to be _after_ bin/hex so we don't prematurely match the prefix\n  using integer_constant =\n  Seq\u003c\n    
Oneof\u003c\n      decimal_constant,\n      hexadecimal_constant,\n      binary_constant,\n      octal_constant\n    \u003e,\n    Seq\u003c\n      Opt\u003ccomplex_suffix\u003e,\n      Opt\u003cinteger_suffix\u003e,\n      Opt\u003ccomplex_suffix\u003e\n    \u003e\n  \u003e;\n\n  return integer_constant::match(ctx, body);\n  // clang-format on\n}\n```\n","funding_links":[],"categories":["C++","Parsing"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faappleby%2Fmatcheroni","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faappleby%2Fmatcheroni","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faappleby%2Fmatcheroni/lists"}