{"id":13729490,"url":"https://github.com/foonathan/lex","last_synced_at":"2025-05-08T01:32:41.832Z","repository":{"id":74810262,"uuid":"148897732","full_name":"foonathan/lex","owner":"foonathan","description":"Replaced by foonathan/lexy","archived":true,"fork":false,"pushed_at":"2020-12-01T13:54:19.000Z","size":315,"stargazers_count":138,"open_issues_count":0,"forks_count":8,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-11-14T20:38:03.601Z","etag":null,"topics":["cplusplus","lexer","tokenizer"],"latest_commit_sha":null,"homepage":"https://github.com/foonathan/lexy","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/foonathan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-09-15T11:55:07.000Z","updated_at":"2024-08-06T19:43:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"1ec1b879-9ee2-4d09-8192-50ed04368f38","html_url":"https://github.com/foonathan/lex","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foonathan%2Flex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foonathan%2Flex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foonathan%2Flex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foonathan%2Flex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/foonathan","download_url":"https://codeload.github.com/foonathan/lex/tar.gz/refs/heads/master","host":{"name":"GitHub"
,"url":"https://github.com","kind":"github","repositories_count":252981691,"owners_count":21835467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus","lexer","tokenizer"],"created_at":"2024-08-03T02:01:01.089Z","updated_at":"2025-05-08T01:32:41.475Z","avatar_url":"https://github.com/foonathan.png","language":"C++","readme":"# foonathan/lex\n\n![Project Status](https://img.shields.io/endpoint?url=https%3A%2F%2Fwww.jonathanmueller.dev%2Fproject%2Flex%2Findex.json)\n[![Build Status](https://dev.azure.com/foonathan/lex/_apis/build/status/foonathan.lex)](https://dev.azure.com/foonathan/lex/_build/latest?definitionId=2)\n\n\u003e Note: Replaced by foonathan/lexy.\n\nThis library is a C++14 `constexpr` tokenization and (in the future) parsing library.\nThe tokens are specified in the type system so they are available at compile-time.\nWith this information a [trie](https://en.wikipedia.org/wiki/Trie) is constructed that efficiently matches the input.\n\n## Basic Example\n\nThe tokens for a simple calculator:\n\n```cpp\nusing tokens = lex::token_spec\u003cstruct variable, struct plus, struct minus, …\u003e;\n\nstruct variable : lex::rule_token\u003cvariable, tokens\u003e\n{\n    // static member functions cannot be const-qualified\n    static constexpr auto rule() noexcept\n    {\n        // a variable consists of one or more alphabetic characters\n        return lex::token_rule::plus(lex::ascii::is_alpha);\n    }\n};\n\nstruct plus : lex::literal_token\u003c'+'\u003e\n{};\n\nstruct minus : lex::literal_token\u003c'-'\u003e\n{};\n```\n\nSee [example/ctokenizer.cpp](example/ctokenizer.cpp) for an annotated example and tutorial.\n\n## 
Features\n\n* Declarative token specification: No need to worry about ordering or implementing lexing by hand.\n* Fast: Performance is comparable to or faster than a handwritten state machine, see benchmarks.\n* Lightweight: No memory allocation, tokens are just string views into the input.\n* Lazy: The `lex::tokenizer` will just tokenize the next token in the input.\n* Fully `constexpr`: The entire lexing can happen at compile-time.\n* Flexible error handling: On invalid input, a `lex::error_token` is created consuming one character.\n  The parser can then decide how an error should be handled.\n\n## FAQ\n\n**Q: Isn't the name [lex](https://en.wikipedia.org/wiki/Lex_(software)) already taken?**\n\nA: It is. That's why the library is called `foonathan/lex`.\nIn my defense, naming is hard.\nI could come up with some cute name, but then it's not really descriptive.\nIf you know `foonathan/lex`, you know what the project is about.\n\n**Q: Sounds great, but what about compile-time?**\n\nA: Compiling the `foonathan_lex_ctokenizer` target, which contains an implementation of a tokenizer for C (modulo some details),\ntakes under three seconds.\nJust including `\u003ciostream\u003e` takes about half a second, including `\u003ciostream\u003e` and `\u003cregex\u003e` takes about two seconds.\nSo the compile time is noticeable but acceptable, as a tokenizer will not be used in a lot of files of the project and rarely changes.\n\nIn the future, I will probably look at optimizing it as well.\n\n**Q: My `lex::rule_token` doesn't seem to be matched?**\n\nA: This could be due to one of two things:\n\n* Multiple rule tokens would match the input. Then the tokenizer just picks the one that comes first.\n  Make sure that all rule tokens are mutually exclusive, maybe by using `lex::null_token` and creating them all in one place as necessary.\n  See `int_literal` and `float_literal` in the C tokenizer for an example.\n* A literal token is a prefix of the rule token, e.g. 
a C comment `/* … */` and the `/` operator are in conflict.\n  By default, the literal token is preferred in that case.\n  Implement `is_conflicting_literal()` in your rule token as done by the `comment` token in the C tokenizer.\n\nA mode to test for these issues is planned.\n\n**Q: The `lex::tokenizer` gives me just the next token, how do I implement lookahead for specific tokens?**\n\nA: Simply call `get()` until you've reached the token you want to look ahead to, then `reset()` the tokenizer to the earlier position.\n\n**Q: How does it compare to [compile-time-regular-expressions](https://github.com/hanickadot/compile-time-regular-expressions)?**\n\nA: That project implements a RegEx parser at compile-time, which can be used to match strings.\nfoonathan/lex is purely designed to tokenize strings.\nYou could implement a tokenizer with the compile-time RegEx, but I have chosen a different approach.\n\n**Q: How does it compare to [PEGTL](https://github.com/taocpp/PEGTL)?**\n\nA: That project implements matching parsing expression grammars (PEGs), which are a more powerful RegEx, basically.\nOn top of that they've implemented a parsing interface, so you can create a parse tree, for example.\nfoonathan/lex currently does just tokenization, but I plan on adding parse rules on top of the tokens later on.\nComplex tokens in foonathan/lex can be described using PEG as well, but here the PEGs are described using operator overloading and functions,\nand in PEGTL they are described by the type system.\n\n**Q: It breaks when I do this!**\n\nA: Don't do that. And file an issue (or a PR, I have a lot of other projects...).\n\n**Q: This is awesome!**\n\nA: Thanks. 
I do have a Patreon page, so consider checking it out:\n\n[![Patreon](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://patreon.com/foonathan)\n\n## Documentation\n\nTutorial and reference documentation can be found [here](doc/doc.md).\n\n### Compiler Support\n\nThe library requires a C++14 compiler with reasonable `constexpr` support.\nCompilers that are being tested on CI:\n\n* Linux:\n    * GCC 5 to 8, but compile-time parsing is not supported for GCC \u003c 8 (still works at runtime)\n    * clang 4 to 7\n* macOS:\n    * Xcode 9 and 10\n* Windows:\n    * Visual Studio 2017, but compile-time parsing sometimes doesn't work (still works at runtime)\n\n### Installation\n\nThe library is header-only and requires my [debug_assert](https://github.com/foonathan/debug_assert) library as well as the (header-only and standalone) [Boost.mp11](https://github.com/boostorg/mp11).\n\n#### Using CMake `add_subdirectory()`:\n\nDownload and call `add_subdirectory()`.\nIt will look for the dependencies with `find_package()`; if they're not found, the git submodules will be used.\n\nThen link to `foonathan::foonathan_lex`.\n\n#### Using CMake `find_package()`:\n\nDownload and install, setting the CMake variable `FOONATHAN_LEX_FORCE_FIND_PACKAGE=ON`.\nThis requires the dependencies to be installed as well.\n\nThen call `find_package(foonathan_lex)` and link to `foonathan::foonathan_lex`.\n\n#### With other build systems:\n\nYou need to set the following options:\n\n* Enable C++14\n* Add the include path, so `#include \u003cdebug_assert.hpp\u003e` works\n* Add the include path, so `#include \u003cboost/mp11/mp11.hpp\u003e` works\n* Add the include path, so `#include \u003cfoonathan/lex/tokenizer.hpp\u003e` works\n\n## Planned Features\n\n* Parser on top of the tokenizer\n* Integrated way to handle data associated with tokens (like the value of an integer literal)\n* Optimize 
compile-time\n\n","funding_links":["https://patreon.com/foonathan"],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoonathan%2Flex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoonathan%2Flex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoonathan%2Flex/lists"}