https://github.com/hanickadot/compile-time-regular-expressions
Compile Time Regular Expression in C++
- Host: GitHub
- URL: https://github.com/hanickadot/compile-time-regular-expressions
- Owner: hanickadot
- License: apache-2.0
- Created: 2016-06-25T23:17:18.000Z (almost 10 years ago)
- Default Branch: main
- Last Pushed: 2024-09-21T17:36:49.000Z (over 1 year ago)
- Last Synced: 2024-09-21T17:53:13.860Z (over 1 year ago)
- Topics: awesome, compile-time, constexpr, cpp17, cpp20, header-only, pcre, regular-expression, template-udl
- Language: C++
- Homepage: https://twitter.com/hankadusikova
- Size: 2.42 MB
- Stars: 3,305
- Watchers: 67
- Forks: 181
- Open Issues: 99
Metadata Files:
- Readme: README.md
- License: LICENSE
# Compile Time Regular Expressions v3
Fast compile-time regular expressions with support for matching/searching/capturing during compile-time or runtime.
You can use the single-header version from the `single-header` directory. This header can be regenerated with `make single-header`. If you are using CMake, you can add this directory as a subdirectory and link to the target `ctre`.
More info at [compile-time.re](https://compile-time.re/)
## What this library can do
```c++
ctre::match<"REGEX">(subject); // C++20
"REGEX"_ctre.match(subject); // C++17 + N3599 extension
```
* Matching
* Searching (`search` or `starts_with`)
* Capturing content (named captures are supported too, but only with the syntax `(?<name>...)`)
* Back-Reference (`\g{N}` syntax, and `\1...\9` syntax too)
* Multiline support (with `multiline_` functions)
* Unicode properties and UTF-8 support
The library implements most of the PCRE syntax, with a few exceptions:
* callouts
* comments
* conditional patterns
* control characters (`\cX`)
* match point reset (`\K`)
* named characters
* octal numbers
* options / modes
* subroutines
* unicode grapheme cluster (`\X`)
More documentation on [pcre.org](https://www.pcre.org/current/doc/html/pcre2syntax.html).
### Unknown character escape behaviour
Not every escaped character is inserted as itself. In this library, escaped characters carry special meaning, and an unknown escaped character is a syntax error.
Explicitly allowed character escapes which insert only the character are:
```\-\"\<\>```
## Basic API
This is an approximate API specification from a user's perspective (omitting `constexpr` and `noexcept`, which are everywhere, and using C++20 syntax even though the API is C++17 compatible):
```c++
// check whether the whole input matches the regex:
template <fixed_string regex> auto ctre::match(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::match(auto First &&, auto Last &&) -> regex_results;
// check whether the input contains a match somewhere inside of itself:
template <fixed_string regex> auto ctre::search(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::search(auto First &&, auto Last &&) -> regex_results;
// check whether the input starts with a match (but doesn't need to match everything):
template <fixed_string regex> auto ctre::starts_with(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::starts_with(auto First &&, auto Last &&) -> regex_results;
// the result type can be decomposed with structured bindings
template <...> struct regex_results {
	operator bool() const; // true if it's a match
	auto to_view() const -> std::string_view; // also view()
	auto to_string() const -> std::string; // also str()
	operator std::string_view() const; // also supports all char variants
	explicit operator std::string() const;
	// also size(), begin(), end(), data()
	size_t count() const; // number of captures
	template <size_t Id> const captured_content & get() const; // provides a specific capture; the whole regex_results is the implicit capture 0
};
```
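The difference between the three entry points is easiest to see side by side. The sketch below is not CTRE code: it uses `std::regex` as a runtime stand-in, and the helper names `whole_match`, `search_anywhere` and `prefix_match` are hypothetical. It only illustrates the semantics described above (whole input, anywhere in the input, anchored at the start).

```cpp
#include <regex>
#include <string>

// Runtime stand-ins built on std::regex (assumption: used here only to
// illustrate semantics; CTRE builds its matcher at compile time instead).

// Like ctre::match: the pattern must cover the whole input.
inline bool whole_match(const std::string & input, const std::string & pattern) {
    return std::regex_match(input, std::regex(pattern));
}

// Like ctre::search: the pattern may match anywhere inside the input.
inline bool search_anywhere(const std::string & input, const std::string & pattern) {
    return std::regex_search(input, std::regex(pattern));
}

// Like ctre::starts_with: the match must begin at the first character,
// but it does not have to consume the rest of the input.
inline bool prefix_match(const std::string & input, const std::string & pattern) {
    return std::regex_search(input, std::regex(pattern),
                             std::regex_constants::match_continuous);
}
```

With the pattern `[a-z]+[0-9]+`, the input `abc123!` fails `whole_match`, but passes both `search_anywhere` and `prefix_match`; the input `!abc123` passes only `search_anywhere`.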
### Range outputting API
```c++
// search for the regex in the input and return each occurrence, ignoring the rest:
template <fixed_string regex> auto ctre::range(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::range(auto First &&, auto Last &&) -> range of regex_result;
// return a range of consecutive matches, stopping at the first position which can't be matched:
template <fixed_string regex> auto ctre::tokenize(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::tokenize(auto First &&, auto Last &&) -> range of regex_result;
// return the parts of the input split by the regex; each part is the content of the implicit zero capture
// (other captures are not changed, so you can use them to see how the values were split):
template <fixed_string regex> auto ctre::split(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::split(auto First &&, auto Last &&) -> range of regex_result;
```
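To make the three range-producing behaviours concrete, here is a small runtime sketch. It is written with `std::regex` rather than CTRE, the function names `all_occurrences`, `leading_tokens` and `split_pieces` are hypothetical, and edge cases (empty matches, trailing separators) may well differ from what CTRE does.

```cpp
#include <regex>
#include <string>
#include <vector>

// Every occurrence anywhere in the input (the idea behind ctre::range).
inline std::vector<std::string> all_occurrences(const std::string & input, const std::string & pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Back-to-back matches from the start, stopping at the first position
// that cannot be matched (the idea behind ctre::tokenize).
inline std::vector<std::string> leading_tokens(const std::string & input, const std::string & pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    auto pos = input.begin();
    std::smatch m;
    while (pos != input.end()
           && std::regex_search(pos, input.end(), m, re, std::regex_constants::match_continuous)
           && m.length(0) > 0) { // guard against looping forever on empty matches
        out.push_back(m.str(0));
        pos = m[0].second;
    }
    return out;
}

// The pieces between separator matches (the idea behind ctre::split).
inline std::vector<std::string> split_pieces(const std::string & input, const std::string & separator) {
    const std::regex re(separator);
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(input.begin(), input.end(), re, -1), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For the input `123,456;768`, `leading_tokens` with the pattern `[0-9]+,?` yields `123,` and `456` and then stops at the `;`, whereas `all_occurrences` with `[0-9]+` still finds all three numbers.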
### Functors
All the functions (`ctre::match`, `ctre::search`, `ctre::starts_with`, `ctre::range`, `ctre::tokenize`, `ctre::split`) are functors and can be used without parentheses:
```c++
auto matcher = ctre::match<"regex">;
if (matcher(input)) ...
```
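One way an API like this can be built is a variable template whose value is a callable object. The sketch below shows that general technique only; `starts_with_char_fn` and `starts_with_char` are hypothetical names, and this is not a claim about how CTRE is implemented internally.

```cpp
#include <string_view>

// A callable type parameterised at compile time (hypothetical example).
template <char C> struct starts_with_char_fn {
    constexpr bool operator()(std::string_view input) const noexcept {
        return !input.empty() && input.front() == C;
    }
};

// A variable template: naming starts_with_char<'h'> yields an object,
// so it can be copied into a variable and called later.
template <char C> inline constexpr starts_with_char_fn<C> starts_with_char{};

// The object can be stored and invoked like any other functor.
constexpr auto h_matcher = starts_with_char<'h'>;
static_assert(h_matcher("hello"));
static_assert(!h_matcher("world"));
```

Because the name denotes an object rather than a function template, it can be passed to algorithms or stored in a variable without wrapping it in a lambda.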
### Possible subjects (inputs)
* `std::string`-like objects (`std::string_view` or your own string if it's providing `begin`/`end` functions with forward iterators)
* pairs of forward iterators
### Unicode support
To enable Unicode support, you need to include:
* `<ctre-unicode.hpp>`
* or `<ctre.hpp>` and `<unicode-db.hpp>`
Otherwise you will get missing symbols if you try to use the Unicode support without enabling it.
## Supported compilers
* clang 14.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
* xcode clang 15.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
* gcc 9.0+ (C++17 & C++20 cNTTP syntax)
* MSVC 14.29+ (Visual Studio 16.11+) (C++20 cNTTP syntax)
### Template UDL syntax
The compiler must support extension N3599, for example as GNU extension in gcc (not in GCC 9.1+) and clang.
```c++
constexpr auto match(std::string_view sv) noexcept {
using namespace ctre::literals;
return "h.*"_ctre.match(sv);
}
```
If you need extension N3599 in GCC 9.1+, you can't use `-pedantic`. You also need to define the macro `CTRE_ENABLE_LITERALS`.
### C++17 syntax
You can provide a pattern as a `constexpr ctll::fixed_string` variable.
```c++
static constexpr auto pattern = ctll::fixed_string{ "h.*" };
constexpr auto match(std::string_view sv) noexcept {
return ctre::match<pattern>(sv);
}
```
(this is tested in MSVC 15.8.8)
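The reason the pattern has to be a `static constexpr` variable in C++17 is that a string literal cannot be a template argument there, but a reference to an object with static storage duration can. The following is a minimal sketch of that mechanism, assuming a hypothetical `tiny_fixed_string` type in place of `ctll::fixed_string`; it only reports the pattern length and does not parse anything.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for ctll::fixed_string: copies the characters of a
// string literal (including the terminating '\0') into the object itself.
template <std::size_t N> struct tiny_fixed_string {
    char data[N]{};
    constexpr tiny_fixed_string(const char (&text)[N]) noexcept {
        for (std::size_t i = 0; i < N; ++i) data[i] = text[i];
    }
    constexpr std::size_t size() const noexcept { return N - 1; }
    constexpr std::string_view view() const noexcept { return std::string_view{data, N - 1}; }
};

// The pattern lives in an object with static storage duration...
static constexpr auto demo_pattern = tiny_fixed_string{"h.*"};

// ...so a reference to it can serve as a C++17 non-type template argument.
template <const auto & Pattern> constexpr std::size_t pattern_length() noexcept {
    return Pattern.size();
}

static_assert(pattern_length<demo_pattern>() == 3);
```

A real implementation would inspect `Pattern` character by character at compile time; the point here is only how the text reaches the template.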
### C++20 syntax
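The cNTTP syntax `ctre::match<PATTERN>(subject)` requires compiler support for C++20 class non-type template parameters (GCC 9+, as well as the clang and MSVC versions listed under supported compilers).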
```c++
constexpr auto match(std::string_view sv) noexcept {
return ctre::match<"h.*">(sv);
}
```
## Examples
### Extracting number from input
```c++
std::optional<std::string_view> extract_number(std::string_view s) noexcept {
if (auto m = ctre::match<"[a-z]+([0-9]+)">(s)) {
return m.get<1>().to_view();
} else {
return std::nullopt;
}
}
```
[link to compiler explorer](https://gcc.godbolt.org/z/5U67_e)
### Extracting values from date
```c++
struct date { std::string_view year; std::string_view month; std::string_view day; };
std::optional<date> extract_date(std::string_view s) noexcept {
using namespace ctre::literals;
if (auto [whole, year, month, day] = ctre::match<"(\\d{4})/(\\d{1,2})/(\\d{1,2})">(s); whole) {
return date{year, month, day};
} else {
return std::nullopt;
}
}
// static_assert(extract_date("2018/08/27"sv).has_value());
// static_assert((*extract_date("2018/08/27"sv)).year == "2018"sv);
// static_assert((*extract_date("2018/08/27"sv)).month == "08"sv);
// static_assert((*extract_date("2018/08/27"sv)).day == "27"sv);
```
[link to compiler explorer](https://gcc.godbolt.org/z/x64CVp)
### Using captures
```c++
auto result = ctre::match<"(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">(s);
return date{result.get<"year">(), result.get<"month">(), result.get<"day">()};
// or in C++17 emulation, but the name objects must have linkage
static constexpr ctll::fixed_string year = "year";
static constexpr ctll::fixed_string month = "month";
static constexpr ctll::fixed_string day = "day";
return date{result.get<year>(), result.get<month>(), result.get<day>()};
// or use numbered access
// capture 0 is the whole match
return date{result.get<1>(), result.get<2>(), result.get<3>()};
```
### Lexer
```c++
enum class type {
unknown, identifier, number
};
struct lex_item {
type t;
std::string_view c;
};
std::optional<lex_item> lexer(std::string_view v) noexcept {
if (auto [m,id,num] = ctre::match<"([a-z]+)|([0-9]+)">(v); m) {
if (id) {
return lex_item{type::identifier, id};
} else if (num) {
return lex_item{type::number, num};
}
}
return std::nullopt;
}
```
[link to compiler explorer](https://gcc.godbolt.org/z/PKTiCC)
### Range over input
This support is preliminary, and the API will probably change.
```c++
auto input = "123,456,768"sv;
for (auto match: ctre::search_all<"([0-9]+),?">(input))
std::cout << std::string_view{match.get<0>()} << "\n";
```
### Unicode
```c++
#include <ctre-unicode.hpp>
#include <iostream>
// needed if you want to output to the terminal
std::string_view cast_from_unicode(std::u8string_view input) noexcept {
return std::string_view(reinterpret_cast<const char *>(input.data()), input.size());
}
int main() {
using namespace std::literals;
std::u8string_view original = u8"Tu es un génie"sv;
for (auto match: ctre::search_all<"\\p{Letter}+">(original))
std::cout << cast_from_unicode(match) << std::endl;
return 0;
}
```
[link to compiler explorer](https://godbolt.org/z/erTshe6sz)
## Installing ctre using vcpkg
You can download and install ctre using the [vcpkg](https://github.com/Microsoft/vcpkg) dependency manager:
```bash
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install ctre
```
The ctre port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.
## Running tests (for developers)
Just run `make` in the root of this project.