https://github.com/MahBestBro/regex
A single file regex library written in and for Zig.
- Host: GitHub
- URL: https://github.com/MahBestBro/regex
- Owner: MahBestBro
- Created: 2023-03-19T01:40:23.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-04-12T03:47:50.000Z (over 1 year ago)
- Last Synced: 2024-05-13T00:07:37.868Z (7 months ago)
- Language: Zig
- Size: 107 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-zig - MahBestBro/regex
README
# zig-regex
A single file regex library written in and for Zig.

**Note:** This library is still in development. For now, I would recommend using another library if you are looking for something robust.

This library was mainly inspired by the rough implementation of Ken Thompson's algorithm outlined in [this article](https://swtch.com/~rsc/regexp/regexp1.html) by Russ Cox, though not everything is based on it.

## Getting Started
Just download the `regex.zig` file and include it in your project however you want.

## Example Usage
```zig
const std = @import("std");
const regex = @import("regex.zig");

pub fn main() !void
{
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // Compile the pattern once, then test an input string against it.
    const rx = try regex.Regex.compile("ab*c", arena.allocator());
    defer rx.deinit();

    if (rx.match("abc"))
    {
        std.debug.print("Hooray!\n", .{});
    }
    else
    {
        std.debug.print("Uh oh...\n", .{});
    }
}
```

## Notation
As in most regular expression syntaxes, adjacent non-operator characters concatenate (so `abc` matches "abc"). Grouping brackets are written as ordinary parentheses (i.e., `()`).

The following regex operators are currently supported:
* `|` - If `R` and `S` are regular expressions, then `R|S` matches `R` or `S` (Note: This has the lowest precedence, so `a|bc*` is equivalent to `a|(bc*)`, not `(a|b)c*`).
* `*` - If `R` is a regular expression, then `R*` matches 0 or more repetitions of `R`.
* `+` - Same as `*`, but matches 1 or more repetitions.
* `?` - If `R` is a regular expression, then `R?` matches 0 or 1 occurrences of `R`.

The following character classes are currently supported:
* `.` - represents any character.
* `[]` - represents any of the characters inside these brackets (e.g., `[abc]` would mean 'a', 'b' or 'c').
* `-` - use inside square brackets to denote a range of characters (e.g., `[a-z]` would mean any character from 'a' to 'z').

`/` is the escape character. It can be used to escape any character that would normally represent an operator or character class\* (e.g., `/*` would match "*"), and can also represent the following control codes:
* `/n` - matches the ASCII newline character.
* `/r` - matches the ASCII carriage return character.
* `/t` - matches the ASCII tab character.

\*Note: What requires escaping depends on whether you're inside `[]` or not. For example, `*` does not need escaping when inside `[]`, but `-` does.

## Features to be added
* Substring matching

## Potential future features
* Subexpressions (probably with `{}`).
* UTF-8 support
* Any other operators I can think of or remember that are commonly used or are useful.
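
## Escaping Sketch
Because `/` rather than `\` is the escape character here, building a pattern from arbitrary literal text takes a little care. The following is a minimal sketch, not part of `regex.zig`: `escapeLiteral` is a hypothetical helper, and the set of characters it treats as special outside `[]` (`|`, `*`, `+`, `?`, `.`, `(`, `)`, `[`, `]`, `/`) is an assumption drawn from the Notation section rather than from the library's source.

```zig
const std = @import("std");

// Hypothetical helper (not part of regex.zig): copies `input` into `out`,
// prefixing each assumed metacharacter with `/` and spelling the three
// documented control codes as `/n`, `/r` and `/t`.
fn escapeLiteral(input: []const u8, out: []u8) error{NoSpaceLeft}![]const u8
{
    var len: usize = 0;
    for (input) |c|
    {
        // Each input byte produces at most two output bytes.
        if (out.len - len < 2) return error.NoSpaceLeft;
        const escaped: ?u8 = switch (c)
        {
            '\n' => 'n',
            '\r' => 'r',
            '\t' => 't',
            // Assumed set of characters that are special outside `[]`.
            '|', '*', '+', '?', '.', '(', ')', '[', ']', '/' => c,
            else => null,
        };
        if (escaped) |e|
        {
            out[len] = '/';
            out[len + 1] = e;
            len += 2;
        }
        else
        {
            out[len] = c;
            len += 1;
        }
    }
    return out[0..len];
}

pub fn main() !void
{
    var buf: [64]u8 = undefined;
    const pattern = try escapeLiteral("1+1=2?", &buf);
    std.debug.print("{s}\n", .{pattern}); // prints "1/+1=2/?"
}
```

The resulting slice could then be passed to `regex.Regex.compile` as in the usage example. This sketch only covers text outside `[]`; inside square brackets the rules differ (for example, `-` needs escaping there while `*` does not), so a separate helper would be needed for that context.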