Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/susji/mre
Toy regular expression library
regex regular-expression
- Host: GitHub
- URL: https://github.com/susji/mre
- Owner: susji
- License: gpl-3.0
- Created: 2021-11-20T11:49:16.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-11-22T17:31:14.000Z (about 3 years ago)
- Last Synced: 2024-06-21T00:05:19.856Z (8 months ago)
- Topics: regex, regular-expression
- Language: Go
- Homepage:
- Size: 31.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MRE
`MRE` is a simple regular expression library. It provides basic capabilities
for matching and extracting string contents. We assume that our input is
UTF-8-encoded.

## Technical details
Our regular expressions support only regular languages, that is, we do not
support backreferences. All matchers are greedy, that is, there is no `?`
suffix.

Subexpressions (`(..)`) imply capturing.
Ranges in set expressions are treated directly with their `uint32` codepoint
values.

If a `regexp` does not begin with `^`, it is evaluated as if it contained an
implicit `.*?` at the very beginning. Similarly, if a `regexp` does not end with
`$`, it is understood to contain an implicit `.*?` at the very end.

By default, if any of the special characters are to be used for matching
literal runes outside bracketed expressions (sets), they must be escaped with
`\`. Runes within set expressions (`[..]`) are treated literally with the
exception of rune ranges (`-`) and negations (`^`) -- to match them literally,
place them accordingly in bracketed expressions. Otherwise set runes are
matched literally.

XXX Add `]` like POSIX ERE to set matching, i.e. for it to be matched as a rune,
it needs to be placed right after `[` or `[^`.

We want alternation (`|`) to bind very loosely and thus we use the traditional
precedence-via-nonterminal-levels approach. The grammar below only deals with
parsing and for this reason escapes are not included as they are treated in the
lexing phase.

Our grammar is roughly the following:
```ebnf
regexp = [ "^" ], { or-expr }, [ "$" ]
or-expr = atoms, { "|", atoms }
atoms = { atom, [ times ] }
atom = subexpr
| set
| "."
| rune
subexpr = "(", or-expr, ")"
set = "[", [ "^" ], { rune, [ "-", rune ] }, "]"
times = "+"
| "*"
| "?"
| "{", posnum, "}"
| "{", posnum, ",", [ posnum ], "}"
posnum = "0" | digit, { digit }
digit = "0" | ... | "9"
rune = any-unicode-codepoint
```