https://github.com/pwwang/regexr
Regular expressions for humans
- Host: GitHub
- URL: https://github.com/pwwang/regexr
- Owner: pwwang
- Created: 2022-07-23T00:57:50.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-07-26T05:11:50.000Z (over 2 years ago)
- Last Synced: 2024-12-10T04:27:12.107Z (24 days ago)
- Topics: regex, regular-expression, regular-expressions
- Language: Python
- Homepage: https://pwwang.github.io/regexr
- Size: 499 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# regexr
Regular expressions for humans
Instead of writing a regular expression to match a URL:
```python
# need to be compiled with re.X
regex = r'''
    ^(?P<protocol>http|https|ftp|mailto|file|data|irc)://
    (?P<domain>[A-Za-z0-9-]{0,63}(?:\.[A-Za-z0-9-]{0,63})+)
    (?::(?P<port>\d{1,4}))?
    (?P<path>/*(?:/*[A-Za-z0-9\-._]+/*)*)
    (?:\?(?P<query>.*?))?
    (?:\#(?P<fragment>.*))?$
'''
```

You can write:

```python
regexr = Regexr(
    START,
    ## match the protocol
    Or('http', 'https', 'ftp', 'mailto', 'file', 'data', 'irc', capture="protocol"),
    '://',
    ## match the domain
    Capture(
        Repeat(OneOfChars('A-Z', 'a-z', '0-9', '-'), m=0, n=63),
        OneOrMore(DOT, Repeat(OneOfChars('A-Z', 'a-z', '0-9', '-'), m=0, n=63)),
        name="domain",
    ),
    ## match the port
    Maybe(':', Capture(Repeat(DIGIT, m=1, n=4), name="port")),
    ## match the path
    Capture(
        ZeroOrMore('/'),
        ZeroOrMore(
            ZeroOrMore('/'),
            OneOrMore(OneOfChars('A-Z', 'a-z', '0-9', r'\-._')),
            ZeroOrMore('/'),
        ),
        name="path",
    ),
    ## match the query
    Maybe("?", Capture(Lazy(MAYBE_ANYCHARS), name="query")),
    ## and finally the fragment
    Maybe("#", Capture(MAYBE_ANYCHARS, name="fragment")),
    END,
)
```

Inspired by [rex](https://github.com/r-lib/rex) for R and [Regularity](https://github.com/andrewberls/regularity) for Ruby.
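
To see what the verbose pattern does at run time, here is a minimal sketch that uses only the standard-library `re` module, not regexr itself. The helper name `parse_url` and the sample URLs are made up for illustration; the group names are assumed to follow the `capture=`/`name=` arguments used in the `Regexr(...)` version.

```python
import re

# The verbose URL pattern, compiled with re.X so that whitespace and
# line breaks inside the pattern are ignored.
URL_PATTERN = re.compile(
    r'''
    ^(?P<protocol>http|https|ftp|mailto|file|data|irc)://
    (?P<domain>[A-Za-z0-9-]{0,63}(?:\.[A-Za-z0-9-]{0,63})+)
    (?::(?P<port>\d{1,4}))?
    (?P<path>/*(?:/*[A-Za-z0-9\-._]+/*)*)
    (?:\?(?P<query>.*?))?
    (?:\#(?P<fragment>.*))?$
    ''',
    re.X,
)


def parse_url(url):
    """Hypothetical helper: return the named groups, or None if no match."""
    match = URL_PATTERN.match(url)
    return match.groupdict() if match else None


parts = parse_url("https://example.com:8080/a/b.html?x=1#top")
print(parts)
```

Optional parts that are absent from the input, such as the port in `http://example.com`, come back as `None` in the resulting dict.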
## Why?
We have `re.X` to compile a regular expression in verbose mode, but the result can still be hard to read and write, and it is error-prone. regexr offers:
- Easy to read/write regular expressions
  - For example, `[]]` takes a moment to parse, whereas `OneOfChars("]")` is easier to read.
- Easy to write regular expressions with autocompletions from IDEs
  - When we write a raw regex, IDEs can't offer any hints.
- Non-capturing groups wherever possible
  - For example, with `Maybe(Maybe("a", "b"))` we get `(?:(?:ab)?)?`
- Easy to avoid unintentional errors
  - For example, a pattern like `r"(?P<name>>\d+)\D+\a"` can be difficult to debug when we accidentally put one more `>` after the capturing name.
- Easy to avoid ambiguity
  - For example, `?` could be a quantifier meaning `0` or `1` match. It could also be a non-greedy (lazy) modifier for quantifiers. The two are easy to tell apart with `Maybe(...)` and `Lazy(...)` (or quantifiers with `lazy=True`).
- Easy to avoid unbalanced parentheses/brackets/braces
  - Especially when we want to match them. For example, `Capture("(")` instead of `(\()`.
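
The raw-regex behaviors mentioned in the list can be checked with the standard-library `re` module. This is a small sketch with made-up sample strings; it uses only `re` and does not call regexr.

```python
import re

# `[]]` is a character class that contains only `]`.
bracket = re.fullmatch(r"[]]", "]")

# `?` as a quantifier: the trailing `b` is optional.
optional = re.fullmatch(r"ab?", "a")

# `?` as a lazy modifier: `.+?` stops at the first `>` instead of the last.
greedy = re.match(r"<.+>", "<a><b>").group()
lazy = re.match(r"<.+?>", "<a><b>").group()

# Capturing a literal parenthesis needs an escape inside the group.
paren = re.match(r"(\()", "(x)").group(1)

# Nested non-capturing groups create no numbered groups.
nested = re.compile(r"(?:(?:ab)?)?")

print(greedy, lazy, paren, nested.groups)
```

The same `?` character plays two different roles in `ab?` and `.+?`, which is the ambiguity that `Maybe(...)` and `Lazy(...)` spell out by name.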
## Usage
### More examples

- Matching a phone number like `XXX-XXX-XXXX` or `(XXX) XXX XXXX`

```python
Regexr(
    START,
    # match the first part
    Maybe(Capture('(', name="open_paren")),
    RepeatExact(DIGIT, m=3),
    Conditional("open_paren", yes=")"),
    Maybe(OneOfChars('- ')),
    # match the second part
    RepeatExact(DIGIT, m=3),
    Maybe(OneOfChars('- ')),
    # match the third part
    RepeatExact(DIGIT, m=4),
    END,
)
# compiles to
# ^(?P<open_paren>\()?\d{3}(?(open_paren)\))[- ]?\d{3}[- ]?\d{4}$
```

- Matching an IP address

```python
# Define the pattern for one part of xxx.xxx.xxx.xxx
ip_part = Or(
    # Use Concat instead of NonCapture to avoid brackets
    # 250-255
    Concat("25", OneOfChars('0-5')),
    # 200-249
    Concat("2", OneOfChars('0-4'), DIGIT),
    # 000-199
    Concat(Or("0", "1"), RepeatExact(DIGIT, m=2)),
    # 00-99
    Repeat(DIGIT, m=1, n=2),
)

Regexr(
    START,
    ip_part,
    RepeatExact(DOT, ip_part, m=3),
    END,
)
# compiles to
# ^(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})(?:\.(?:25[0-5]|2[0-4]\d|(?:0|1)\d{2}|\d{1,2})){3}$
```

- Matching an HTML tag roughly (without attributes)

```python
Regexr(
    START,
    "<", Capture(WORDS, name="tag"), ">",
    Lazy(ANYCHARS),
    "</", Captured("tag"), ">",
    END,
)
# compiles to
# ^<(?P<tag>\w+)>.+?</(?P=tag)>$
```

### Pretty print a `Regexr` object

With the example at the very beginning (matching a URL), we can pretty print it:
```
# print(regexr.pretty())
# prints:
^
(?P<protocol>http|https|ftp|mailto|file|data|irc)
://
(?P<domain>
  [A-Za-z0-9-]{0,63}
  (?:\.[A-Za-z0-9-]{0,63})+
)
(?::(?P<port>\d{1,4}))?
(?P<path>
  /*
  (?:/*[A-Za-z0-9\-._]+/*)*
)
(?:\?(?P<query>.*?))?
(?:\#(?P<fragment>.*))?
$
```

### Compile a `Regexr` directly

```python
Regexr("a").compile(re.I).match("A")
# <re.Match object; span=(0, 1), match='A'>
```

## API documentation

## TODO
- Support bytes
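
For context on the bytes item: the standard-library `re` module keeps `str` and `bytes` patterns separate, so a pattern compiled from a string cannot search bytes data. The sketch below only illustrates that stdlib behavior. The helper names `find_digits` and `mixing_fails` are hypothetical, and nothing here shows how regexr would implement bytes support.

```python
import re

# One pattern per input type: str patterns for str, bytes patterns for bytes.
text_pattern = re.compile(r"\d+")
bytes_pattern = re.compile(rb"\d+")


def find_digits(data):
    """Hypothetical helper: pick the pattern that matches the input type."""
    pattern = bytes_pattern if isinstance(data, bytes) else text_pattern
    return pattern.findall(data)


def mixing_fails():
    """Return True if using a str pattern on bytes raises TypeError."""
    try:
        text_pattern.search(b"port 8080")
    except TypeError:
        return True
    return False


print(find_digits("port 8080"), find_digits(b"port 8080"), mixing_fails())
```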