https://github.com/gruhn/regex-utils
TypeScript library for regex intersection, complement and other utilities that go beyond string matching.
- Host: GitHub
- URL: https://github.com/gruhn/regex-utils
- Owner: gruhn
- Created: 2025-01-19T10:28:22.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-05-28T21:50:41.000Z (10 months ago)
- Last Synced: 2025-06-16T10:58:23.315Z (9 months ago)
- Topics: javascript, regex, regexp, regular-expression, regular-expressions, typescript
- Language: TypeScript
- Homepage: https://gruhn.github.io/regex-utils/
- Size: 296 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-regex - regex-utils - Check regex equivalence, build regex intersections, and other utilities. (JavaScript regex libraries / Regex processors, utilities, and more)
# Regex Utils
Zero-dependency TypeScript library for regex utilities that go beyond string matching.
These are surprisingly hard to come by for any programming language. ✨
- [Documentation](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html)
- Online demos:
  - [RegExp Equivalence Checker](https://gruhn.github.io/regex-utils/equiv-checker.html)
  - [Random Password Generator](https://gruhn.github.io/regex-utils/password-generator.html)
## API Overview 🚀
- 🔗 Set-style operations:
  - [.and(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#and) - Compute the intersection of two regexes.
  - [.not()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#not) - Compute the complement of a regex.
  - [.without(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#without) - Compute the difference of two regexes.
- ✅ Set-style predicates:
  - [.isEquivalent(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isEquivalent) - Check whether two regexes match the same strings.
  - [.isSubsetOf(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isSubsetOf)
  - [.isSupersetOf(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isSupersetOf)
  - [.isDisjointFrom(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isDisjointFrom)
  - [.isEmpty()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isEmpty) - Check whether a regex matches no strings.
- 📜 Generate strings:
  - [.sample(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#sample) - Generate random strings matching a regex.
  - [.enumerate()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#enumerate) - Exhaustively enumerate strings matching a regex.
- 🔧 Miscellaneous:
  - [.size()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#size) - Count the number of strings that a regex matches.
  - [.derivative(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#derivative) - Compute a Brzozowski derivative of a regex.
  - and others...
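
For readers unfamiliar with Brzozowski derivatives: the derivative of a regex with respect to a character `c` is a regex matching every suffix `s` such that `c + s` is matched by the original. The sketch below illustrates the idea on a tiny hand-rolled AST. It does not use this library and is not its implementation; the `Re` type and the helpers `nullable`, `derivative` and `matches` are hypothetical names chosen for the example.

```typescript
// Minimal regex AST, for illustration only (not the library's internal representation).
type Re =
  | { kind: 'empty' }                          // matches no string at all
  | { kind: 'epsilon' }                        // matches only the empty string
  | { kind: 'char'; value: string }            // matches one literal character
  | { kind: 'concat'; left: Re; right: Re }
  | { kind: 'union'; left: Re; right: Re }
  | { kind: 'star'; inner: Re }

const empty: Re = { kind: 'empty' }
const epsilon: Re = { kind: 'epsilon' }
const char = (value: string): Re => ({ kind: 'char', value })
const concat = (left: Re, right: Re): Re => ({ kind: 'concat', left, right })
const union = (left: Re, right: Re): Re => ({ kind: 'union', left, right })
const star = (inner: Re): Re => ({ kind: 'star', inner })

// Does the regex accept the empty string?
function nullable(re: Re): boolean {
  switch (re.kind) {
    case 'empty': return false
    case 'epsilon': return true
    case 'char': return false
    case 'concat': return nullable(re.left) && nullable(re.right)
    case 'union': return nullable(re.left) || nullable(re.right)
    case 'star': return true
  }
}

// Derivative with respect to `c`: matches every `s` such that `re` matches `c + s`.
function derivative(re: Re, c: string): Re {
  switch (re.kind) {
    case 'empty': return empty
    case 'epsilon': return empty
    case 'char': return re.value === c ? epsilon : empty
    case 'union': return union(derivative(re.left, c), derivative(re.right, c))
    case 'star': return concat(derivative(re.inner, c), re)
    case 'concat': {
      const first = concat(derivative(re.left, c), re.right)
      // If the left side can match "", `c` may also be consumed by the right side.
      return nullable(re.left) ? union(first, derivative(re.right, c)) : first
    }
  }
}

// Matching by repeated derivation: consume one character at a time,
// then check whether the remaining regex accepts the empty string.
function matches(re: Re, input: string): boolean {
  let current = re
  for (let i = 0; i < input.length; i++) {
    current = derivative(current, input.charAt(i))
  }
  return nullable(current)
}

// (a|b)*b : strings over {a, b} that end in "b"
const endsInB = concat(star(union(char('a'), char('b'))), char('b'))
console.log(matches(endsInB, 'aab')) // true
console.log(matches(endsInB, 'aba')) // false
```

The sketch skips simplification of intermediate results, so the derived expressions grow with input length. That is fine for a demonstration but not for practical use.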
## Installation 📦
```bash
npm install @gruhn/regex-utils
```
```typescript
import { RB } from '@gruhn/regex-utils'
```
## Syntax Support
| Feature | Support | Examples |
|---------|---------|-------------|
| Quantifiers | ✅ | `a*`, `a+`, `a{3,10}`, `a?` |
| Alternation | ✅ | `a\|b` |
| Character classes | ✅ | `.`, `\w`, `[a-zA-Z]`, ... |
| Escaping | ✅ | `\$`, `\.`, ... |
| (Non-)capturing groups | ✅1 | `(?:...)`, `(...)` |
| Start/end anchors | ⚠️2 | `^`, `$` |
| Lookahead | ⚠️3 | `(?=...)`, `(?!...)` |
| Lookbehind | ❌ | `(?<=...)`, `(?<!...)` |
| `global` flag | ✅4 | `/.../g` |
| `hasIndices` flag | ✅4 | `/.../d` |
| `ignoreCase` flag | ❌ | `/.../i` `(?i:...)` |
| `multiline` flag | ❌ | `/.../m` `(?m:...)` |
| `unicode` flag | ❌ | `/.../u` |
| `unicodeSets` flag | ❌ | `/.../v` |
| `sticky` flag | ❌ | `/.../y` |
1. Capturing and non-capturing groups are both treated as plain parentheses, because this library never extracts substrings.
2. Some pathological patterns are not supported, such as anchors inside quantifiers, e.g. `(^a)+`.
3. Anchors inside lookaheads, such as `(?=^a)`, are not supported.
4. The flag is simply ignored because it does not affect the behavior of this library.
An `UnsupportedSyntaxError` is thrown when unsupported patterns are detected.
The library **SHOULD ALWAYS** either throw an error or respect the regex specification exactly.
Please report a bug if the library silently uses a faulty interpretation.
Handling syntax-related errors:
```typescript
import { RB, ParseError, UnsupportedSyntaxError } from '@gruhn/regex-utils'

try {
  RB(/^[a-z]*$/)
} catch (error) {
  if (error instanceof SyntaxError) {
    // Invalid regex syntax! Native error, not emitted by this library.
    // E.g. this will also throw a `SyntaxError`: new RegExp(')')
  } else if (error instanceof ParseError) {
    // The regex syntax is valid but the internal parser could not handle it.
    // If this happens it's a bug in this library.
  } else if (error instanceof UnsupportedSyntaxError) {
    // Regex syntax is valid but not supported by this library.
  }
}
```
## Example use cases 💡
### Generate test data from regex 📜
Generate 5 random email addresses:
```typescript
const email = RB(/^[a-z]+@[a-z]+\.[a-z]{2,3}$/)
for (const str of email.sample().take(5)) {
  console.log(str)
}
```
```
ky@e.no
cc@gg.gaj
z@if.ojk
vr@y.ehl
e@zx.hzq
```
Generate 5 random email addresses, which have exactly 20 characters:
```typescript
const emailLength20 = email.and(/^.{20}$/)
for (const str of emailLength20.sample().take(5)) {
  console.log(str)
}
```
```
kahragjijttzyze@i.mv
gnpbjzll@cwoktvw.hhd
knqmyotxxblh@yip.ccc
kopfpstjlnbq@lal.nmi
vrskllsvblqb@gemi.wc
```
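
To make the meaning of the intersection concrete: a string belongs to `email.and(/^.{20}$/)` exactly when it matches both patterns. The snippet below checks this with native `RegExp` only, without the library; `inIntersection` is a hypothetical helper name used for illustration.

```typescript
const emailPattern = /^[a-z]+@[a-z]+\.[a-z]{2,3}$/
const length20 = /^.{20}$/

// Membership in the intersection means both patterns accept the string.
const inIntersection = (str: string): boolean =>
  emailPattern.test(str) && length20.test(str)

console.log(inIntersection('kahragjijttzyze@i.mv')) // true
console.log(inIntersection('ky@e.no'))              // false: a valid email, but only 7 characters
```

A membership check like this can only test given strings. Generating strings of the intersection, as `.sample()` does above, requires the combined regex itself.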
### Refactor regex then check equivalence 🔄
[**ONLINE DEMO**](https://gruhn.github.io/regex-utils/equiv-checker.html?regexp1=%5Ea%7Cb%24&regexp2=%5E%5Bab%5D%24)
Say we found this incredibly complicated regex somewhere in the codebase:
```typescript
const oldRegex = /^a|b$/
```
This can be simplified, right?
```typescript
const newRegex = /^[ab]$/
```
But to double-check we can use `.isEquivalent` to verify that the new version matches exactly the same strings as the old version.
That is, whether `oldRegex.test(str) === newRegex.test(str)` for every possible input string:
```typescript
RB(oldRegex).isEquivalent(newRegex) // false
```
Looks like we made some mistake.
We can generate counterexamples using `.without(...)` and `.sample(...)`.
First, we derive two new regexes: one that matches exactly the strings `newRegex` matches but `oldRegex` does not, and vice versa:
```typescript
const onlyNew = RB(newRegex).without(oldRegex)
const onlyOld = RB(oldRegex).without(newRegex)
```
`onlyNew` turns out to be empty (`onlyNew.isEmpty() === true`) but `onlyOld` has some matches:
```typescript
for (const str of onlyOld.sample().take(5)) {
  console.log(str)
}
```
```
aaba
aa
aba
bab
aababa
```
Why does `oldRegex` match all these strings with multiple characters?
Shouldn't it only match "a" or "b" like `newRegex`?
Turns out we thought that `oldRegex` is the same as `^(a|b)$`
but in reality it's the same as `(^a)|(b$)`.
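
The grouping claim can be cross-checked with native `RegExp` alone. The sketch below brute-forces all strings over `{a, b}` up to length 3 and compares the patterns; `allStrings` is a hypothetical helper written for this example. A bounded check like this can find counterexamples but cannot prove equivalence, which is what `.isEquivalent` is for.

```typescript
const oldRegex = /^a|b$/
const newRegex = /^[ab]$/
const explicit = /(^a)|(b$)/ // the same pattern with the implicit grouping spelled out

// All strings over `alphabet` with length 0..maxLength, shortest first.
function allStrings(alphabet: string[], maxLength: number): string[] {
  let layer: string[] = ['']
  const result: string[] = ['']
  for (let len = 1; len <= maxLength; len++) {
    const next: string[] = []
    for (const prefix of layer) {
      for (const ch of alphabet) next.push(prefix + ch)
    }
    result.push(...next)
    layer = next
  }
  return result
}

const candidates = allStrings(['a', 'b'], 3)

// Strings on which the old and the "simplified" regex give different answers.
const disagreeWithNew = candidates.filter(s => oldRegex.test(s) !== newRegex.test(s))
// Strings on which the old regex and the explicitly grouped version differ.
const disagreeWithExplicit = candidates.filter(s => oldRegex.test(s) !== explicit.test(s))

console.log(disagreeWithNew)      // non-empty: includes "aa", "ab", "bab", ...
console.log(disagreeWithExplicit) // []
```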
### Comment regex using complement 💬
How do you write a regex that matches HTML comments like:
```
<!-- This is a comment -->
```
A straightforward attempt would be:
```typescript
const comment = /<!--.*-->/
```
The problem is that `.*` also matches the end marker `-->`,
so this is also a match:
```typescript
<!-- This is a comment --> and this shouldn't be part of it -->
```
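
The overmatching is easy to reproduce with native `RegExp`. The snippet below is a small sketch using a made-up input string; it also shows that a lazy quantifier does not help once the pattern is anchored to the whole string, because the engine simply backtracks to the last `-->`.

```typescript
const naiveComment = /<!--.*-->/
const input = "<!-- note --> and this shouldn't be part of it -->"

// Greedy `.*` runs to the LAST `-->`, so the match spans the whole input.
const match = naiveComment.exec(input)
const matchedText = match !== null ? match[0] : null
console.log(matchedText === input) // true

// A lazy `.*?` finds the shortest match when searching inside a string ...
const lazyMatch = /<!--.*?-->/.exec(input)
console.log(lazyMatch !== null ? lazyMatch[0] : null) // "<!-- note -->"

// ... but as a whole-string test it still accepts the overlong input.
console.log(/^<!--.*?-->$/.test(input)) // true
```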
We need to specify that the inner part can be any string that does not contain `-->`.
With `.not()` (aka. regex complement) this is easy:
```typescript
import { RB } from '@gruhn/regex-utils'
const commentStart = RB('<!--')
const commentInner = RB(/^.*-->.*$/).not()
const commentEnd = RB('-->')
const comment = commentStart.concat(commentInner).concat(commentEnd)
```
With `.toRegExp()` we can convert back to a native JavaScript regex:
```typescript
comment.toRegExp()
```
```
/^