https://github.com/gruhn/regex-utils

TypeScript library for regex intersection, complement and other utilities that go beyond string matching.


# Regex Utils

Zero-dependency TypeScript library for regex utilities that go beyond string matching.
These are surprisingly hard to come by for any programming language. ✨

- [Documentation](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html)
- Online demos:
  - [RegExp Equivalence Checker](https://gruhn.github.io/regex-utils/equiv-checker.html)
  - [Random Password Generator](https://gruhn.github.io/regex-utils/password-generator.html)

## API Overview 🚀

- 🔗 Set-style operations:
  - [.and(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#and) - Compute the intersection of two regexes.
  - [.not()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#not) - Compute the complement of a regex.
  - [.without(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#without) - Compute the difference of two regexes.
- ✅ Set-style predicates:
  - [.isEquivalent(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isEquivalent) - Check whether two regexes match the same strings.
  - [.isSubsetOf(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isSubsetOf)
  - [.isSupersetOf(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isSupersetOf)
  - [.isDisjointFrom(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isDisjointFrom)
  - [.isEmpty()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#isEmpty) - Check whether a regex matches no strings.
- 📜 Generate strings:
  - [.sample(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#sample) - Generate random strings matching a regex.
  - [.enumerate()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#enumerate) - Exhaustively enumerate strings matching a regex.
- 🔧 Miscellaneous:
  - [.size()](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#size) - Count the number of strings that a regex matches.
  - [.derivative(...)](https://gruhn.github.io/regex-utils/interfaces/RegexBuilder.html#derivative) - Compute a Brzozowski derivative of a regex.
  - and others...
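
For readers who have not met Brzozowski derivatives before, the idea behind `.derivative(...)` can be sketched on a toy regex type. Everything below (the `Re` type, `nullable`, `derive`, `matchesByDerivatives`) is hypothetical and written only for illustration; it is not the library's implementation or API, and it skips the simplification steps a practical version would need.

```typescript
// Toy regex AST. An assumption for this sketch, not the type used by regex-utils.
type Re =
  | { kind: 'emptySet' }                     // matches no string at all
  | { kind: 'emptyString' }                  // matches only ''
  | { kind: 'char'; char: string }           // matches one literal character
  | { kind: 'concat'; left: Re; right: Re }
  | { kind: 'union'; left: Re; right: Re }
  | { kind: 'star'; inner: Re }

const emptySet: Re = { kind: 'emptySet' }
const emptyString: Re = { kind: 'emptyString' }
const char = (c: string): Re => ({ kind: 'char', char: c })
const concat = (left: Re, right: Re): Re => ({ kind: 'concat', left, right })
const union = (left: Re, right: Re): Re => ({ kind: 'union', left, right })
const star = (inner: Re): Re => ({ kind: 'star', inner })

// True if the regex matches the empty string.
function nullable(re: Re): boolean {
  switch (re.kind) {
    case 'emptySet': return false
    case 'emptyString': return true
    case 'char': return false
    case 'concat': return nullable(re.left) && nullable(re.right)
    case 'union': return nullable(re.left) || nullable(re.right)
    case 'star': return true
  }
}

// Derivative with respect to one character `c`:
// a regex matching every suffix `s` such that `c + s` matches `re`.
function derive(c: string, re: Re): Re {
  switch (re.kind) {
    case 'emptySet': return emptySet
    case 'emptyString': return emptySet
    case 'char': return re.char === c ? emptyString : emptySet
    case 'union': return union(derive(c, re.left), derive(c, re.right))
    case 'star': return concat(derive(c, re.inner), re)
    case 'concat': {
      const consumeLeft = concat(derive(c, re.left), re.right)
      // If the left part can match '', `c` may also be consumed by the right part.
      return nullable(re.left) ? union(consumeLeft, derive(c, re.right)) : consumeLeft
    }
  }
}

// Matching by repeated derivation: derive once per input character,
// then check whether the remaining regex accepts the empty string.
function matchesByDerivatives(re: Re, input: string): boolean {
  let current = re
  for (let i = 0; i < input.length; i++) {
    current = derive(input.charAt(i), current)
  }
  return nullable(current)
}

// (a|b)*b : all strings over {a, b} that end with "b"
const endsWithB = concat(star(union(char('a'), char('b'))), char('b'))

console.log(matchesByDerivatives(endsWithB, 'aab')) // true
console.log(matchesByDerivatives(endsWithB, 'aba')) // false
```

The sketch never builds an automaton: each derivative is itself a regex, which is what makes the construction convenient for set-style operations.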

## Installation 📦

```bash
npm install @gruhn/regex-utils
```
```typescript
import { RB } from '@gruhn/regex-utils'
```

## Syntax Support

| Feature | Support | Examples |
|---------|---------|-------------|
| Quantifiers | ✅ | `a*`, `a+`, `a{3,10}`, `a?` |
| Alternation | ✅ | `a\|b` |
| Character classes | ✅ | `.`, `\w`, `[a-zA-Z]`, ... |
| Escaping | ✅ | `\$`, `\.`, ... |
| (Non-)capturing groups | ✅<sup>1</sup> | `(?:...)`, `(...)` |
| Start/end anchors | ⚠️<sup>2</sup> | `^`, `$` |
| Lookahead | ⚠️<sup>3</sup> | `(?=...)`, `(?!...)` |
| Lookbehind | ❌ | `(?<=...)`, `(?<!...)` |
| `global` flag | ✅<sup>4</sup> | `/.../g` |
| `hasIndices` flag | ✅<sup>4</sup> | `/.../d` |
| `ignoreCase` flag | ❌ | `/.../i` `(?i:...)` |
| `multiline` flag | ❌ | `/.../m` `(?m:...)` |
| `unicode` flag | ❌ | `/.../u` |
| `unicodeSets` flag | ❌ | `/.../v` |
| `sticky` flag | ❌ | `/.../y` |

1. Both capturing and non-capturing groups are treated as plain parentheses, because this library never does string extraction.
2. Some pathological patterns are not supported, such as anchors inside quantifiers like `(^a)+`.
3. Anchors inside lookaheads like `(?=^a)` are not supported.
4. Flag is simply ignored because it does not affect the behavior of this library.
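
The reasoning in note 1 can be checked with native `RegExp` alone. The sketch below uses two made-up patterns and a handful of sample inputs (so it is a spot check, not a proof): the capturing and non-capturing variants give the same yes/no verdict and differ only in what `exec` extracts.

```typescript
const capturing = /^(ab)+$/
const nonCapturing = /^(?:ab)+$/

// A few sample inputs chosen for illustration.
const inputs = ['', 'ab', 'abab', 'aba', 'ba']

// Both patterns accept or reject each sample input identically ...
const sameVerdicts = inputs.every(str => capturing.test(str) === nonCapturing.test(str))

// ... and differ only in the extracted groups.
const withGroup = capturing.exec('abab')       // full match plus one captured group
const withoutGroup = nonCapturing.exec('abab') // full match only

console.log(sameVerdicts)                                // true
console.log(withGroup !== null && withGroup.length)       // 2
console.log(withoutGroup !== null && withoutGroup.length) // 1
```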

An `UnsupportedSyntaxError` is thrown when unsupported patterns are detected.
The library **SHOULD ALWAYS** either throw an error or respect the regex specification exactly.
Please report a bug if the library silently uses a faulty interpretation.

Handling syntax-related errors:
```typescript
import { RB, ParseError, UnsupportedSyntaxError } from '@gruhn/regex-utils'

try {
  RB(/^[a-z]*$/)
} catch (error) {
  if (error instanceof SyntaxError) {
    // Invalid regex syntax! Native error, not emitted by this library.
    // E.g. this will also throw a `SyntaxError`: new RegExp(')')
  } else if (error instanceof ParseError) {
    // The regex syntax is valid but the internal parser could not handle it.
    // If this happens it's a bug in this library.
  } else if (error instanceof UnsupportedSyntaxError) {
    // Regex syntax is valid but not supported by this library.
  }
}
```

## Example use cases 💡

### Generate test data from regex 📜

Generate 5 random email addresses:
```typescript
const email = RB(/^[a-z]+@[a-z]+\.[a-z]{2,3}$/)
for (const str of email.sample().take(5)) {
  console.log(str)
}
```
```
ky@e.no
cc@gg.gaj
z@if.ojk
vr@y.ehl
e@zx.hzq
```

Generate 5 random email addresses, which have exactly 20 characters:
```typescript
const emailLength20 = email.and(/^.{20}$/)
for (const str of emailLength20.sample().take(5)) {
  console.log(str)
}
```
```
kahragjijttzyze@i.mv
gnpbjzll@cwoktvw.hhd
knqmyotxxblh@yip.ccc
kopfpstjlnbq@lal.nmi
vrskllsvblqb@gemi.wc
```

### Refactor regex then check equivalence 🔄

[**ONLINE DEMO**](https://gruhn.github.io/regex-utils/equiv-checker.html?regexp1=%5Ea%7Cb%24&regexp2=%5E%5Bab%5D%24)

Say we found this incredibly complicated regex somewhere in the codebase:
```typescript
const oldRegex = /^a|b$/
```

This can be simplified, right?
```typescript
const newRegex = /^[ab]$/
```

But to double-check we can use `.isEquivalent` to verify that the new version matches exactly the same strings as the old version.
That is, whether `oldRegex.test(str) === newRegex.test(str)` for every possible input string:

```typescript
RB(oldRegex).isEquivalent(newRegex) // false
```
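
To make the definition concrete, here is a naive bounded check written with native `RegExp` only. The helpers `stringsUpTo` and `findCounterexample` are hypothetical names for this sketch. Unlike `.isEquivalent`, this approach only tries strings over a fixed alphabet up to a fixed length, so it can find a counterexample but can never prove that two regexes are equivalent.

```typescript
// All strings over `alphabet` with length 0..maxLength, shortest first.
function stringsUpTo(alphabet: string[], maxLength: number): string[] {
  const result: string[] = ['']
  let layer: string[] = ['']
  for (let length = 1; length <= maxLength; length++) {
    const next: string[] = []
    for (const prefix of layer) {
      for (const c of alphabet) next.push(prefix + c)
    }
    result.push(...next)
    layer = next
  }
  return result
}

// Returns the first string on which the two regexes disagree,
// or `undefined` if they agree on every string that was tried.
// Assumes neither regex uses the `g` or `y` flag (those make `.test` stateful).
function findCounterexample(
  re1: RegExp,
  re2: RegExp,
  alphabet: string[],
  maxLength: number
): string | undefined {
  for (const str of stringsUpTo(alphabet, maxLength)) {
    if (re1.test(str) !== re2.test(str)) return str
  }
  return undefined
}

// `a+` rejects the empty string, `a*` accepts it.
console.log(findCounterexample(/^a+$/, /^a*$/, ['a', 'b'], 3))         // ''
// No disagreement found up to length 4 (which is not a proof).
console.log(findCounterexample(/^(?:a|b)$/, /^[ab]$/, ['a', 'b'], 4)) // undefined
```

The number of candidate strings grows exponentially with `maxLength`, which is one reason a symbolic check is preferable to enumeration.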

Looks like we made a mistake.
We can generate counterexamples using `.without(...)` and `.sample(...)`.
First, we derive new regexes that match what `newRegex` matches but `oldRegex` does not, and vice versa:
```typescript
const onlyNew = RB(newRegex).without(oldRegex)
const onlyOld = RB(oldRegex).without(newRegex)
```
`onlyNew` turns out to be empty (`onlyNew.isEmpty() === true`) but `onlyOld` has some matches:
```typescript
for (const str of onlyOld.sample().take(5)) {
  console.log(str)
}
```
```
aaba
aa
aba
bab
aababa
```
Why does `oldRegex` match all these strings with multiple characters?
Shouldn't it only match "a" or "b" like `newRegex`?
Turns out we thought that `oldRegex` is the same as `^(a|b)$`
but in reality it's the same as `(^a)|(b$)`.
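
The precedence claim can be confirmed with native `RegExp` and the counterexamples listed above. This is a small sketch; the name `explicitGrouping` is introduced here only for illustration.

```typescript
const oldRegex = /^a|b$/
const newRegex = /^[ab]$/
// The grouping that `oldRegex` actually has, written out explicitly.
const explicitGrouping = /(?:^a)|(?:b$)/

const counterexamples = ['aaba', 'aa', 'aba', 'bab', 'aababa']

const rows = counterexamples.map(str => ({
  str,
  old: oldRegex.test(str),
  explicit: explicitGrouping.test(str),
  new: newRegex.test(str),
}))

// Every counterexample starts with "a" or ends with "b",
// so `old` and `explicit` are true while `new` is false.
for (const row of rows) {
  console.log(row.str, row.old, row.explicit, row.new)
}
```

Alternation has the lowest precedence in regex syntax, so `^` binds only to the left branch and `$` only to the right branch.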

### Comment regex using complement 💬

How do you write a regex that matches HTML comments like:
```
<!-- This is a comment -->
```
A straightforward attempt would be:
```typescript
<!--.*-->
```
The problem is that `.*` also matches the end marker `-->`,
so this is also a match:
```typescript
<!-- This is a comment --> and this shouldn't be part of it -->
```
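
The overshoot is easy to reproduce with a native regex. In this sketch the naive pattern `<!--.*-->` is anchored with `^` and `$` so that it has to match the whole input; the variable names and sample strings are chosen for illustration.

```typescript
// Naive attempt: start marker, anything, end marker.
const naiveComment = /^<!--.*-->$/

const singleComment = '<!-- This is a comment -->'
const overshoot = "<!-- This is a comment --> and this shouldn't be part of it -->"

console.log(naiveComment.test(singleComment)) // true, as intended
console.log(naiveComment.test(overshoot))     // true, although the comment ended earlier

// The text between the outer markers still contains an end marker,
// which is exactly what the inner part should forbid.
const inner = overshoot.slice('<!--'.length, -'-->'.length)
console.log(inner.indexOf('-->') !== -1)      // true
```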
We need to specify that the inner part can be any string that does not contain `-->`.
With `.not()` (aka regex complement) this is easy:

```typescript
import { RB } from '@gruhn/regex-utils'

const commentStart = RB('<!--')
const commentInner = RB(/^.*-->.*$/).not()
const commentEnd = RB('-->')

const comment = commentStart.concat(commentInner).concat(commentEnd)
```

With `.toRegExp()` we can convert back to a native JavaScript regex:
```typescript
comment.toRegExp()
```
```
/^