Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/francisrstokes/super-expressive
🦜 Super Expressive is a zero-dependency JavaScript library for building regular expressions in (almost) natural language
https://github.com/francisrstokes/super-expressive
Last synced: 30 days ago
JSON representation
🦜 Super Expressive is a zero-dependency JavaScript library for building regular expressions in (almost) natural language
- Host: GitHub
- URL: https://github.com/francisrstokes/super-expressive
- Owner: francisrstokes
- License: mit
- Created: 2020-07-14T22:04:33.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-05T06:32:31.000Z (4 months ago)
- Last Synced: 2024-10-01T13:01:18.400Z (about 1 month ago)
- Language: JavaScript
- Homepage:
- Size: 493 KB
- Stars: 4,617
- Watchers: 37
- Forks: 137
- Open Issues: 10
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
- awesome-list - super-expressive - dependency JavaScript library for building regular expressions in (almost) natural language | francisrstokes | 4072 | (JavaScript)
- jimsghstars - francisrstokes/super-expressive - 🦜 Super Expressive is a zero-dependency JavaScript library for building regular expressions in (almost) natural language (JavaScript)
README
# Super Expressive
![Super Expressive Logo](./logo.png)
**Super Expressive** is a JavaScript library that allows you to build regular expressions in almost natural language - with no extra dependencies, and a lightweight code footprint (less than 4kb with minification + gzip!).
---
- [Why](#Why)
- [Installation and Usage](#Installation-and-Usage)
- [Example](#Example)
- [Playground](#Playground)
- [Ports](#Ports)
- [API](#API)
Click to expand- [SuperExpressive()](#SuperExpressive)
- [.allowMultipleMatches](#allowMultipleMatches)
- [.lineByLine](#lineByLine)
- [.caseInsensitive](#caseInsensitive)
- [.generateIndices](#generateIndices)
- [.sticky](#sticky)
- [.unicode](#unicode)
- [.singleLine](#singleLine)
- [.anyChar](#anyChar)
- [.whitespaceChar](#whitespaceChar)
- [.nonWhitespaceChar](#nonWhitespaceChar)
- [.digit](#digit)
- [.nonDigit](#nonDigit)
- [.word](#word)
- [.nonWord](#nonWord)
- [.wordBoundary](#wordBoundary)
- [.nonWordBoundary](#nonWordBoundary)
- [.newline](#newline)
- [.carriageReturn](#carriageReturn)
- [.tab](#tab)
- [.verticalTab](#verticalTab)
- [.formFeed](#formFeed)
- [.backspace](#backspace)
- [.nullByte](#nullByte)
- [.anyOf](#anyOf)
- [.capture](#capture)
- [.namedCapture(name)](#namedCapturename)
- [.backreference(index)](#backreferenceindex)
- [.namedBackreference(index)](#namedBackreferenceindex)
- [.group](#group)
- [.end()](#end())
- [.assertAhead](#assertAhead)
- [.assertNotAhead](#assertNotAhead)
- [.assertBehind](#assertBehind)
- [.assertNotBehind](#assertNotBehind)
- [.optional](#optional)
- [.zeroOrMore](#zeroOrMore)
- [.zeroOrMoreLazy](#zeroOrMoreLazy)
- [.oneOrMore](#oneOrMore)
- [.oneOrMoreLazy](#oneOrMoreLazy)
- [.exactly(n)](#exactlyn)
- [.atLeast(n)](#atLeastn)
- [.atLeastLazy(n)](#atLeastLazyn)
- [.between(x, y)](#betweenx-y)
- [.betweenLazy(x, y)](#betweenLazyx-y)
- [.startOfInput](#startOfInput)
- [.endOfInput](#endOfInput)
- [.anyOfChars(chars)](#anyOfCharschars)
- [.anythingBut](#anythingBut)
- [.anythingButChars(chars)](#anythingButCharschars)
- [.anythingButString(str)](#anythingButStringstr)
- [.anythingButRange(a, b)](#anythingButRangea-b)
- [.string(s)](#strings)
- [.char(c)](#charc)
- [.controlChar(c)](#controlCharc)
- [.hexCode(hex)](#hexCodehex)
- [.utf16Code(hex)](#utf16Codehex)
- [.unicodeCharCode(hex)](#unicodeCharCodehex)
- [.unicodeProperty(property)](#unicodePropertyproperty)
- [.notUnicodeProperty(property)](#notUnicodePropertyproperty)
- [.range(a, b)](#rangea-b)
- [.subexpression(expr, opts)](#subexpressionexpr-opts)
- [.toRegexString()](#toRegexString)
- [.toRegex()](#toRegex)
## Why?
Regex is a very powerful tool, but its terse and cryptic vocabulary can make constructing and communicating them with others a challenge. Even developers who understand them well can have trouble reading their own back just a few months later! In addition, they can't be easily created and manipulated in a programmatic way - closing off an entire avenue of dynamic text processing.
That's where **Super Expressive** comes in. It provides a programmatic and human readable way to create regular expressions. It's API uses the [fluent builder pattern](https://en.wikipedia.org/wiki/Fluent_interface), and is completely immutable. It's built to be discoverable and predictable:
- properties and methods describe what they do in plain English
- order matters! quantifiers are specified before the thing they change, just like in English (e.g. `SuperExpressive().exactly(5).digit`)
- if you make a mistake, you'll know how to fix it. SuperExpressive will guide you towards a fix if your expression is invalid
- [subexpressions](#subexpressionexpr-opts) can be used to create meaningful, reusable components
- includes an `index.d.ts` file for full TypeScript supportSuperExpressive turns those complex and unwieldy regexes that appear in code reviews into something that can be read, understood, and **properly reviewed** by your peers - and maintained by anyone!
## Installation and Usage
```
npm i super-expressive
``````JavaScript
const SuperExpressive = require('super-expressive');// Or as an ES6 module
import SuperExpressive from 'super-expressive';
```## Example
The following example recognises and captures the value of a 16-bit hexadecimal number like `0xC0D3`.
```javascript
const SuperExpressive = require('super-expressive');const myRegex = SuperExpressive()
.startOfInput
.optional.string('0x')
.capture
.exactly(4).anyOf
.range('A', 'F')
.range('a', 'f')
.range('0', '9')
.end()
.end()
.endOfInput
.toRegex();// Produces the following regular expression:
/^(?:0x)?([A-Fa-f0-9]{4})$/
```## Playground
You can experiment with `SuperExpressive` in the [Super Expressive Playground](https://nartc.github.io/ng-super-expressive/) by [@nartc](https://github.com/nartc). This is a great way to build a regex description, and test it against various inputs.
## Ports
Super Expressive has been ported to the following languages:
### PHP
https://github.com/bassim/super-expressive-php by @bassim
### Ruby
https://github.com/hiy/super-expressive-ruby by @hiy
### Python
https://github.com/stanislav-tsaplev/super_expressive by @stanislav-tsaplev
## API
### SuperExpressive()
`SuperExpressive()`
Creates an instance of `SuperExpressive`.
### .allowMultipleMatches
Uses the `g` flag on the regular expression, which indicates that it should match multiple values when run on a string.
**Example**
```JavaScript
SuperExpressive()
.allowMultipleMatches
.string('hello')
.toRegex();
// ->
/hello/g
```### .lineByLine
Uses the `m` flag on the regular expression, which indicates that it should treat the [.startOfInput](#startOfInput) and [.endOfInput](#endOfInput) markers as the start and end of lines.
**Example**
```JavaScript
SuperExpressive()
.lineByLine
.string('^hello$')
.toRegex();
// ->
/\^hello\$/m
```### .caseInsensitive
Uses the `i` flag on the regular expression, which indicates that it should treat ignore the uppercase/lowercase distinction when matching.
**Example**
```JavaScript
SuperExpressive()
.caseInsensitive
.string('HELLO')
.toRegex();
// ->
/HELLO/i
```### .generateIndices
Uses the `d` flag on the regular expression, which indicates that it should generate indices for the start and end of each capture group.
**Example**
```JavaScript
SuperExpressive()
.generateIndices
.string('hello')
.toRegex();
// ->
/hello/d
```### .sticky
Uses the `y` flag on the regular expression, which indicates that it should create a stateful regular expression that can be resumed from the last match.
**Example**
```JavaScript
SuperExpressive()
.sticky
.string('hello')
.toRegex();
// ->
/hello/y
```### .unicode
Uses the `u` flag on the regular expression, which indicates that it should use full unicode matching.
**Example**
```JavaScript
SuperExpressive()
.unicode
.string('héllo')
.toRegex();
// ->
/héllo/u
```### .singleLine
Uses the `s` flag on the regular expression, which indicates that the input should be treated as a single line, where the [.startOfInput](#startOfInput) and [.endOfInput](#endOfInput) markers explicitly mark the start and end of input, and [.anyChar](#anyChar) also matches newlines.
**Example**
```JavaScript
SuperExpressive()
.singleLine
.string('hello')
.anyChar
.string('world')
.toRegex();
// ->
/hello.world/s
```### .anyChar
Matches any single character. When combined with [.singleLine](#singleLine), it also matches newlines.
**Example**
```JavaScript
SuperExpressive()
.anyChar
.toRegex();
// ->
/./
```### .whitespaceChar
Matches any whitespace character, including the special whitespace characters: `\r\n\t\f\v`.
**Example**
```JavaScript
SuperExpressive()
.whitespaceChar
.toRegex();
// ->
/\s/
```### .nonWhitespaceChar
Matches any non-whitespace character, excluding also the special whitespace characters: `\r\n\t\f\v`.
**Example**
```JavaScript
SuperExpressive()
.nonWhitespaceChar
.toRegex();
// ->
/\S/
```### .digit
Matches any digit from `0-9`.
**Example**
```JavaScript
SuperExpressive()
.digit
.toRegex();
// ->
/\d/
```### .nonDigit
Matches any non-digit.
**Example**
```JavaScript
SuperExpressive()
.nonDigit
.toRegex();
// ->
/\D/
```### .word
Matches any alpha-numeric (`a-z, A-Z, 0-9`) characters, as well as `_`.
**Example**
```JavaScript
SuperExpressive()
.word
.toRegex();
// ->
/\w/
```
### .nonWordMatches any non alpha-numeric (`a-z, A-Z, 0-9`) characters, excluding `_` as well.
**Example**
```JavaScript
SuperExpressive()
.nonWord
.toRegex();
// ->
/\W/
```### .wordBoundary
Matches (without consuming any characters) immediately between a character matched by [.word](#word) and a character not matched by [.word](#word) (in either order).
**Example**
```JavaScript
SuperExpressive()
.digit
.wordBoundary
.toRegex();
// ->
/\d\b/
```### .nonWordBoundary
Matches (without consuming any characters) at the position between two characters matched by [.word](#word).
**Example**
```JavaScript
SuperExpressive()
.digit
.nonWordBoundary
.toRegex();
// ->
/\d\B/
```### .newline
Matches a `\n` character.
**Example**
```JavaScript
SuperExpressive()
.newline
.toRegex();
// ->
/\n/
```### .carriageReturn
Matches a `\r` character.
**Example**
```JavaScript
SuperExpressive()
.carriageReturn
.toRegex();
// ->
/\r/
```### .tab
Matches a `\t` character.
**Example**
```JavaScript
SuperExpressive()
.tab
.toRegex();
// ->
/\t/
```### .verticalTab
Matches a `\v` character.
**Example**
```JavaScript
SuperExpressive()
.verticalTab
.toRegex();
// ->
/\v/
```### .formFeed
Matches a `\f` character.
**Example**
```JavaScript
SuperExpressive()
.formFeed
.toRegex();
// ->
/\f/
```### .backspace
Matches a `\b` character.
**Example**
```JavaScript
SuperExpressive()
.backspace
.toRegex();
// ->
/[\b]/
```### .nullByte
Matches a `\u0000` character (ASCII `0`).
**Example**
```JavaScript
SuperExpressive()
.nullByte
.toRegex();
// ->
/\0/
```### .anyOf
Matches a choice between specified elements. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.anyOf
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.toRegex();
// ->
/(?:XXX|[a-f0-9])/
```### .capture
Creates a capture group for the proceeding elements. Needs to be finalised with `.end()`. Can be later referenced with [backreference(index)](#backreferenceindex).
**Example**
```JavaScript
SuperExpressive()
.capture
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.toRegex();
// ->
/([a-f][0-9]XXX)/
```### .namedCapture(name)
Creates a named capture group for the proceeding elements. Needs to be finalised with `.end()`. Can be later referenced with [namedBackreference(name)](#namedBackreferencename) or [backreference(index)](#backreferenceindex).
**Example**
```JavaScript
SuperExpressive()
.namedCapture('interestingStuff')
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.toRegex();
// ->
/(?[a-f][0-9]XXX)/
```### .namedBackreference(name)
Matches exactly what was previously matched by a [namedCapture](#namedCapturename).
**Example**
```JavaScript
SuperExpressive()
.namedCapture('interestingStuff')
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.string('something else')
.namedBackreference('interestingStuff')
.toRegex();
// ->
/(?[a-f][0-9]XXX)something else\k/
```### .backreference(index)
Matches exactly what was previously matched by a [capture](#capture) or [namedCapture](#namedCapturename) using a positional index. Note regex indexes start at 1, so the first capture group has index 1.
**Example**
```JavaScript
SuperExpressive()
.capture
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.string('something else')
.backreference(1)
.toRegex();
// ->
/([a-f][0-9]XXX)something else\1/
```### .group
Creates a non-capturing group of the proceeding elements. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.optional.group
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.toRegex();
// ->
/(?:[a-f][0-9]XXX)?/
```### .end()
Signifies the end of a SuperExpressive grouping, such as [.anyOf](#anyOf), [.group](#group), or [.capture](#capture).
**Example**
```JavaScript
SuperExpressive()
.capture
.anyOf
.range('a', 'f')
.range('0', '9')
.string('XXX')
.end()
.end()
.toRegex();
// ->
/((?:XXX|[a-f0-9]))/
```### .assertAhead
Assert that the proceeding elements are found without consuming them. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.assertAhead
.range('a', 'f')
.end()
.range('a', 'z')
.toRegex();
// ->
/(?=[a-f])[a-z]/
```### .assertNotAhead
Assert that the proceeding elements are **not** found without consuming them. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.assertNotAhead
.range('a', 'f')
.end()
.range('g', 'z')
.toRegex();
// ->
/(?![a-f])[g-z]/
```### .assertBehind
Assert that the elements contained within **are** found immediately before this point in the string. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.assertBehind
.string('hello ')
.end()
.string('world')
.toRegex();
// ->
/(?<=hello )world/
```### .assertNotBehind
Assert that the elements contained within are **not** found immediately before this point in the string. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.assertNotBehind
.string('hello ')
.end()
.string('world')
.toRegex();
// ->
/(?
/\d?/
```### .zeroOrMore
Assert that the proceeding element may not be matched, or may be matched multiple times.
**Example**
```JavaScript
SuperExpressive()
.zeroOrMore.digit
.toRegex();
// ->
/\d*/
```### .zeroOrMoreLazy
Assert that the proceeding element may not be matched, or may be matched multiple times, but as few times as possible.
**Example**
```JavaScript
SuperExpressive()
.zeroOrMoreLazy.digit
.toRegex();
// ->
/\d*?/
```### .oneOrMore
Assert that the proceeding element may be matched once, or may be matched multiple times.
**Example**
```JavaScript
SuperExpressive()
.oneOrMore.digit
.toRegex();
// ->
/\d+/
```### .oneOrMoreLazy
Assert that the proceeding element may be matched once, or may be matched multiple times, but as few times as possible.
**Example**
```JavaScript
SuperExpressive()
.oneOrMoreLazy.digit
.toRegex();
// ->
/\d+?/
```### .exactly(n)
Assert that the proceeding element will be matched exactly `n` times.
**Example**
```JavaScript
SuperExpressive()
.exactly(5).digit
.toRegex();
// ->
/\d{5}/
```### .atLeast(n)
Assert that the proceeding element will be matched at least `n` times.
**Example**
```JavaScript
SuperExpressive()
.atLeast(5).digit
.toRegex();
// ->
/\d{5,}/
```### .atLeastLazy(n)
Assert that the proceeding element will be matched at least `n` times, but as few times as possible.
**Example**
```JavaScript
SuperExpressive()
.atLeastLazy(5).digit
.toRegex();
// ->
/\d{5,}?/
```### .between(x, y)
Assert that the proceeding element will be matched somewhere between `x` and `y` times.
**Example**
```JavaScript
SuperExpressive()
.between(3, 5).digit
.toRegex();
// ->
/\d{3,5}/
```### .betweenLazy(x, y)
Assert that the proceeding element will be matched somewhere between `x` and `y` times, but as few times as possible.
**Example**
```JavaScript
SuperExpressive()
.betweenLazy(3, 5).digit
.toRegex();
// ->
/\d{3,5}?/
```### .startOfInput
Assert the start of input, or the start of a line when [.lineByLine](#lineByLine) is used.
**Example**
```JavaScript
SuperExpressive()
.startOfInput
.string('hello')
.toRegex();
// ->
/^hello/
```### .endOfInput
Assert the end of input, or the end of a line when [.lineByLine](#lineByLine) is used.
**Example**
```JavaScript
SuperExpressive()
.string('hello')
.endOfInput
.toRegex();
// ->
/hello$/
```### .anyOfChars(chars)
Matches any of the characters in the provided string `chars`.
**Example**
```JavaScript
SuperExpressive()
.anyOfChars('aeiou')
.toRegex();
// ->
/[aeiou]/
```### .anythingBut
Matches any character, except those that match any of the specified elements. Needs to be finalised with `.end()`.
**Example**
```JavaScript
SuperExpressive()
.anythingBut
.digit
.range('a','z')
.string('XXX')
.end()
.toRegex();
// ->
/(?:(?!XXX)[^\da-z])/
```### .anythingButChars(chars)
Matches any character, except any of those in the provided string `chars`.
**Example**
```JavaScript
SuperExpressive()
.anythingButChars('aeiou')
.toRegex();
// ->
/[^aeiou]/
```### .anythingButString(str)
Matches any string the same length as `str`, except the characters sequentially defined in `str`.
**Example**
```JavaScript
SuperExpressive()
.anythingButString('aeiou')
.toRegex();
// ->
/(?:[^a][^e][^i][^o][^u])/
```### .anythingButRange(a, b)
Matches any character, except those that would be captured by the [.range](#rangea-b) specified by `a` and `b`.
**Example**
```JavaScript
SuperExpressive()
.anythingButRange(0, 9)
.toRegex();
// ->
/[^0-9]/
```### .string(s)
Matches the exact string `s`.
**Example**
```JavaScript
SuperExpressive()
.string('hello')
.toRegex();
// ->
/hello/
```### .char(c)
Matches the exact character `c`.
**Example**
```JavaScript
SuperExpressive()
.char('x')
.toRegex();
// ->
/x/
```### .controlChar(c)
Matches a control character using carat notation (`Ctrl^c`) where `c` is a single latin letter from A-Z.
**Example**
```JavaScript
SuperExpressive()
.controlChar('J')
.toRegex();
// ->
/\cJ/
```### .hexCode(hex)
Matches a character with the code `hex`, where `hex` is a 2 dogit hexadecimal string.
**Example**
```JavaScript
SuperExpressive()
.hexCode('2A')
.toRegex();
// ->
/\x2A/
```### .utf16Code(hex)
Matches a UTF-16 code unit with the code `hex`, where `hex` is a 4 digit hexadecimal string.
**Example**
```JavaScript
SuperExpressive()
.utf16Code('002A')
.toRegex();
// ->
/\u002A/
```### .unicodeCharCode(hex)
Matches a Unicode character code with the value `hex`, where `hex` is a 4 or 5 digit hexadecimal string.
Implicitly enables the `u` flag on the regular expression.**Example**
```JavaScript
SuperExpressive()
.unicodeCharCode('0002A')
.toRegex();
// ->
/\u{0002A}/u
```### .unicodeProperty(property)
Matches a Unicode character with the given Unicode property.
See the [MDN Docs](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape) for valid properties.
Implicitly enables the `u` flag on the regular expression.**Example**
```JavaScript
SuperExpressive()
.unicodeProperty('Script=Latin')
.toRegex();
// ->
/\p{Script=Latin}/u
```### .notUnicodeProperty(property)
Matches a Unicode character without the given Unicode property.
See the [MDN Docs](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape) for valid properties.
Implicitly enables the `u` flag on the regular expression.**Example**
```JavaScript
SuperExpressive()
.notUnicodeProperty('Script=Latin')
.toRegex();
// ->
/\P{Script=Latin}/u
```### .range(a, b)
Matches any character that falls between `a` and `b`. Ordering is defined by a characters ASCII or unicode value.
The `u` flag is automatically enabled if either `a` or `b` are unicode characters larger than 2 bytes.**Example**
```JavaScript
SuperExpressive()
.range('a', 'z')
.range('\u{1F600}', '\u{1F606}')
.toRegex();
// ->
/[a-z][😀-😆]/u
```### .subexpression(expr, opts?)
- *opts.namespace*: A **string** namespace to use on all named capture groups in the subexpression, to avoid naming collisions with your own named groups (default = `''`)
- *opts.ignoreFlags*: If set to true, any flags this subexpression specifies should be disregarded (default = `true`)
- *opts.ignoreStartAndEnd*: If set to true, any startOfInput/endOfInput asserted in this subexpression specifies should be disregarded (default = `true`)Matches another SuperExpressive instance inline. Can be used to create libraries, or to modularise you code. By default, flags and start/end of input markers are ignored, but can be explcitly turned on in the options object.
**Example**
```JavaScript
// A reusable SuperExpressive...
const fiveDigits = SuperExpressive().exactly(5).digit;SuperExpressive()
.oneOrMore.range('a', 'z')
.atLeast(3).anyChar
.subexpression(fiveDigits)
.toRegex();
// ->
/[a-z]+.{3,}\d{5}/
```### .toRegexString()
Outputs a string representation of the regular expression that this SuperExpression models.
**Example**
```JavaScript
SuperExpressive()
.allowMultipleMatches
.lineByLine
.startOfInput
.optional.string('0x')
.capture
.exactly(4).anyOf
.range('A', 'F')
.range('a', 'f')
.range('0', '9')
.end()
.end()
.endOfInput
.toRegexString();
// ->
"/^(?:0x)?([A-Fa-f0-9]{4})$/gm"
```### .toRegex()
Outputs the regular expression that this SuperExpression models.
**Example**
```JavaScript
SuperExpressive()
.allowMultipleMatches
.lineByLine
.startOfInput
.optional.string('0x')
.capture
.exactly(4).anyOf
.range('A', 'F')
.range('a', 'f')
.range('0', '9')
.end()
.end()
.endOfInput
.toRegex();
// ->
/^(?:0x)?([A-Fa-f0-9]{4})$/gm
```