{"id":23090543,"url":"https://github.com/rotemdan/regexp-composer","last_synced_at":"2026-04-02T02:37:27.778Z","repository":{"id":265046764,"uuid":"894957697","full_name":"rotemdan/regexp-composer","owner":"rotemdan","description":"Easy-to-use regular expression builder, using a composable, function-oriented style. Supports all regular expression patterns accepted by the JavaScript RegExp engine.","archived":false,"fork":false,"pushed_at":"2025-05-13T08:31:30.000Z","size":58,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-15T02:49:33.827Z","etag":null,"topics":["regular-expression"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rotemdan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-27T10:01:29.000Z","updated_at":"2025-05-13T08:31:33.000Z","dependencies_parsed_at":"2024-11-27T11:24:59.209Z","dependency_job_id":"5afadf4b-da06-4f1d-a94a-cfaadd6c5b75","html_url":"https://github.com/rotemdan/regexp-composer","commit_stats":null,"previous_names":["rotemdan/regexp-composer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rotemdan/regexp-composer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotemdan%2Fregexp-composer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotemdan%2Fregexp-composer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotemdan%2Fr
egexp-composer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotemdan%2Fregexp-composer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rotemdan","download_url":"https://codeload.github.com/rotemdan/regexp-composer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotemdan%2Fregexp-composer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262689565,"owners_count":23349133,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["regular-expression"],"created_at":"2024-12-16T21:00:22.446Z","updated_at":"2026-04-02T02:37:27.732Z","avatar_url":"https://github.com/rotemdan.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Regular expression composer\n\nAn easy-to-use TypeScript / JavaScript regular expression builder library designed to simplify the writing of regular expressions, in a composable, function-oriented style that's significantly more readable and less error-prone than standard regular expression syntax.\n\n* Produces standard JavaScript regular expressions\n* Supports all regular expression patterns accepted by the JavaScript engine\n* Supports all JavaScript runtimes (browsers, Node.js, Deno, Bun)\n* Designed as Unicode aware, from the ground up. 
Unicode mode enabled and required\n* Patterns are created using functions and can be composed and embedded in multiple regular expressions\n* Automatically escapes special characters\n* Automatically wraps complex patterns with non-capturing groups (`(?:pattern)`)\n* Accepts codepoints as integers, in addition to hexadecimal strings (converts as needed)\n* Unifies disjunctions (like `hello|world`) and character class patterns (like `[Va-zX]`) to a single `anyOf` pattern, where they can be freely mixed\n* Special tokens are expressed as safer constants like `inputStart` (`^`), `inputEnd` (`$`), `anyChar` (`.`) and `lineFeed` (`\\n`)\n* Ensures character and codepoint ranges are valid. Will error on `charRange('z', 'a')` or `codepointRange('a4', 'a1')`\n* Fast and lightweight\n* Full TypeScript type checking\n* No dependencies\n\n## Basic usage\n\nInstall package:\n```sh\nnpm install regexp-composer\n```\n\nBuild and use a simple regular expression:\n```ts\nimport { buildRegExp, possibly, inputStart } from 'regexp-composer'\n\n// Build regExp object\nconst regExp = buildRegExp([inputStart, 'Hello world.', possibly(' How are you?')])\n\n// Use it\nregExp.test('Hello world.') // returns true\nregExp.test('Hello world!') // returns false\nregExp.test(' Hello world.') // returns false\nregExp.test('Hello world. 
How are you?') // returns true\n```\n\nYou can also encode a pattern to a RegExp source string, without compiling it to a RegExp object, using `encodePattern`:\n\n```ts\nimport { encodePattern, possibly, inputStart } from 'regexp-composer'\n\n// Build regexp\nconst regExpSource = encodePattern([inputStart, 'Hello world.', possibly(' How are you?')])\n\nconsole.log(regExpSource) // Prints '^Hello world\\.(?: How are you\\?)?'\n```\n\n## Example patterns\n\nMatch the string `'Hello world.'`:\n\n```ts\n'Hello world.'\n```\n(note characters like `.` within strings are always taken as literals and will be automatically escaped if needed)\n\nEncodes to:\n```\nHello world\\.\n```\n\nMatch the string `'Hello world.'`, optionally followed by `' How are you?'`:\n```ts\n['Hello world.', possibly(' How are you?')]\n```\n\nEncodes to:\n```\nHello world\\.(?: How are you\\?)?\n```\n(note `(?: )` is a non-capturing group inserted to wrap the optional pattern)\n\nMatch a sequence of one or more English characters or digits:\n\n```ts\noneOrMore(anyOf(charRange('a', 'z'), charRange('A', 'Z'), charRange('0', '9')))\n```\n\nEncodes to:\n```\n[a-zA-Z0-9]+\n```\n\nMatch a phone number, like `+23 (555) 432-1234`:\n\n```ts\n// The `digit` pattern is reused several times in `phoneNumberPattern`:\nconst digit = charRange('0', '9')\n\nconst phoneNumberPattern = [\n\tpossibly(['+', captureAs('countryCode', repeated([1, 3], digit)), oneOrMore(' ')]),\n\tpossibly(['(', captureAs('areaCode', repeated(3, digit)), ')', oneOrMore(' ')]),\n\tcaptureAs('localNumber', [\n\t\trepeated(3, digit),\n\t\tpossibly(anyOf('-', ' ')),\n\t\trepeated(4, digit),\n\t])\n]\n```\n\nEncodes to:\n```\n(?:\\+(?\u003ccountryCode\u003e(?:[0-9]){1,3}) +)?(?:\\((?\u003careaCode\u003e(?:[0-9]){3})\\) +)?(?\u003clocalNumber\u003e(?:[0-9]){3}(?:(?:[- ]))?(?:[0-9]){4})\n```\n\n# Pattern reference\n\n## String and character literals\n\nString and character literals are represented as simple strings, 
like:\n\n```\n'Hello'\n'Cześć'\n'こんにちは'\n'X'\n'嗨'\n```\n\n## Sequence of patterns\n\nA sequence of patterns is written as an array:\n\n```ts\n[pattern1, pattern2, pattern3, ...]\n```\n\n## Optional\n\n### `possibly(pattern)`\n\nAccepts the given pattern if it matches, or skips it if not.\n\nEncodes to `pattern?` or `(?:pattern)?`.\n\n## Choice\n\n### `anyOf(patterns)`\n\nAccepts the **first pattern** that is matched in the pattern list, or fails if no pattern matches.\n\nPatterns can be either single-character (like `'x'` or `charRange('a', 'z')`) or multi-character (like `oneOrMore('Hello')`).\n\nEncodes to `(?:pattern1|pattern2|pattern3|...)`.\n\nFor efficiency, consecutive single-character patterns are grouped when encoded. For example:\n\n```ts\nanyOf('V', 'B', 'hello', oneOrMore('bye'), 'good', charRange('a', 'z'), lineFeed, 'world')\n```\nEncodes to:\n```\n(?:[VB]|hello|(?:bye)+|good|[a-z\\n]|world)\n```\n\n### `notAnyOfChars(singleCharPatterns)`\n\nAccepts any character except characters that match the given list of **single character patterns**.\n\nEncodes to `[^singleCharPatterns]`.\n\nFor example:\n```ts\nnotAnyOfChars('V', 'B', charRange('a', 'z'), lineFeed, codepointRange(5234, 5312), unicodeProperty('Punctuation'))\n```\n\nEncodes to `[^VBa-z\\n\\u{1472}-\\u{14c0}\\p{Punctuation}]`.\n\n#### Negating a choice of multi-character patterns\n\n`notAnyOfChars` only works on single character patterns. 
Negating a set of multi-character patterns, like `NOT('cat', 'dog', 'elephant')`, requires knowing the length, or additional criteria, for a successful positive match (otherwise, how would the RegExp engine know what to match?).\n\nTo achieve this, you can use a form of conditional matching, like `matches(pattern, { except: excludedPattern })`, described in a later section:\n\n```ts\nmatches(oneOrMore(unicodeProperty('Letter')), { except: anyOf('cat', 'dog', 'elephant') })\n```\n\nThis provides enough information for the RegExp engine to know which patterns to accept, and which to exclude.\n\n## Repetition\n\n### `zeroOrMore(pattern)`\n\nAccepts the given pattern, repeated zero or more times.\n\nEncodes to `pattern*` or `(?:pattern)*`.\n\n### `zeroOrMoreNonGreedy(pattern)`\n\nAccepts the given pattern, repeated zero or more times. Non-greedy.\n\nEncodes to `pattern*?` or `(?:pattern)*?`.\n\n### `oneOrMore(pattern)`\n\nAccepts the given pattern, repeated one or more times.\n\nEncodes to `pattern+` or `(?:pattern)+`.\n\n### `oneOrMoreNonGreedy(pattern)`\n\nAccepts the given pattern, repeated one or more times. Non-greedy.\n\nEncodes to `pattern+?` or `(?:pattern)+?`.\n\n### `repeated(count, pattern)`\n\nAccepts the given pattern, only if repeated exactly `count` times.\n\nEncodes to `(?:pattern){count}`.\n\n### `repeated([min, max?], pattern)`\n\nAccepts the given pattern, repeated between `min` and `max` times.\n\nWhen `max` is not given, it defaults to `Infinity`.\n\nEncodes to `(?:pattern){min,max}`, or `(?:pattern){min,}` when `max` is not given or set to `Infinity`.\n\n### `repeatedNonGreedy([min, max?], pattern)`\n\nAccepts the given pattern, repeated between `min` and `max` times. 
Non-greedy.\n\nWhen `max` is not given, it defaults to `Infinity`.\n\nEncodes to `(?:pattern){min,max}?`, or `(?:pattern){min,}?` when `max` is not given or set to `Infinity`.\n\n## Single character patterns\n\n### `codepoint(hexCode)`\n\nAccepts a single character with the given Unicode codepoint, provided as a hexadecimal string.\n\nEncodes to `\\u{hexCode}`.\n\n### `codepoint(integerCode)`\n\nAccepts a single character with the given Unicode codepoint, provided as an integer.\n\n`integerCode` is converted to a hexadecimal string when encoded.\n\nEncodes to `\\u{hexCode}`.\n\n### `charRange(startChar, endChar)`\n\nAccepts a single character within the given character range.\n\nEncodes to `[startChar-endChar]`.\n\n### `codepointRange(startHexCode, endHexCode)`\n\nAccepts a single character within the given Unicode codepoint range.\n\n`startHexCode` and `endHexCode` should be provided as hexadecimal strings.\n\nEncodes to `[\\u{startHexCode}-\\u{endHexCode}]`.\n\n### `codepointRange(startIntegerCode, endIntegerCode)`\n\nAccepts a single character within the given Unicode codepoint range.\n\n`startIntegerCode` and `endIntegerCode` are converted to hexadecimal strings when encoded.\n\nEncodes to `[\\u{startHexCode}-\\u{endHexCode}]`.\n\n### `unicodeProperty(propertyName)`\n\nAccepts a character matching the given Unicode property name.\n\nEncodes to `\\p{propertyName}`.\n\n### `unicodeProperty(propertyName, value)`\n\nAccepts a character matching the given Unicode property name and value.\n\nEncodes to `\\p{propertyName=value}`.\n\n### `notUnicodeProperty(property)`\n\nAccepts any character that doesn't match the given Unicode property.\n\nEncodes to `\\P{property}`.\n\n### `notUnicodeProperty(property, value)`\n\nAccepts any character that doesn't match the given Unicode property and value.\n\nEncodes to `\\P{property=value}`.\n\n## Grouping\n\n### `capture(pattern)`\n\nCaptures an unnamed group.\n\nEncodes to `(pattern)`.\n\n### `captureAs(name, pattern)`\n\nCaptures 
a named group.\n\nEncodes to `(?\u003cname\u003epattern)`.\n\n## Backreferences\n\n### `sameAs(groupIndex)`\n\nMatches a pattern to a previous unnamed capturing group.\n\n`groupIndex` is the index of a preceding group. It must be an integer between `1` and `9`.\n\nEncodes to `(?:\\groupIndex)`.\n\n### `sameAs(groupName)`\n\nMatches a pattern to a previous named capturing group.\n\n`groupName` is the name of a preceding named group.\n\nEncodes to `\\k\u003cgroupName\u003e`.\n\n### Potential issues with backreference indexes greater than 9\n\n`groupIndex` has been limited to the range of `1..9` because otherwise, when more than 9 groups precede the backreference, the encoded RegExp would be ambiguous with a backreference followed by one or more digit literals. For example, `\\10` can be interpreted either as a backreference to the 10th group, or as a backreference to the 1st group followed by the literal character `0`.\n\nIn the official specification, this ambiguity is resolved by greedily interpreting the sequence `\\10` as a backreference if there are 10 or more preceding groups. However, this context-sensitive logic breaks the ability to efficiently parse the regular expression using a context-free grammar! For that reason I've decided to disallow those cases. 
For backreference indexes greater than 9, you can use named backreferences instead.\n\n## Conditional matching\n\nThese patterns provide a simplified way to express various lookahead and lookbehind patterns.\n\n### `matches(pattern, { ifFollowedBy: followingPattern })`\n\nMatches a pattern, with the condition that it is followed by a second pattern.\n\nEncodes to `pattern(?=followingPattern)`.\n\n(positive lookahead positioned after the pattern).\n\n### `matches(pattern, { ifNotFollowedBy: followingPattern })`\n\nMatches a pattern, with the condition that it is not followed by a second pattern.\n\nEncodes to `pattern(?!followingPattern)`.\n\n(negative lookahead positioned after the pattern).\n\n### `matches(pattern, { ifPrecededBy: precedingPattern })`\n\nMatches a pattern, with the condition that it is preceded by a second pattern.\n\nEncodes to `(?\u003c=precedingPattern)pattern`.\n\n(positive lookbehind positioned before the pattern).\n\n### `matches(pattern, { ifNotPrecededBy: precedingPattern })`\n\nMatches a pattern, with the condition that it is not preceded by a second pattern.\n\nEncodes to `(?\u003c!precedingPattern)pattern`.\n\n(negative lookbehind positioned before the pattern).\n\n### `matches(pattern, { ifExtendsTo: extendedPattern })`\n\nMatches a pattern, with the condition that it extends to a second pattern.\n\nEncodes to `(?=extendedPattern)pattern`.\n\n(positive lookahead positioned before the pattern).\n\n### `matches(pattern, { except: excludedPattern })`\n\nMatches a pattern, with the condition that it doesn't extend to a second pattern (effectively excluding it).\n\nEncodes to `(?!excludedPattern)pattern`.\n\n(negative lookahead positioned before the pattern).\n\n#### Example:\n\n```ts\nmatches(\n\toneOrMore(unicodeProperty('Letter')), {\n\texcept: anyOf('V', 'hello', charRange('a', 'z'))\n})\n```\nmatches any sequence of letters of length 1 or more, with the exception of the single uppercase letter `V`, the string `hello`, or a single 
lowercase letter between `a` and `z`.\n\n### `matches(pattern, { ifExtendsBackTo: backwardExtendedPattern })`\n\nMatches a pattern, with the condition that it extends backward to a second pattern.\n\nEncodes to `pattern(?\u003c=backwardExtendedPattern)`.\n\n(positive lookbehind positioned after the pattern).\n\n### `matches(pattern, { ifNotExtendsBackTo: backwardExtendedPattern })`\n\nMatches a pattern, with the condition that it doesn't extend backward to a second pattern.\n\nEncodes to `pattern(?\u003c!backwardExtendedPattern)`.\n\n(negative lookbehind positioned after the pattern).\n\n### Combining multiple conditions\n\nConditions can be combined. For example:\n\n```ts\nmatches(\n\toneOrMore(unicodeProperty('Letter')), {\n\texcept: anyOf('Cat', 'Dog'),\n\tifNotPrecededBy: charRange('0', '9'),\n\tifNotFollowedBy: anyOf('?', '!')\n})\n```\n\nThis means that any sequence of Unicode letters is matched, provided **all** of these conditions are met:\n* It is not 'Cat' or 'Dog'\n* It is not preceded by a digit\n* It is not followed by a question mark or exclamation mark\n\n### Including multiple conditions of the same kind\n\nAlthough not likely to be frequently used, the RegExp engine does allow defining multiple lookahead or lookbehind patterns, producing a form of intersection (conjunction) between the conditions. 
You can achieve that by passing an array of condition objects as the second argument to `matches`:\n\n```ts\nmatches(\n\toneOrMore(unicodeProperty('Letter')),\n\t[\n\t\t{ ifPrecededBy: unicodeProperty('Letter') },\n\t\t{ ifPrecededBy: unicodeProperty('Script_Extensions', 'Gothic') },\n\t\t{ ifFollowedBy: unicodeProperty('Letter') },\n\t\t{ ifFollowedBy: unicodeProperty('Script_Extensions', 'Greek') },\n\t]\n)\n```\n\n## Special token patterns\n\n* `inputStart`: `^`\n* `inputEnd`: `$`\n* `anyChar`: `.`\n* `whitespace`: `\\s`\n* `nonWhitespace`: `\\S`\n* `digit`: `\\d`\n* `nonDigit`: `\\D`\n* `wordBoundary`: `\\b`\n* `nonWordBoundary`: `\\B`\n* `formFeed`: `\\f`\n* `carriageReturn`: `\\r`\n* `lineFeed`: `\\n`\n* `tab`: `\\t`\n* `verticalTab`: `\\v`\n\n#### Word character tokens\n\nThe word character tokens `\\w` and `\\W` are not directly supported because they are not consistently Unicode-aware (they are only Unicode-aware when the `ignoreCase` flag is enabled).\n\nTo get consistent results, you can use:\n\n* `anyOf(charRange('a', 'z'), charRange('A', 'Z'), charRange('0', '9'))` for English word characters\n* `anyOf(unicodeProperty('Letter'), unicodeProperty('Mark'), unicodeProperty('Number'))` for Unicode (multilingual) word characters\n\n## Options for `buildRegExp`\n\n**Customizable flags**:\n* `global`: enables the [`g` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/global) when constructing the RegExp\n* `hasIndices`: enables the [`d` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices) when constructing the RegExp\n* `ignoreCase`: enables the [`i` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase) when constructing the RegExp\n* `sticky`: enables the [`y` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky) when constructing the RegExp\n\n**Non-customizable 
flags**:\n* `multiline`: the [`m` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/multiline), which makes `inputStart` (`^`) tokens also match at line starts, is **always disabled** in the builder, to ensure clear and consistent semantics for `inputStart`\n* `dotAll`: the [`s` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll), causing the `anyChar` (`.`) token to match all characters, including newlines, is **always enabled** in the builder, to ensure clear and consistent semantics for `anyChar`\n* `unicode`: the [`u` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode), enabling Unicode support, is **always enabled** in the builder, as it is required by the patterns `codepoint`, `codepointRange`, `unicodeProperty` and `notUnicodeProperty`\n* `unicodeSets`: the [`v` flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets), enabling Unicode set support, like `\\p{Script_Extensions=Greek}\u0026\u0026\\p{Letter}`, is currently **always disabled** (it cannot be enabled at the same time as `u`), but is likely to be used in the future\n\nIf you still want to override the non-customizable flags (risking unexpected errors and confusing behavior), you can encode the pattern to a RegExp source string using `encodePattern`, and compile the resulting string using the `RegExp` constructor, with any set of flags, like `new RegExp(encodePattern(...), flags)`.\n\n## Future\n\n* [Unicode sets](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets), using the `v` flag, would enable things like intersections of Unicode properties, like `unicodeProperties('Letter', ['Script_Extensions', 'Greek'])`\n* [Case sensitivity assertion](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier) could allow 
to selectively describe patterns that are interpreted in a case-sensitive or case-insensitive way. For example `[caseInsensitive('Hello'), ' world']` would match \"Hello world\", \"HELLO world\", \"hello world\", \"hEllO world\", etc.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frotemdan%2Fregexp-composer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frotemdan%2Fregexp-composer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frotemdan%2Fregexp-composer/lists"}