Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tc39/proposal-regexp-buffer-boundaries
Regular Expression Buffer Boundaries for ECMAScript
https://github.com/tc39/proposal-regexp-buffer-boundaries
Last synced: 3 months ago
JSON representation
Regular Expression Buffer Boundaries for ECMAScript
- Host: GitHub
- URL: https://github.com/tc39/proposal-regexp-buffer-boundaries
- Owner: tc39
- License: bsd-3-clause
- Created: 2021-10-15T20:58:47.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-28T00:58:15.000Z (over 2 years ago)
- Last Synced: 2024-08-05T04:04:05.581Z (6 months ago)
- Language: HTML
- Homepage: https://tc39.es/proposal-regexp-buffer-boundaries
- Size: 210 KB
- Stars: 7
- Watchers: 5
- Forks: 3
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-regex - Buffer boundaries - Anchors `\A` and `\z`, not affected by flag `m`. (JavaScript regex evolution / Regex processors)
README
# Regular Expression Buffer Boundaries for ECMAScript
This proposal seeks to introduce `\A` and `\z` character escapes to Unicode-mode regular expressions as synonyms for `^` and `$` that are not affected by the `m` (multiline) flag.
## Status
**Stage:** 2
**Champion:** Ron Buckton ([@rbuckton](https://github.com/rbuckton))_For detailed status of this proposal see [TODO](#todo), below._
## Authors
* Ron Buckton ([@rbuckton](https://github.com/rbuckton))
# Motivations
> NOTE: See https://github.com/rbuckton/proposal-regexp-features for an overview of
> how this proposal fits into other possible future features for Regular Expressions.Buffer Boundaries are a common feature across a wide array of regular expression engines that
allow you to match the start or end of the entire input regardless of whether the `m` (multiline) flag
has been set. Buffer Boundaries also allow you to match the start/end of a line *and* the start/end of
the input in a single RegExp using the `m` flag.While its possible to emulate `\A` and `\z` using existing patterns, the alternatives are harder to
far read, and require a more comprehensive working understanding of regular experssions to interpret.For example, compare the following approaches:
```js
// emulate `m`-mode `^` outside of `m`-mode:
const a = /^foo|(?<=^|[\u000A\u000D\u2028\u2029])bar/u;// emulate non-`m`-mode `^` inside of `m`-mode using modifiers (proposed):
const b = /(?-m:^)foo|^bar/mu;// using `\A`:
const c = /\Afoo|^bar/mu;
```In the example above, it is far less likely that a reader will readily understand the expression in
example (a). Not only is the content of the regular expression much harder to read, but understanding
its purpose requires interpreting how **six** different features of regular expressions interact:
grouping, positive lookbehind, the `^` metacharacer, disjunctions, character classes, and unicode escapes.Example (b) is a an improvement, but still requires the reader to visually balance the parentheses as
well as to interpret how **four** different regular expression features interact: grouping, modifiers
(proposed), the `m` flag, and the `^` metacharacter.In comparison, example (c) is far easier to read. It consists of a terse escape sequence consisting
of only two characters (`\A`), which makes it far easier to distinguish between special pattern syntax
and plain text segments like `foo` and `bar`.The `\A` and `\z` escapes have broad support across multiple other languages and regular expression
engines. As a result it has the benefit of extensive existing documentation online, including
[Wikipedia](https://en.wikipedia.org/wiki/Regular_expression#Examples), numerous tutorial websites, as
well as the documentation from other languages. This significantly lessens the learning curve for `\A`
over its alternatives.# Prior Art
* [Perl](https://rbuckton.github.io/regexp-features/engines/perl.html#feature-buffer-boundaries)
* [PCRE](https://rbuckton.github.io/regexp-features/engines/pcre.html#feature-buffer-boundaries)
* [Boost.Regex](https://rbuckton.github.io/regexp-features/engines/boost.regex.html#feature-buffer-boundaries)
* [.NET](https://rbuckton.github.io/regexp-features/engines/dotnet.html#feature-buffer-boundaries)
* [Oniguruma](https://rbuckton.github.io/regexp-features/engines/oniguruma.html#feature-buffer-boundaries)
* [Hyperscan](https://rbuckton.github.io/regexp-features/engines/hyperscan.html#feature-buffer-boundaries)
* [ICU](https://rbuckton.github.io/regexp-features/engines/icu.html#feature-buffer-boundaries)
* [Glib/GRegex](https://rbuckton.github.io/regexp-features/engines/glib-gregex.html#feature-buffer-boundaries)See https://rbuckton.github.io/regexp-features/features/buffer-boundaries.html for additional information.
# Syntax
Buffer boundaries are similar to the `^` and `$` anchors, except that they are not affected by the `m` (multiline) flag:
- `\A` — Matches the start of the input.
- `\z` — Matches the end of the input.
- ~~`\Z` — A zero-width assertion consisting of an optional newline at the end of the buffer. Equivalent to `(?=\R?\z)`.~~> NOTE: Requires the `u` or `v` flag, as `\A`, `\z`, and `\Z` are currently just escapes for `A`, `z` and `Z` without the `u` or `v` flag.
> NOTE: Not supported inside of a character class.
> NOTE: The `\Z` assertion is no longer being considered as part of this proposal as of December 15th, 2021, but has
> been reserved for possible future use.For more information about the `v` flag, see https://github.com/tc39/proposal-regexp-set-notation.
~~For more information about the `\R` escape sequence, see https://github.com/tc39/proposal-regexp-r-escape.~~
# Examples
```js
// without buffer boundaries
const pattern = String.raw`^foo$`;
const re1 = new RegExp(pattern, "u");
re1.test("foo"); // true
re1.test("foo\nbar"); // falseconst re2 = new RegExp(pattern, "um");
re1.test("foo"); // true
re1.test("foo\nbar"); // true// with buffer boundaries
const pattern = String.raw`\Afoo\z`;
const re1 = new RegExp(pattern, "u");
re1.test("foo"); // true
re1.test("foo\nbar"); // falseconst re2 = new RegExp(pattern, "um");
re1.test("foo"); // true
re1.test("foo\nbar"); // false// mixing buffer boundaries and anchors
const re = /\Afoo|^bar$|baz\z/um;
re.test("foo"); // true
re.test("foo\n"); // true
re.test("\nfoo"); // falsere.test("bar"); // true
re.test("bar\n"); // true
re.test("\nbar"); // truere.test("baz"); // true
re.test("baz\n"); // false
re.test("\nbaz"); // true
```# History
- October 28, 2021 — Proposed for Stage 1 ([slides](https://1drv.ms/p/s!AjgWTO11Fk-TkfoSnHYFCoo4mYndTA?e=r0YIxu))
- Outcome: Advanced to Stage 1
- December 15, 2021 — Proposed for Stage 2 ([slides](https://1drv.ms/p/s!AjgWTO11Fk-Tkfs-sKyEtV6B_S-poQ?e=U7ToKV))
- Outcome: `\A` and `\z` advanced to Stage 2 (`\Z` did not advance, but will be reserved)
- Stage 2 Reviewers: Richard Gibson, Waldemar Horwat# TODO
The following is a high-level list of tasks to progress through each stage of the [TC39 proposal process](https://tc39.github.io/process-document/):
### Stage 1 Entrance Criteria
* [x] Identified a "[champion][Champion]" who will advance the addition.
* [x] [Prose][Prose] outlining the problem or need and the general shape of a solution.
* [x] Illustrative [examples][Examples] of usage.
* [ ] ~~High-level [API][API].~~### Stage 2 Entrance Criteria
* [x] [Initial specification text][Specification].
* [ ] ~~[Transpiler support][Transpiler] (_Optional_).~~### Stage 3 Entrance Criteria
* [x] [Complete specification text][Specification].
* [ ] Designated reviewers have [signed off][Stage3ReviewerSignOff] on the current spec text.
* [ ] The ECMAScript editor has [signed off][Stage3EditorSignOff] on the current spec text.### Stage 4 Entrance Criteria
* [ ] [Test262](https://github.com/tc39/test262) acceptance tests have been written for mainline usage scenarios and [merged][Test262PullRequest].
* [ ] Two compatible implementations which pass the acceptance tests: [\[1\]][Implementation1], [\[2\]][Implementation2].
* [ ] A [pull request][Ecma262PullRequest] has been sent to tc39/ecma262 with the integrated spec text.
* [ ] The ECMAScript editor has signed off on the [pull request][Ecma262PullRequest].[Process]: https://tc39.es/process-document/
[Proposals]: https://github.com/tc39/proposals/
[Grammarkdown]: http://github.com/rbuckton/grammarkdown#readme
[Champion]: #status
[Prose]: #motivations
[Examples]: #examples
[API]: #api
[Specification]: https://tc39.es/proposal-regexp-buffer-boundaries[Transpiler]: #todo
[Stage3ReviewerSignOff]: #todo
[Stage3EditorSignOff]: #todo
[Test262PullRequest]: #todo
[Implementation1]: #todo
[Implementation2]: #todo
[Ecma262PullRequest]: #todo