https://github.com/epfl-systemf/regelk
OCaml Linear Engine for JavaScript Regexes, implementing the algorithms described in "Linear Matching of JavaScript Regular Expressions" (PLDI 2024)
- Host: GitHub
- URL: https://github.com/epfl-systemf/regelk
- Owner: epfl-systemf
- License: other
- Created: 2024-04-04T08:59:05.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-05-29T15:21:28.000Z (8 months ago)
- Last Synced: 2024-05-30T04:59:16.396Z (8 months ago)
- Topics: javascript, linear, regex
- Language: OCaml
- Homepage:
- Size: 42.8 MB
- Stars: 9
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# RegElk - OCaml Linear Engine for JavaScript Regexes
Authors: [Aurèle Barrière](https://aurele-barriere.github.io/) and [Clément Pit-Claudel](https://pit-claudel.fr/clement/).

## About
This is a linear regular expression engine for a subset of JavaScript regexes.
The underlying algorithm is an extension of the [PikeVM](https://swtch.com/~rsc/regexp/regexp2.html), supporting more JavaScript features.
This engine implements the algorithms described in the paper [Linear Matching of JavaScript Regular Expressions](https://arxiv.org/abs/2311.17620) by the same authors.

In particular, it supports, for the first time with linear time and space complexity:
- nullable JavaScript quantifiers, whose semantics differ from those of other regex languages (see for instance `(a?b??)*` on the string "ab")
- capture reset, a JavaScript-specific property where capture groups are reset at each quantifier iteration (for instance `((a)|(b))*` on the string "ab")
- all lookarounds (lookaheads and lookbehinds), even with capture groups inside
- linear matching of the greedy plus and of the nonnullable lazy plus, without duplicating the quantified expression.

RegElk means **Reg**ex **E**ngine with **L**inear loo**K**arounds.
Elks are [diagonal walkers](https://ecowellness.com/animal-tracking-part-2-common-gait-patterns/): their rear feet step into the prints left by their front feet, which conserves energy. This evokes how a PikeVM merges threads that reach the same state to preserve linearity.

![RegElk](etc/regelk_logo.jpg)
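
The JavaScript-specific behaviours listed above can be observed in any ECMAScript-compliant engine. The sketch below assumes Node.js (or another runtime with ES2018 lookbehind support) and uses only the built-in `RegExp`. It illustrates the semantics RegElk targets and does not use RegElk's own API.

```javascript
// Nullable quantifiers: an iteration of `*` must not match the empty string.
// In the second iteration `a?` cannot match, so the lazy `b??` would first
// produce an empty iteration; that iteration is rejected and `b??` is
// forced to consume "b".
const nullable = /(a?b??)*/.exec("ab");
console.log(nullable[0], nullable[1]); // ab b

// Capture reset: groups inside a quantifier are cleared at the start of
// every iteration, so group 2 (the `a`) is undefined once the last
// iteration has matched `b`.
const reset = /((a)|(b))*/.exec("ab");
console.log(Array.from(reset)); // [ 'ab', 'b', undefined, 'b' ]

// Lookbehind containing a capture group: the group is still reported.
const behind = /(?<=(a))b/.exec("ab");
console.log(behind.index, behind[0], behind[1]); // 1 b a
```

A linear engine such as RegElk has to reproduce these exact results without backtracking.
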
## Complexity
Given a regex of size `|r|` and a string of size `|s|`, this engine has worst-case time complexity `O(|r|*|s|)`, i.e. linear in each of them.
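
The `O(|r|*|s|)` bound comes from the PikeVM's thread merging. Below is a minimal PikeVM sketch in JavaScript over a tiny hand-written bytecode. The instruction names `char`, `split`, `jmp` and `match` and the helper `pikeMatch` are hypothetical and are not RegElk's code. The sketch only checks whether the whole string matches. It omits captures, lookarounds and the JavaScript quantifier semantics that RegElk adds. The key part is the `seen` array: at each string position, at most one thread per program counter survives, so each step does `O(|r|)` work.

```javascript
// Minimal PikeVM: runs all threads in lockstep, one string position at a time.
// Hypothetical instruction encoding for this sketch:
//   { op: "char", c }      consume character c
//   { op: "split", x, y }  fork into pc x (preferred) and pc y
//   { op: "jmp", x }       continue at pc x
//   { op: "match" }        accept
function pikeMatch(prog, s) {
  // Follow jmp/split eagerly (epsilon closure). `seen` merges threads:
  // a pc already reached at this position is never added twice, which
  // also stops infinite recursion on empty cycles.
  function addThread(list, seen, pc) {
    if (seen[pc]) return;
    seen[pc] = true;
    const ins = prog[pc];
    if (ins.op === "jmp") {
      addThread(list, seen, ins.x);
    } else if (ins.op === "split") {
      addThread(list, seen, ins.x);
      addThread(list, seen, ins.y);
    } else {
      list.push(pc); // char or match: handled at the next step
    }
  }

  let clist = [];
  addThread(clist, new Array(prog.length).fill(false), 0);
  // Each position handles at most prog.length threads: O(|r|) per character.
  for (let i = 0; i <= s.length; i++) {
    const nlist = [];
    const seen = new Array(prog.length).fill(false);
    for (const pc of clist) {
      const ins = prog[pc];
      if (ins.op === "match") {
        if (i === s.length) return true; // whole string consumed
      } else if (ins.op === "char" && i < s.length && s[i] === ins.c) {
        addThread(nlist, seen, pc + 1);
      }
    }
    clist = nlist;
  }
  return false;
}

// Program for /^a*b$/ (array indices are the program counters).
const prog = [
  { op: "split", x: 1, y: 3 }, // 0: try another `a`, else go to `b`
  { op: "char", c: "a" },      // 1
  { op: "jmp", x: 0 },         // 2
  { op: "char", c: "b" },      // 3
  { op: "match" },             // 4
];

console.log(pikeMatch(prog, "aaab")); // true
console.log(pikeMatch(prog, "aa"));   // false
```

Because `addThread` marks each program counter at most once per position, a pattern like `(a*)*b` stays linear here. The same pattern can take exponential time in a backtracking engine.
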
While counted quantifiers are supported, they increase the regex size.
For instance, `e{4,8}` multiplies the size of `e` by 8.
However, the greedy plus (`+` or `{1,}`) and the nonnullable lazy plus (as in `(ab)+?`) are handled without duplication.

The engine also has `O(|r|*|s|)` space complexity.
To avoid a space complexity that depends on the string size, we provide alternative register data structures offering different time-space tradeoffs:

| Registers | Time Complexity | Space Complexity |
|----------------|-----------------------------|------------------|
| List (default) | `O(\|r\|*\|s\|)` | `O(\|r\|*\|s\|)` |
| Array | `O(\|r\|^2*\|s\|)` | `O(\|r\|^2)` |
| Tree | `O(\|r\|*log(\|r\|)*\|s\|)` | `O(\|r\|^2)` |

Note, however, that `O(|r|*|s|)` space complexity cannot be avoided when using our linear lookaround algorithm.
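
The size increase caused by counted quantifiers comes from unrolling them into the basic operators a PikeVM handles. The sketch below shows one standard desugaring, written as a hypothetical `expandCounted` helper that works on regex source strings. RegElk performs its own transformation on its internal representation, which may differ. The sketch assumes `e` contains no capture groups, because duplicating a capturing group in a source string would create extra numbered groups. It also handles only greedy, bounded `e{min,max}`. For a nullable `e`, preserving JavaScript's exact empty-iteration semantics would need more care.

```javascript
// Hypothetical helper: unroll the greedy counted quantifier e{min,max} into
// `min` mandatory copies of e followed by (max - min) nested optional copies,
// schematically e{2,4} -> e e (?:e (?:e)?)?. Every copy adds |e| to the regex
// size, so the expansion of e{min,max} contains max copies of e.
function expandCounted(e, min, max) {
  if (!(Number.isInteger(min) && Number.isInteger(max) && 0 <= min && min <= max)) {
    throw new RangeError("expected integers 0 <= min <= max");
  }
  const atom = `(?:${e})`; // wrap e so concatenation cannot change its meaning
  let optional = "";
  // Build the optional tail from the inside out: (?:e(?:e(?:e)?)?)?
  for (let i = 0; i < max - min; i++) {
    optional = `(?:${atom}${optional})?`;
  }
  return atom.repeat(min) + optional;
}

const unrolled = expandCounted("e", 4, 8);
// Count the copies of e in the expansion.
console.log(unrolled.split("(?:e)").length - 1); // 8

// The expansion accepts exactly the same strings as the original quantifier.
const original = /^(?:e){4,8}$/;
const expanded = new RegExp(`^${unrolled}$`);
for (let n = 0; n <= 10; n++) {
  const s = "e".repeat(n);
  console.assert(original.test(s) === expanded.test(s), `mismatch on ${s}`);
}
```

Nesting the optional copies, rather than writing `e?e?e?e?`, keeps the expansion unambiguous: each optional copy can only be tried once all the previous ones have matched.
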
## Supported Features
| Feature | Example |
|-------------------------------|-------------------------------------------|
| Lookaheads | `a(?=(b))`, `a(?!b)` |
| Lookbehinds | `(?<=b)a`, `(?