https://github.com/kmarius/jsregexp
JavaScript regular expressions for Lua
https://github.com/kmarius/jsregexp
javascript lua regex regexp
Last synced: 10 months ago
JSON representation
JavaScript regular expressions for Lua
- Host: GitHub
- URL: https://github.com/kmarius/jsregexp
- Owner: kmarius
- License: mit
- Created: 2021-08-22T22:27:32.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-10-27T18:05:40.000Z (over 1 year ago)
- Last Synced: 2024-11-05T13:42:25.769Z (over 1 year ago)
- Topics: javascript, lua, regex, regexp
- Language: C
- Homepage:
- Size: 276 KB
- Stars: 30
- Watchers: 2
- Forks: 3
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# jsregexp
Provides ECMAScript regular expressions for Lua 5.1, 5.2, 5.3, 5.4 and LuaJit. Uses `libregexp` from Fabrice Bellard's [QuickJS](https://bellard.org/quickjs/).
## Installation
To install `jsregexp` globally with [luarocks](https://luarocks.org/modules/kmarius/jsregexp),
run
```bash
sudo luarocks install jsregexp
```
To install `jsregexp` for a different lua version (in this case Lua5.1 or LuaJit), run
```bash
sudo luarocks --lua-version 5.1 install jsregexp
```
To install `jsregexp` locally for your user, run
```bash
luarocks --local --lua-version 5.1 install jsregexp
```
This will place the compiled module in `$HOME/.luarocks/lib/lua/5.1` so `$HOME/.luarocks/lib/lua/5.1/?.so` needs to be added to `package.cpath`.
Simply running `make` in this project's root will compile the module `jsregexp.so` (tested on linux only).
## Usage
This module provides two functions
```lua
jsregexp.compile(regex, flags?)
jsregexp.compile_safe(regex, flags?)
```
that take an ECMAScript regular expression as a string and an optional string of flags, most notably
- `"d"` provide tables with begin/end indices of match groups in match objects
- `"i"`: case insensitive search
- `"g"`: match globally
- `"n"`: enables named groups (not present in JavaScript, needs to be enabled manually if needed)
- `"u"`: utf-16 support if detected in the pattern string (**implicity set**)
The complete list of flags can be found in the [JavaScript reference](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp#parameters).
On success, `compile` and `compile_safe` return a RegExp object. On failure, `compile` throws an error while `compile_save` returns `nil` and an error message.
### RegExp object
Each RegExp object `re` has the following fields
```lua
re.last_index -- the position at wchich the next match will be searched in re:exec or re:test (see notes below)
re.source -- the regexp string
re.flags -- a string representing the active flags
re.dot_all -- is the dod_all flag set?
re.global -- is the global flag set?
re.has_indices -- is the indices flag set?
re.ignore_case -- is the ignore_case flag set?
re.multiline -- is the multiline flag set?
re.sticky -- is the sticky flag set?
re.unicode -- is the unicode flag set?
```
Calling `tostring` on a RegExp object returns representation in the form of `"//"`.
The RegExp object `re` has the following methods corresponding to JavaScript regular expressions:
```lua
re:exec(str) -- returns the next match of re in str (see notes below)
re:test(str) -- returns true if the regex matches str (see notes below)
re:match(str) -- returns a list of all matches or nil if no match
re:match_all(str) -- returns a closure that repeatedly calls re:exec, to be used in for-loops
re:match_all_list(str) -- returns a list of all matches
re:search(str) -- returns the 1-based index of the first match of re in str, or -1 if no match
re:split(str, limit?) -- splits str at re, at most limit times
re:replace(str, replacement) -- relplace the first match of re in str by replacement (all, if global)
re:replace_all(str, replacement) -- relplace each match of re in str by replacement
```
For the documentation of the behaviour of each of these functions, see the [JavaScript reference](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp).
**Note:** Each regexp object has a field `last_index` which denotes the position at which the next call to `exec` and `test` searches for the next match.
Afterwards `last_index` is changed accordingly. If you need to use these methods, you should reset `last_index` to 1.
**Note:** Because the regexp engine used works with UTF16 instead of UTF8, the input string is converted to UTF16 if necessary. Calling `exec` or `test` on
non-Ascii strings repeatedly could potentially introduce a large overhead. This conversion only needs to be done once for the `match*` methods, you probably want to use those instead.
### Match object
A match object `m` returned by `exec` and the `match*` functions has the following fields:
```lua
m[0] -- the full match
m[i] -- match group i
m.input -- the input string
m.capture_count -- number of capture groups
m.index -- start of the capture (1-based)
m.groups -- table of the named groups and their content
m.indices -- table of begin/end indices of all match groups (if "d" flag is set)
m.indices.groups -- table of named groups and their begin/end indices (if "d" flag is set)
```
Calling `tostring` on a match object returns the full match `m[0]`.
## Example
```lua
local jsregexp = require("jsregexp")
local re, err = jsregexp.compile_safe("(\\w)\\w*", "g")
if not re then
print(err)
return
end
local str = "Hello World"
for match in re:match_all(str) do
print(match)
for j, group in ipairs(match) do
print(j, group)
end
end
```