Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zadean/xs_regex
XML Schema Regex for Erlang
erlang regex regular-expression xml xml-schema xpath xquery
- Host: GitHub
- URL: https://github.com/zadean/xs_regex
- Owner: zadean
- License: apache-2.0
- Created: 2018-07-12T14:37:11.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2022-01-05T20:52:12.000Z (about 3 years ago)
- Last Synced: 2024-09-26T15:11:02.458Z (4 months ago)
- Topics: erlang, regex, regular-expression, xml, xml-schema, xpath, xquery
- Language: Erlang
- Size: 334 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
xs_regex
=====

An OTP library that translates XML Schema regular expressions (described [here](https://www.w3.org/TR/xmlschema-2/#regexs)) into the Erlang flavor.
### XML Schema Features
* Character classes (including the shorthands \i, \I, \c, \C, \d, \D, \w, \W).
* **Character class subtraction** (`[a-z-[s-u]]` is translated to `[a-rv-z]`).
* Alternation and groups.
* Greedy quantifiers
* Unicode properties and language blocks.

### XPath / XQuery Features
* Anchors ("^", "$").
* Lazy quantifiers.
* Back-references (also in replacement text using $1 notation).
* Capturing groups.
* i, s, m, x, and q flags

#### Flags
* s - "dot-all", "." matches newline
* m - Multi-line mode
* i - Case-insensitive
* x - "extended", whitespace outside character classes is removed
* q - All characters are treated as literals (s, m, and x have no effect)

#### Usage
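To make the flag semantics listed above concrete, here is a rough analogue that uses only the standard `re` module. The module name `xs_flags_sketch` and the function `matches/3` are hypothetical, and the mapping from flag letters to `re` options is an assumption made for illustration; it is not a description of how xs_regex implements the flags internally.

```erlang
-module(xs_flags_sketch).
-export([matches/3]).

%% Hypothetical helper (not part of xs_regex): approximates the XPath flag
%% letters with options of the standard re module and reports whether
%% Pattern matches anywhere in Subject.
-spec matches(string(), string(), string()) -> boolean().
matches(Subject, Pattern, Flags) ->
    case lists:member($q, Flags) of
        true ->
            %% q: quote the whole pattern with \Q...\E so that every
            %% character is literal; of the other flags only i is kept.
            %% (This simple quoting breaks if Pattern contains "\E".)
            run(Subject, "\\Q" ++ Pattern ++ "\\E", caseless_only(Flags));
        false ->
            run(Subject, Pattern, lists:flatmap(fun flag_to_opt/1, Flags))
    end.

%% Assumed mapping from flag letter to re option.
flag_to_opt($s) -> [dotall];      % "." also matches newline
flag_to_opt($m) -> [multiline];   % "^" and "$" match at line boundaries
flag_to_opt($i) -> [caseless];    % case-insensitive matching
flag_to_opt($x) -> [extended].    % unescaped whitespace outside classes is ignored

caseless_only(Flags) ->
    case lists:member($i, Flags) of
        true -> [caseless];
        false -> []
    end.

run(Subject, Pattern, Opts) ->
    re:run(Subject, Pattern, Opts) =/= nomatch.
```

With this sketch, `xs_flags_sketch:matches("a\nb", "a.b", "s")` is `true` while the same call with `""` as flags is `false`, and `xs_flags_sketch:matches("axb", "a.b", "q")` is `false` because the dot is treated as a literal character.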
```erlang
1> xs_regex:translate("[a-z-[s-u]]").
{ok,"[a-rv-z]"}
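%% Illustrative check using only the standard re module (not an xs_regex
%% call): the letters of a-z rejected by the translated class are exactly
%% the subtracted range s-u.
2> [C || C <- lists:seq($a, $z), re:run([C], "^[a-rv-z]$") =:= nomatch].
"stu"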
```

The `xs_regex:compile/2` function returns a tuple containing a boolean that states whether the pattern matches the empty string, followed by the compiled pattern.
```erlang
1> xs_regex:compile("[a-z-[s-u]]", "ix").
{false,{re_pattern,0,1,1,
<<69,82,67,80,112,0,0,0,9,8,64,36,1,0,0,0,255,255,255,
255,255,255,...>>}}
```

"$1" notation is transformed to "\g{1}" notation.
```erlang
1> Pattern = "(a)|(b)|(c)|(d)|(e)|(f)|(g)|(h)|(i)|(j)".
"(a)|(b)|(c)|(d)|(e)|(f)|(g)|(h)|(i)|(j)"
2> {_, MP} = xs_regex:compile(Pattern, "").
{false,{re_pattern,10,1,1,
<<69,82,67,80,198,0,0,0,32,8,64,36,1,0,0,0,255,255,255,
255,255,255,...>>}}
3> {ok, Depth} = xs_regex:get_depth(Pattern).
{ok,10}
4> {ok, Repl} = xs_regex:transform_replace("$1|$4",Depth).
{ok,"\\g{1}|\\g{4}"}
5> re:replace("abcdefghijk", MP, Repl, [{return,list},global]).
"a||||d||||||k"
```

Arabic decimal digits:
```erlang
2> xs_regex:translate("[\\p{IsArabic}-[\\P{Nd}]]").
{ok,"[\\x{660}-\\x{669}\\x{6F0}-\\x{6F9}]"}
```

Opening XML tag:
```erlang
3> xs_regex:translate("<\\i\\c*>").
{ok,"<(?-i:[\\x{3A}A-Z\\x{5F}a-z\\x{C0}-\\x{D6}\\x{D8}-\\x{F6}\\x{F8}-\\x{2FF}\\x{370}-\\x{37D}\\x{37F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}])(?-i:[\\x{2D}-\\x{2E}0-\\x{3A}A-Z\\x{5F}a-z\\x{B7}\\x{C0}-\\x{D6}\\x{D8}-\\x{F6}\\x{F8}-\\x{37D}\\x{37F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}])*>"}
```

Upper-case Greek:
```erlang
4> xs_regex:translate("^(?:[\\p{IsGreek}-[\\P{Lu}]]+)$").
{ok,"^(?:[\\x{370}\\x{372}\\x{376}\\x{37F}-\\x{383}\\x{386}\\x{388}-\\x{38F}\\x{391}-\\x{3AB}\\x{3CF}\\x{3D2}-\\x{3D4}\\x{3D8}\\x{3DA}\\x{3DC}\\x{3DE}\\x{3E0}\\x{3E2}\\x{3E4}\\x{3E6}\\x{3E8}\\x{3EA}\\x{3EC}\\x{3EE}\\x{3F4}\\x{3F7}\\x{3F9}-\\x{3FA}\\x{3FD}-\\x{3FF}]+)$"}
```

Using non-ASCII code points in a regex:
```erlang
5> {ok,R} = xs_regex:normalize("^what(?:[왶-왹])s\\sthat$").
{ok,[94,119,104,97,116,40,63,58,91,50806,45,50809,93,41,115,
92,115,116,104,97,116,36]}
6> xs_regex:translate(R).
{ok,"^what(?:[\\x{C676}-\\x{C679}])s(?-i:[\\x{9}\\x{A}\\x{D}\\x{20}])that$"}
7> {ok,R1} = xs_regex:normalize("^what(?:[ꯀ-ꯏ])s\\sthis$").
{ok,[94,119,104,97,116,40,63,58,91,43968,45,43983,93,41,115,
92,115,116,104,105,115,36]}
8> xs_regex:translate(R1).
{ok,"^what(?:[\\x{ABC0}-\\x{ABCF}])s(?-i:[\\x{9}\\x{A}\\x{D}\\x{20}])this$"}
```

Build
-----

    $ rebar3 compile
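
Once the library is compiled and on the code path, the calls shown under Usage can be combined into a single replace helper. The following is a minimal sketch, not part of xs_regex: the module name `xs_replace_sketch` and the function `replace/4` are hypothetical, and the return shapes of `compile/2`, `get_depth/1`, and `transform_replace/2` are taken only from the shell transcripts above. XPath's `fn:replace` raises an error when the pattern matches a zero-length string, which is presumably why `compile/2` reports that boolean; the sketch turns it into an error tuple on that assumption.

```erlang
-module(xs_replace_sketch).
-export([replace/4]).

%% Hypothetical wrapper around the xs_regex calls shown in the README.
%% Error returns of the library are not documented there, so any
%% unexpected value crashes with case_clause or badmatch.
-spec replace(string(), string(), string(), string()) ->
          {ok, string()} | {error, matches_empty_string}.
replace(Subject, Pattern, Replacement, Flags) ->
    case xs_regex:compile(Pattern, Flags) of
        {true, _MP} ->
            %% Assumption: reject patterns that match the empty string,
            %% as XPath fn:replace does.
            {error, matches_empty_string};
        {false, MP} ->
            %% The group count is needed to rewrite "$N" references.
            {ok, Depth} = xs_regex:get_depth(Pattern),
            %% "$1" becomes "\g{1}", the form re:replace/4 understands.
            {ok, Repl} = xs_regex:transform_replace(Replacement, Depth),
            {ok, re:replace(Subject, MP, Repl, [{return, list}, global])}
    end.
```

Called with the subject, pattern, and replacement from the back-reference example above and an empty flag string, this should yield `{ok,"a||||d||||||k"}`, matching the README transcript.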