https://github.com/sisyphsu/retree-java
retree is a regular-expression tree that supports fast, concurrent matching of many regex patterns.
concurrency-patterns regular-expression-engine regular-expressions regular-expresssion-tool
- Host: GitHub
- URL: https://github.com/sisyphsu/retree-java
- Owner: sisyphsu
- License: apache-2.0
- Created: 2019-09-06T02:29:30.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-17T13:30:14.000Z (over 6 years ago)
- Last Synced: 2025-07-13T03:45:07.770Z (8 months ago)
- Topics: concurrency-patterns, regular-expression-engine, regular-expressions, regular-expresssion-tool
- Language: Java
- Homepage:
- Size: 181 KB
- Stars: 10
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# retree-java
[Build Status](https://travis-ci.org/sisyphsu/retree-java)
[Coverage](https://codecov.io/gh/sisyphsu/retree-java)
# Introduction
`retree` parses many regular expressions and combines them into a single `regular-expression-tree`,
which is very similar to a `trie`.
First, `retree` parses each `regex` into a **node-chain**, for example:
+ `\d+a\W` is parsed as `CurlyNode(\d+) -> CharNode(a) -> CharNode(\W)`,
+ `\d+\s\W` is parsed as `CurlyNode(\d+) -> CharNode(\s) -> CharNode(\W)`.
After that, `retree` merges those two **node-chains** into one **node-tree**. As in a `trie`, the shared prefix `CurlyNode(\d+)` is kept only once, and the tree branches where the chains diverge
(an image illustrating the tree has yet to be added).
When matching multiple regexes, `retree` reduces redundant scanning and looping, and avoids a lot of backtracking.
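The merge described above can be sketched as an ordinary trie insert. This is only an illustration under stated assumptions: `NodeTreeSketch`, `Node`, `insert`, `size`, and `build` are hypothetical names, and each node is reduced to a string label. It is not `retree`'s actual node model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NodeTreeSketch {

    /** Hypothetical tree node; retree's real node classes are not reproduced here. */
    static final class Node {
        final String label;
        final Map<String, Node> children = new LinkedHashMap<>();
        // Indexes of the patterns whose node-chain ends at this node.
        final List<Integer> endsOf = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Adds one node-chain to the tree, reusing nodes along any prefix that already exists. */
    static void insert(Node root, List<String> chain, int patternIndex) {
        Node cur = root;
        for (String label : chain) {
            cur = cur.children.computeIfAbsent(label, Node::new);
        }
        cur.endsOf.add(patternIndex);
    }

    /** Counts the nodes below the given node (the node itself is not counted). */
    static int size(Node node) {
        int n = 0;
        for (Node child : node.children.values()) {
            n += 1 + size(child);
        }
        return n;
    }

    /** Builds the tree for the two example chains from the text. */
    static Node build() {
        Node root = new Node("ROOT");
        insert(root, Arrays.asList("CurlyNode(\\d+)", "CharNode(a)", "CharNode(\\W)"), 0);
        insert(root, Arrays.asList("CurlyNode(\\d+)", "CharNode(\\s)", "CharNode(\\W)"), 1);
        return root;
    }

    public static void main(String[] args) {
        Node root = build();
        System.out.println(root.children.keySet()); // the single shared first node
        System.out.println(size(root));             // 5 nodes instead of 6
    }
}
```

In this sketch the two three-node chains share their first node, so the tree holds five nodes rather than six, and a matcher walking it would evaluate `\d+` once for both patterns.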
# Maven Dependency
Add the Maven dependency:
```xml
<dependency>
    <groupId>com.github.sisyphsu</groupId>
    <artifactId>retree</artifactId>
    <version>1.0.4</version>
</dependency>
```
# Usage
This example shows how `retree` works. Its API is very similar to `java.util.regex`:
```java
String[] res = {"(\\d{4}-\\d{1,2}-\\d{1,2})", "(?.*)", "(\\w+@\\w+\\.[a-z]+(\\.[a-z]+)?)"};
String input = "Today is 2019-09-05, from sulin (sisyphsu@gmail.com).";
ReMatcher matcher = new ReMatcher(new ReTree(res), input);
assert matcher.find();
assert "2019-09-05".contentEquals(matcher.group());
assert matcher.find();
assert "sulin".contentEquals(matcher.group());
assert "sulin".contentEquals(matcher.group("name"));
assert matcher.find();
assert "sisyphsu@gmail.com".contentEquals(matcher.group());
```
In this example, `input` is scanned only once to match three different regular expressions:
+ `(\d{4}-\d{1,2}-\d{1,2})`
+ `(?.*)`
+ `(\w+@\w+\.[a-z]+(\.[a-z]+)?)`
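For contrast, here is a minimal sketch of the conventional approach using only `java.util.regex`, where each pattern gets its own `Matcher` and so the input is scanned once per pattern. The class and method names (`SeparateScanBaseline`, `findAll`) are hypothetical, and only the first and third patterns from the list above are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparateScanBaseline {

    /** Runs every pattern over the input on its own: one full scan per pattern. */
    static List<String> findAll(String[] regexes, String input) {
        List<String> hits = new ArrayList<>();
        for (String regex : regexes) {
            Matcher m = Pattern.compile(regex).matcher(input);
            while (m.find()) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] res = {"(\\d{4}-\\d{1,2}-\\d{1,2})", "(\\w+@\\w+\\.[a-z]+(\\.[a-z]+)?)"};
        String input = "Today is 2019-09-05, from sulin (sisyphsu@gmail.com).";
        // prints [2019-09-05, sisyphsu@gmail.com]
        System.out.println(findAll(res, input));
    }
}
```

With this baseline the number of scans grows with the number of patterns, which is the cost that combining the patterns into one tree is meant to avoid.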
# Showcase
[**dateparser**](https://github.com/sisyphsu/dateparser)
`dateparser` is a smart, high-performance date parser library
that supports hundreds of different formats, covering nearly every format we use.
`dateparser` uses `retree` to match its many different date patterns.
Even though `dateparser` has hundreds of predefined regular expressions,
it still parses a date very quickly (about 1000 to 1500 ns).
# Performance & Benchmark
TODO
# Multi-Language Support
I plan to port this library to `golang` and `javascript` in the near future.
# License
Apache-2.0