Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gebv/strparam
parameterized pattern matching (faster alternative to golang regexp)
golang golang-library regular-expression string-matching
- Host: GitHub
- URL: https://github.com/gebv/strparam
- Owner: gebv
- License: mit
- Created: 2020-04-30T06:35:58.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-07-06T17:22:43.000Z (over 3 years ago)
- Last Synced: 2024-06-20T17:39:49.416Z (8 months ago)
- Topics: golang, golang-library, regular-expression, string-matching
- Language: Go
- Homepage:
- Size: 97.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# strparam
![CI Status](https://github.com/gebv/strparam/workflows/Go/badge.svg)
[![Go Report Card](https://goreportcard.com/badge/github.com/gebv/strparam)](https://goreportcard.com/report/github.com/gebv/strparam)
[![codecov](https://codecov.io/gh/gebv/strparam/branch/master/graph/badge.svg)](https://codecov.io/gh/gebv/strparam)

A 40 times faster alternative to regex for matching strings against a pattern and extracting parameters. It is a middle ground between plain string matching and regular expressions.
Features
* correctly parses UTF-8 characters
* faster than regular expression
* [multiple pattern match](#multiple-pattern-match)

## Introduction
For example, suppose we need to parse strings that follow the pattern `foo=(..), baz=(..), golang`, where `..` can be any value.
With regexp, the solution would look something like this.

```golang
in := "foo=(bar), baz=(日本語), golang"
re := regexp.MustCompile(`foo=\((.*)\), baz=\((.*)\), golang`)
re.FindAllStringSubmatch(in, -1)
// [[foo=(bar), baz=(日本語), golang bar 日本語]]
```
[On the playground](https://play.golang.org/p/_ENJU_Mjnty)

Or even like this.
```golang
in := "foo=(bar), baz=(日本語), golang"
re := regexp.MustCompile(`\(([^)]+)\)`)
re.FindAllStringSubmatch(in, -1)
// [[(bar) bar] [(日本語) 日本語]]
```
[On the playground](https://play.golang.org/p/SSpy7iiINow)

But regular expressions are slow in Go.
The benchmarks below compare the naive regexp solutions (see above) with the `Lookup` method on parsed patterns.
```
BenchmarkParamsViaRegexp1
BenchmarkParamsViaRegexp1-4 23230 56140 ns/op 19258 B/op 5 allocs/op
BenchmarkParamsViaRegexp2
BenchmarkParamsViaRegexp2-4 52396 23079 ns/op 28310 B/op 8 allocs/op
BenchmarkParamsViaStrparam_NumParams2
BenchmarkParamsViaStrparam_NumParams2-4 315464 3467 ns/op 295 B/op 1 allocs/op
BenchmarkParamsViaStrparam_NumParams5
BenchmarkParamsViaStrparam_NumParams5-4 193682 5444 ns/op 296 B/op 1 allocs/op
BenchmarkParamsViaStrparam_NumParams20
BenchmarkParamsViaStrparam_NumParams20-4 72276 18467 ns/op 297 B/op 1 allocs/op
```

Faster solution.
```golang
in := "foo=(bar), baz=(日本語), golang"
s, _ := Parse("foo=({p1}), baz=({p2}), golang")
found, params := s.Lookup(in)
// true [{Name:p1 Value:bar} {Name:p2 Value:日本語}]
```

[On the playground](https://play.golang.org/p/qsj5fNJfPvO)
## Multiple pattern match
Several patterns can be stored together and matched against a single input string.
At the same level, patterns are sorted from top to bottom (by number of children and by the length of the constant token value).
Sorting rules:
- CONST type token has the highest weight
- longer CONST type token has a higher weight
- token with more children has a higher weight

TODO: more details on the multiple pattern matching engine
```golang
r := NewStore()
r.Add("foo2{p1}foo2{p2}golang")
r.Add("foo1{p3}foo1{p4}golang")

in := "foo1XXXfoo1YYYgolang"
schema := r.Find(in)
found, params := schema.Lookup(in)
```

The benchmarks below are for the `Store.Find` method (without extracting parameters).
```
BenchmarkStore_Lookup_2_2
BenchmarkStore_Lookup_2_2-4 255735 4071 ns/op 160 B/op 2 allocs/op
BenchmarkStore_Lookup_2_102
BenchmarkStore_Lookup_2_102-4 108709 12170 ns/op 160 B/op 2 allocs/op
```

[On the playground](https://play.golang.org/p/qmHhv_b_1pj)
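
The sorting rules listed above can be sketched as a comparison function. This is a minimal illustration, not the library's code: the `node` type, its fields, and the priority order between the three rules (token type first, then constant length, then number of children) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for a token in the pattern tree.
// The real strparam types and field names may differ.
type node struct {
	Const    bool   // true for a CONST token, false for a parameter
	Raw      string // constant text or parameter placeholder
	Children int    // number of child tokens
}

// less reports whether a has a higher weight than b.
// Assumed priority: token type, then constant length, then children.
func less(a, b node) bool {
	if a.Const != b.Const {
		return a.Const // CONST tokens have the highest weight
	}
	if a.Const && len(a.Raw) != len(b.Raw) {
		return len(a.Raw) > len(b.Raw) // longer constants weigh more
	}
	return a.Children > b.Children // more children weigh more
}

// sortNodes orders sibling nodes from highest to lowest weight.
func sortNodes(nodes []node) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return less(nodes[i], nodes[j])
	})
}

func main() {
	nodes := []node{
		{Const: false, Raw: "{p1}", Children: 3},
		{Const: true, Raw: "foo", Children: 1},
		{Const: true, Raw: "foobar", Children: 0},
		{Const: false, Raw: "{p2}", Children: 5},
	}
	sortNodes(nodes)
	for _, n := range nodes {
		fmt.Println(n.Raw)
	}
	// Prints foobar, foo, {p2}, {p1} (one per line).
}
```

Trying heavier tokens first means the most specific pattern gets the first chance to match before a parameter token consumes the input.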
## Guide
### Installation
```
go get github.com/gebv/strparam
```

### Example
Example for a quick start.
```golang
package main

import (
	"fmt"

	"github.com/gebv/strparam"
)

func main() {
	in := "foo=(bar), baz=(日本語), golang"
	s, _ := strparam.Parse("foo=({p1}), baz=({p2}), golang")
	ok, params := s.Lookup(in)
	fmt.Printf("%v %+v", ok, params)
}
```
[On the playground](https://play.golang.org/p/dll0rZYYAlP)
## How does it work?
A pattern is parsed into an array of
* tokens with offset information in bytes **for constants**.
* tokens with parameter information (parameter name and other information).

The pattern `foo=({p1}), baz=({p2}), golang` is represented as the following array
```
[
{Mode:begin}
{Mode:pattern Len:5 Raw:"foo=("} // constant
{Mode:parameter Raw:"{p1}"}
{Mode:pattern Len:8 Raw:"), baz=("}
{Mode:parameter Raw:"{p2}"}
{Mode:pattern Len:9 Raw:"), golang"}
{Mode:end}
]
```

While parsing the incoming string, we move along the token array as long as each token matches. Moving from token to token, we keep a running offset (the matching shift). For a parameter, we look for the next constant (the search window) or the end of the line.
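
The walk described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's implementation: the `token` type and the `lookup` function are hypothetical, the begin and end marker tokens are omitted, and two adjacent parameters are not supported.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical, simplified token: a constant or a parameter.
type token struct {
	Param bool
	Raw   string // constant text, or the parameter name
}

// lookup walks the tokens while keeping a byte offset into in.
func lookup(tokens []token, in string) (bool, map[string]string) {
	params := map[string]string{}
	offset := 0
	for i, t := range tokens {
		if !t.Param {
			// A constant must match exactly at the current offset.
			if !strings.HasPrefix(in[offset:], t.Raw) {
				return false, nil
			}
			offset += len(t.Raw)
			continue
		}
		if i+1 == len(tokens) {
			// Last token: the parameter takes the rest of the input.
			params[t.Raw] = in[offset:]
			offset = len(in)
			continue
		}
		next := tokens[i+1]
		if next.Param {
			return false, nil // adjacent parameters: not handled in this sketch
		}
		// The parameter value ends where the next constant begins.
		idx := strings.Index(in[offset:], next.Raw)
		if idx < 0 {
			return false, nil
		}
		params[t.Raw] = in[offset : offset+idx]
		offset += idx
	}
	if offset != len(in) {
		return false, nil // trailing input left unmatched
	}
	return true, params
}

func main() {
	tokens := []token{
		{Raw: "foo=("},
		{Param: true, Raw: "p1"},
		{Raw: "), baz=("},
		{Param: true, Raw: "p2"},
		{Raw: "), golang"},
	}
	ok, params := lookup(tokens, "foo=(bar), baz=(日本語), golang")
	fmt.Println(ok, params["p1"], params["p2"])
	// Prints: true bar 日本語
}
```

Offsets are counted in bytes, and constants are compared as whole byte sequences, so multi-byte UTF-8 values are never split in the middle of a character.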
A prefix tree is used to store the list of patterns.
For example, the following patterns:
* `foo{p1}bar`
* `foo{p1}baz`

are stored as:

```
root
└── foo
    └── {p1}
        ├── bar
        └── baz
```

While parsing the incoming string, we move deeper into the tree.
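
A minimal sketch of such a tree, keyed by token text, shows how the two example patterns share their common prefix. The `trieNode` type and its methods are hypothetical, and the patterns are passed in already split into tokens; the library's real store may be organized differently.

```go
package main

import "fmt"

// trieNode is a hypothetical prefix-tree node keyed by token text.
type trieNode struct {
	children map[string]*trieNode
	terminal bool // true if a complete pattern ends at this node
}

func newNode() *trieNode {
	return &trieNode{children: map[string]*trieNode{}}
}

// insert adds a pattern given as its sequence of tokens,
// reusing existing nodes for any shared prefix.
func (n *trieNode) insert(tokens []string) {
	cur := n
	for _, t := range tokens {
		child, ok := cur.children[t]
		if !ok {
			child = newNode()
			cur.children[t] = child
		}
		cur = child
	}
	cur.terminal = true
}

// size counts the nodes in the subtree, including n itself.
func (n *trieNode) size() int {
	total := 1
	for _, c := range n.children {
		total += c.size()
	}
	return total
}

func main() {
	root := newNode()
	root.insert([]string{"foo", "{p1}", "bar"})
	root.insert([]string{"foo", "{p1}", "baz"})
	fmt.Println(root.size())
	// Prints 5: root, foo, {p1}, bar, baz
}
```

Because `foo` and `{p1}` are stored once, matching an input against both patterns checks the shared prefix only a single time.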
## TODO
- [x] multiple patterns, lookup and extract params
- [ ] extend parameters for internal validators, eg `{paramName required, len=10}`
- [ ] external validators via hooks
- [ ] stream parser
- [ ] set weights for equal children (for sorting), eg `{paramName1 weight=100}`, `{paramName2 weight=200}` (specific case?)

# License
[MIT](LICENSE)