https://github.com/alabamenhu/regexfuzzytoken
A Raku (née Perl 6) module enabling a fuzzy token in grammars and regexen
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/alabamenhu/regexfuzzytoken
- Owner: alabamenhu
- License: artistic-2.0
- Created: 2019-12-13T22:18:55.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-22T04:39:49.000Z (over 2 years ago)
- Last Synced: 2023-03-08T10:08:28.921Z (over 1 year ago)
- Language: Raku
- Homepage:
- Size: 795 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
![Regex::FuzzyToken for Raku](resources/logo.png)
A Raku module enabling the use of fuzzy tokens in regexen and grammars:

    use Regex::FuzzyToken;
    my @fruits = ;
    "My favorite fruit is a bnana" ~~ /My favorite fruit is a /;
    say $.fuzz; # bnana
    say $.Str;  # banana

Support for more flexible capturing options is forthcoming. The signature for
the fuzzy token is the following:

    fuzzy(*@words,
          :$i  = False,
          :$m  = False,
          :$ws = True,
          :$q  = 33,
          :$capture = (@words.tail ~~ Regex
                          ?? @words.tail
                          !! /\w+/ )
    )

Basically, you should provide a list of strings to be the “goal” words you want to find;
the token will still accept them if they're slightly misspelled. The `i`, `m`, and `ws` options mimic
the regex matching behavior, and will make comparisons ignore differences in case (**i**),
marks (**m**) or white space (**ws**). The `q` option is the minimum Q-gram score required
for a match. The default of 33 is fine for most cases, but through testing you
may find it necessary to raise or lower the threshold (100 = only match exact,
0 = match everything).

The final option of `:$capture` allows you to specify the capture regex to use. By
default it will only capture a sequence of word characters, but that will cause
problems if you need it to match spaces/apostrophes. While you *can* make things
explicit with `:capture(/foo/)`, the signature was designed to allow you to
specify the final item as the capture regex, and so the following are equivalent:
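
A minimal sketch of that equivalence, and of how the `q` threshold might behave, written without the module itself. The subs `qgram-score` and `fuzzy-pick` and the word lists are hypothetical stand-ins: `fuzzy-pick` only borrows the `*@words`, `:$q` and `:$capture` parts of the signature above, and the bigram (Dice) score is an assumption about what a Q-gram score could look like, not necessarily what Regex::FuzzyToken computes.

```raku
# Illustrative stand-ins only; none of this is Regex::FuzzyToken's own code.

# Score two strings from 0 to 100 by shared bigrams (Dice coefficient).
# ASSUMPTION: the module's real Q-gram scoring may differ.
sub qgram-score(Str $a, Str $b, Int :$n = 2) {
    return 100 if $a eq $b;
    # Overlapping n-character windows, counted in a Bag.
    my sub grams(Str $s) { $s.comb.rotor($n => -($n - 1)).map(*.join).Bag }
    my $x = grams($a);
    my $y = grams($b);
    my $total = $x.total + $y.total;
    return 0 unless $total;
    (200 * ($x (&) $y).total / $total).round
}

# Hypothetical helper with the same default-capture rule as the signature
# above: a trailing Regex in @words becomes the capture regex.
sub fuzzy-pick(Str $text, *@words,
               :$q = 33,
               :$capture = (@words.tail ~~ Regex ?? @words.tail !! /\w+/)) {
    my @goals = @words.grep(Str);            # a trailing Regex is not a goal word
    return Nil unless @goals;
    my $m = ($text ~~ $capture) or return Nil;
    my $fuzz = $m.Str;                       # the text as it was actually written
    my $best = @goals.max({ qgram-score($fuzz, $_) });
    return Nil unless qgram-score($fuzz, $best) >= $q;
    $fuzz => $best                           # written form => goal word
}

my @fruits = <apple banana mango>;           # hypothetical goal words
say qgram-score('bnana', 'banana');          # 67
say fuzzy-pick('bnana', @fruits).value;      # banana

my @contractions = "don't", "won't";         # hypothetical goal words
# Capture regex as the final positional item ...
say fuzzy-pick("dont't", @contractions, /<[\w\']>+/).key;           # dont't
# ... or passed explicitly with :capture; both capture the same text.
say fuzzy-pick("dont't", @contractions, :capture(/<[\w\']>+/)).key; # dont't
# With the default /\w+/ the capture stops at the apostrophe.
say fuzzy-pick("dont't", @contractions).key;                        # dont
```

The last three calls show why the capture regex matters: the default `/\w+/` sees only `dont`, while a regex that admits apostrophes captures the whole word before it is scored against the goal words.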
# To do
It could be interesting to allow for a more complex capture, for example, that
matches only as many characters as it needs using `.match($capture, :exhaustive)` on a
substring from the current `.pos`. That would require some tuning of the Q-gram algorithm
and in many cases could take exponentially more time, but would be more accurate and usable.

# Licenses
All code and documentation is licensed under the Artistic License 2.0, included with
the module. The image used on GitHub is based on
[this butterfly](https://www.piqsels.com/en/public-domain-photo-frbsj)
which is licensed under CC-0 and modified in accordance with that license and released
under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/).