https://github.com/anirbanmu/str_metrics
Ruby gem (native extension in Rust) providing implementations of various string metrics
damerau-levenshtein jaro jaro-winkler levenshtein native native-extension ruby ruby-gem rubygem rust rust-lang sorensen-dice string-comparison string-distance string-metrics strings utility
- Host: GitHub
- URL: https://github.com/anirbanmu/str_metrics
- Owner: anirbanmu
- License: mit
- Created: 2020-01-26T03:20:27.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2022-05-07T08:08:38.000Z (almost 3 years ago)
- Last Synced: 2024-12-08T02:12:52.652Z (5 months ago)
- Topics: damerau-levenshtein, jaro, jaro-winkler, levenshtein, native, native-extension, ruby, ruby-gem, rubygem, rust, rust-lang, sorensen-dice, string-comparison, string-distance, string-metrics, strings, utility
- Language: Ruby
- Size: 94.7 KB
- Stars: 79
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# StrMetrics
Ruby gem (native extension in Rust) providing implementations of various string metrics. The metrics currently supported are Sørensen–Dice, Levenshtein, Damerau–Levenshtein, Jaro, and Jaro–Winkler. Any string that can be converted to a UTF-8 representation is supported. All string comparison is done at the grapheme cluster level, as described by [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries); this may differ from many other gems that calculate string metrics. See [Known compatibility](#known-compatibility) for the Ruby versions, Rust versions, and platforms known to work.
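To illustrate why grapheme-cluster comparison can give different results from code-point comparison, here is a small sketch using only Ruby's standard `String#chars` and `String#grapheme_clusters` (Ruby 2.5+). It does not call the gem; the sample string is just an illustration.

```ruby
# "café" spelled with a combining accent: "e" followed by U+0301.
word = "cafe\u0301"

code_points = word.chars              # splits on code points
clusters    = word.grapheme_clusters  # splits on UAX #29 grapheme clusters

puts code_points.length  # 5
puts clusters.length     # 4
```

A metric that walks code points would treat the accent as an element of its own, while a metric that walks grapheme clusters sees `é` as a single unit.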
## Getting Started
### Prerequisites

Install Rust (tested with version `>= 1.47.0`) with:
```sh
curl https://sh.rustup.rs -sSf | sh
```

### Known compatibility
#### Ruby
`3.1`, `3.0`, `2.7`, `2.6`, `2.5`, `2.4`, `2.3`, `jruby`, `truffleruby`

#### Rust
`1.60.0`, `1.59.0`, `1.58.1`, `1.57.0`, `1.56.1`, `1.55.0`, `1.54.0`, `1.53.0`, `1.52.1`, `1.51.0`, `1.50.0`, `1.49.0`, `1.48.0`, `1.47.0`

#### Platforms
`Linux`, `MacOS`, `Windows`

### Installation
#### With [`bundler`](https://bundler.io/)
Add this line to your application's Gemfile:
```ruby
gem 'str_metrics'
```

And then execute:

    $ bundle install
#### Without `bundler`
    $ gem install str_metrics
## Usage
To use the metrics provided by this gem, make sure `str_metrics` is required:
```ruby
require 'str_metrics'
```

Each metric is shown below with an example and the meaning of its optional parameters.
### Sørensen–Dice
```ruby
StrMetrics::SorensenDice.coefficient('abc', 'bcd', ignore_case: false)
=> 0.5
```
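To make the `0.5` above concrete, here is a minimal pure-Ruby sketch of a Sørensen–Dice coefficient over bigrams of grapheme clusters. It is not the gem's Rust implementation: `dice_sketch` is a hypothetical helper, and the use of bigrams, multiset counting, and returning `0.0` when there are no bigrams are assumptions of this sketch.

```ruby
# Hypothetical helper, not part of str_metrics.
def dice_sketch(a, b)
  # Adjacent pairs of grapheme clusters ("abc" -> ["ab", "bc"]).
  bigrams = ->(s) { s.grapheme_clusters.each_cons(2).map(&:join) }
  x = bigrams.call(a)
  y = bigrams.call(b)
  total = x.length + y.length
  return 0.0 if total.zero? # assumption: no bigrams at all means no similarity

  # Count shared bigrams as a multiset intersection.
  remaining = Hash.new(0)
  y.each { |bg| remaining[bg] += 1 }
  shared = x.count do |bg|
    next false unless remaining[bg] > 0
    remaining[bg] -= 1
    true
  end

  2.0 * shared / total
end

puts dice_sketch('abc', 'bcd') # 0.5
```

Here `'abc'` yields the bigrams `ab`, `bc` and `'bcd'` yields `bc`, `cd`; one bigram is shared, so the coefficient is `2 * 1 / (2 + 2)`.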
Options:

Keyword | Type | Default | Description
--- | --- | --- | ---
`ignore_case` | boolean | `false` | Case insensitive comparison?

### Levenshtein
```ruby
StrMetrics::Levenshtein.distance('abc', 'acb', ignore_case: false)
=> 2
```
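For reference, the distance of `2` above can be reproduced with the classic two-row dynamic-programming formulation of Levenshtein distance. This is a pure-Ruby sketch over grapheme clusters, not the gem's implementation; `levenshtein_sketch` is a hypothetical name.

```ruby
# Hypothetical helper, not part of str_metrics.
def levenshtein_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters

  # prev[j] is the distance between the first i clusters of s and the first j of t.
  prev = (0..t.length).to_a
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [
        prev[j + 1] + 1, # deletion
        curr[j] + 1,     # insertion
        prev[j] + cost   # substitution (or match)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_sketch('abc', 'acb') # 2
```

Swapping `b` and `c` costs two edits here because plain Levenshtein only allows insertions, deletions, and substitutions.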
Options:

Keyword | Type | Default | Description
--- | --- | --- | ---
`ignore_case` | boolean | `false` | Case insensitive comparison?

### Damerau–Levenshtein
```ruby
StrMetrics::DamerauLevenshtein.distance('abc', 'acb', ignore_case: false)
=> 1
```
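The distance drops to `1` above because an adjacent transposition counts as a single edit. The pure-Ruby sketch below shows this with the restricted "optimal string alignment" variant. Whether the gem uses this variant or the unrestricted Damerau–Levenshtein algorithm is not stated here, so treat the choice as an assumption; `osa_distance_sketch` is a hypothetical name.

```ruby
# Hypothetical helper, not part of str_metrics.
# Optimal string alignment: Levenshtein plus adjacent transpositions.
def osa_distance_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters

  # d[i][j] is the distance between the first i clusters of s and the first j of t.
  d = Array.new(s.length + 1) { Array.new(t.length + 1, 0) }
  (0..s.length).each { |i| d[i][0] = i }
  (0..t.length).each { |j| d[0][j] = j }

  (1..s.length).each do |i|
    (1..t.length).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      best = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
      # Adjacent transposition: the last two clusters are swapped.
      if i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        best = [best, d[i - 2][j - 2] + 1].min
      end
      d[i][j] = best
    end
  end
  d[s.length][t.length]
end

puts osa_distance_sketch('abc', 'acb') # 1
```

The two variants agree on this example; they can differ on inputs such as `'ca'` and `'abc'`, where the restricted variant returns 3.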
Options:

Keyword | Type | Default | Description
--- | --- | --- | ---
`ignore_case` | boolean | `false` | Case insensitive comparison?

### Jaro
```ruby
StrMetrics::Jaro.similarity('abc', 'aac', ignore_case: false)
=> 0.7777777777777777
```
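The value above is 7/9. The following pure-Ruby sketch of the standard Jaro similarity shows where it comes from. It is not the gem's implementation; `jaro_sketch` is a hypothetical name, and the handling of empty strings is an assumption of this sketch.

```ruby
# Hypothetical helper, not part of str_metrics.
def jaro_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters
  return 1.0 if s.empty? && t.empty? # assumption: two empty strings are identical
  return 0.0 if s.empty? || t.empty?

  # Clusters match only if equal and no further apart than this window.
  window = [[s.length, t.length].max / 2 - 1, 0].max
  s_matched = Array.new(s.length, false)
  t_matched = Array.new(t.length, false)
  matches = 0

  s.each_with_index do |sc, i|
    lo = [i - window, 0].max
    hi = [i + window, t.length - 1].min
    (lo..hi).each do |j|
      next if t_matched[j] || sc != t[j]
      s_matched[i] = t_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Transpositions: matched clusters that appear in a different order, halved.
  s_seq = s.each_index.select { |i| s_matched[i] }.map { |i| s[i] }
  t_seq = t.each_index.select { |j| t_matched[j] }.map { |j| t[j] }
  transpositions = s_seq.zip(t_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / s.length + m / t.length + (m - transpositions) / m) / 3.0
end

puts jaro_sketch('abc', 'aac') # roughly 0.778
```

For `'abc'` and `'aac'` the window is 0, so only `a` and `c` match in place: two matches, no transpositions, giving `(2/3 + 2/3 + 1) / 3`.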
Options:

Keyword | Type | Default | Description
--- | --- | --- | ---
`ignore_case` | boolean | `false` | Case insensitive comparison?

### Jaro–Winkler
```ruby
StrMetrics::JaroWinkler.similarity('abc', 'aac', ignore_case: false, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
=> 0.7999999999999999

StrMetrics::JaroWinkler.distance('abc', 'aac', ignore_case: false, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
=> 0.20000000000000007
```
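To show how the two prefix options interact, here is a pure-Ruby sketch of the usual Winkler adjustment applied to an already computed Jaro similarity. `winkler_sketch` is a hypothetical helper, not the gem's API. The keyword names mirror the gem's options, while the cap of 4 on the common prefix length is the conventional value and an assumption here.

```ruby
# Hypothetical helper, not part of str_metrics.
# `jaro` is a Jaro similarity computed elsewhere for the same two strings.
def winkler_sketch(jaro, a, b, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
  # No bonus unless the Jaro similarity is above the threshold.
  return jaro unless jaro > prefix_scaling_bonus_threshold

  # Length of the common prefix in grapheme clusters, capped at 4 (assumption).
  prefix = a.grapheme_clusters
            .zip(b.grapheme_clusters)
            .first(4)
            .take_while { |x, y| x == y }
            .length

  jaro + prefix * prefix_scaling_factor * (1.0 - jaro)
end

similarity = winkler_sketch(7.0 / 9, 'abc', 'aac')
distance = 1.0 - similarity
puts similarity # roughly 0.8
puts distance   # roughly 0.2
```

With a Jaro similarity of 7/9 and a shared prefix of one cluster (`a`), the bonus is `1 * 0.1 * (1 - 7/9)`, which lifts the result to about 0.8. Under the 4-cluster cap, a scaling factor above 0.25 could push the similarity past 1.0.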
Options:

Keyword | Type | Default | Description
--- | --- | --- | ---
`ignore_case` | boolean | `false` | Case insensitive comparison?
`prefix_scaling_factor` | decimal | `0.1` | Constant scaling factor for how much to weight common prefixes. Should not exceed 0.25.
`prefix_scaling_bonus_threshold` | decimal | `0.7` | Prefix bonus weighting will only be applied if the Jaro similarity is greater than the given value.

## Motivation
The main motivation was to have a central gem which can provide a variety of string metric calculations. Secondary motivation was to experiment with writing a native extension in Rust (instead of C).
## Development
### Getting started
```bash
gem install bundler
git clone https://github.com/anirbanmu/str_metrics.git
cd ./str_metrics
bundle install
```

### Building (for native component)
```bash
rake rust_build
```

### Testing (will build native component before running tests)
```bash
rake spec
```

### Local installation
```bash
rake install
```

### Deploying a new version
To deploy a new version of the gem to rubygems:

1. Bump version in [version.rb](lib/str_metrics/version.rb) according to [SemVer](https://semver.org/).
2. Get your code merged to `main` branch
3. After a `git pull` on `main` branch:

```bash
rake build && rake release
```

## Authors
- [Anirban Mukhopadhyay](https://github.com/anirbanmu)

See all repo contributors [here](https://github.com/anirbanmu/str_metrics/contributors).
## Versioning
[SemVer](https://semver.org/) is employed. See [tags](https://github.com/anirbanmu/str_metrics/tags) for released versions.
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/anirbanmu/str_metrics.
## Code of Conduct
Everyone interacting in this project's codebase, issue trackers, etc. is expected to follow the [code of conduct](CODE_OF_CONDUCT.md).
## License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.