Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/juanitofatas/active_normalizer
Easily switch the normalizer you want to use with Active Normalizer
- Host: GitHub
- URL: https://github.com/juanitofatas/active_normalizer
- Owner: JuanitoFatas
- License: mit
- Created: 2018-06-15T07:22:56.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-06-17T05:52:10.000Z (over 6 years ago)
- Last Synced: 2024-09-17T04:19:05.532Z (about 2 months ago)
- Topics: normalize, normalizer, tool, unicode, utility
- Language: Ruby
- Homepage: https://github.com/JuanitoFatas/active_normalizer
- Size: 11.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
README
# Active Normalizer
Normalize tricky Japanese characters: fullwidth and halfwidth forms, hiragana, katakana, and symbols. See the [tests](/spec) for examples.
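The Ruby normalizer depends only on the standard library, so it presumably builds on Ruby's `String#unicode_normalize` (an assumption, not confirmed by this README). The sketch below calls that method directly, without Active Normalizer, to show what NFKC does to a few characters of the kinds listed above. The sample strings are illustrative choices, not taken from the test suite.

```ruby
# Illustrative only: calls Ruby's built-in String#unicode_normalize (Ruby 2.2+)
# directly rather than going through Active Normalizer.
samples = {
  "\uFF76\uFF9E" => "halfwidth katakana KA + halfwidth voiced sound mark",
  "ＡＢＣ１２３" => "fullwidth Latin letters and digits",
  "\u337B" => "square era name Heisei",
  "\u3314" => "square katakana kiro"
}

samples.each do |input, description|
  # NFKC applies compatibility decomposition, then canonical composition.
  puts "#{description}: #{input} -> #{input.unicode_normalize(:nfkc)}"
end
```

For the four samples, NFKC yields ガ (a single precomposed code point), ABC123, 平成, and キロ.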
## Usage
Each normalizer class accepts one of the options `:nfc`, `:nfd`, `:nfkd`, or `:nfkc` (see [Normalization Forms][unicode-nf] for details).
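To see how the four forms differ, the following standard-library sketch (not Active Normalizer code) assumes the options correspond to the form names accepted by Ruby's `String#unicode_normalize`, and lists the code points each form produces. `codepoints_by_form` is a hypothetical helper introduced for illustration.

```ruby
# Hypothetical helper: maps each normalization form to the code points it yields.
def codepoints_by_form(input)
  %i[nfc nfd nfkc nfkd].each_with_object({}) do |form, result|
    result[form] = input.unicode_normalize(form).codepoints.map { |cp| format("U+%04X", cp) }
  end
end

halfwidth_ga = "\uFF76\uFF9E" # "ｶﾞ": halfwidth KA + halfwidth voiced sound mark
codepoints_by_form(halfwidth_ga).each do |form, codepoints|
  puts "#{form.to_s.upcase}: #{codepoints.join(' ')}"
end
```

The canonical forms (NFC, NFD) leave the halfwidth pair untouched as U+FF76 U+FF9E, because those characters have only compatibility decompositions. NFKD maps them to fullwidth カ plus a combining voiced sound mark (U+30AB U+3099), and NFKC recomposes that pair into the single code point ガ (U+30AC).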
Each normalizer instance responds to `run`.

```ruby
require "active_normalizer/normalizers/ruby"

nfkc_normalizer = ActiveNormalizer.new(
  ActiveNormalizer::Normalizers::Ruby,
  options: :nfkc
)

input = "ｶﾞ" # the String you want to normalize
nfkc_normalizer.run(input)
```

## Benchmark
```
Benchmarking simple string: 800ー12345
Warming up --------------------------------------
UNF 92.981k i/100ms
Unicode 36.002k i/100ms
Ruby 17.044k i/100ms
UnicodeUtils 12.681k i/100ms
ActiveSupport 7.482k i/100ms
Calculating -------------------------------------
UNF 1.173M (±17.6%) i/s - 5.672M in 5.041037s
Unicode 404.502k (± 6.8%) i/s - 2.016M in 5.008748s
Ruby 191.562k (±30.3%) i/s - 835.156k in 5.106057s
UnicodeUtils 132.477k (± 5.3%) i/s - 672.093k in 5.088759s
ActiveSupport 75.011k (±34.9%) i/s - 329.208k in 5.058559s

Comparison:
UNF: 1172663.8 i/s
Unicode: 404502.1 i/s - 2.90x slower
Ruby: 191562.4 i/s - 6.12x slower
UnicodeUtils: 132477.3 i/s - 8.85x slower
ActiveSupport: 75010.6 i/s - 15.63x slower

Warming up --------------------------------------
UNF 67.181k i/100ms
Unicode 31.572k i/100ms
Ruby 14.947k i/100ms
UnicodeUtils 12.443k i/100ms
ActiveSupport 5.561k i/100ms
Calculating -------------------------------------
UNF 997.098k (±25.2%) i/s - 27.477M in 30.052018s
Unicode 328.071k (±19.5%) i/s - 9.503M in 30.090451s
Ruby 177.045k (±32.8%) i/s - 4.529M in 30.071040s
UnicodeUtils 134.513k (± 6.7%) i/s - 4.019M in 30.059621s
ActiveSupport 68.063k (±44.7%) i/s - 1.668M in 30.131968s

Comparison:
UNF: 997097.6 i/s
Unicode: 328070.8 i/s - 3.04x slower
Ruby: 177044.6 i/s - 5.63x slower
UnicodeUtils: 134512.7 i/s - 7.41x slower
ActiveSupport: 68063.1 i/s - 14.65x slower

Benchmarking longer string: ㍻㍼㍽㍾㌀㌁㌂㌃㌄㌅㌆㌇㌈㌉㌊㌋㌌㌍㌎㌏㌐㌑㌒㌓㌔㌕㌖㌗㌘㌙㌚㌛㌜㌝㌞㌟㌠㌡㌢㌣㌤㌥㌦㌧㌨㌩㌪㌫㌬㌭㌮㌯㌰㌱㌲㌳㌴㌵㌶㌷㌸㌹㌺㌻㌼㌽㌾㌿㍀㍁㍂㍃㍄㍅㍆㍇㍈㍉㍊㍋㍌㍍㍎㍏㍐㍑㍒㍓㍔㍕㍖㍗
Warming up --------------------------------------
UNF 6.023k i/100ms
Unicode 1.238k i/100ms
Ruby 1.068k i/100ms
UnicodeUtils 319.000 i/100ms
ActiveSupport 258.000 i/100ms
Calculating -------------------------------------
UNF 59.891k (± 6.8%) i/s - 301.150k in 5.055411s
Unicode 11.740k (± 9.0%) i/s - 59.424k in 5.103353s
Ruby 10.655k (±10.9%) i/s - 53.400k in 5.091860s
UnicodeUtils 3.087k (± 8.9%) i/s - 15.312k in 5.004688s
ActiveSupport 2.533k (±11.1%) i/s - 12.642k in 5.064477s

Comparison:
UNF: 59890.8 i/s
Unicode: 11740.2 i/s - 5.10x slower
Ruby: 10655.0 i/s - 5.62x slower
UnicodeUtils: 3087.4 i/s - 19.40x slower
ActiveSupport: 2532.6 i/s - 23.65x slower

Warming up --------------------------------------
UNF 5.739k i/100ms
Unicode 1.122k i/100ms
Ruby 1.113k i/100ms
UnicodeUtils 312.000 i/100ms
ActiveSupport 254.000 i/100ms
Calculating -------------------------------------
UNF 59.371k (± 4.4%) i/s - 1.779M in 30.026571s
Unicode 10.780k (±17.3%) i/s - 310.794k in 30.106556s
Ruby 11.144k (± 6.7%) i/s - 332.787k in 30.034689s
UnicodeUtils 3.164k (± 4.9%) i/s - 94.848k in 30.056928s
ActiveSupport 2.635k (± 8.8%) i/s - 78.486k in 30.075836s

Comparison:
UNF: 59371.2 i/s
Ruby: 11143.9 i/s - 5.33x slower
Unicode: 10779.6 i/s - 5.51x slower
UnicodeUtils: 3163.5 i/s - 18.77x slower
ActiveSupport: 2635.3 i/s - 22.53x slower
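```

Each input string was benchmarked twice, with measurement runs of roughly 5 and 30 seconds; UNF was the fastest normalizer in every run. The benchmark code can be found at [bin/benchmark](bin/benchmark).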
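The "x slower" figures in each Comparison block are the fastest entry's iterations per second divided by that entry's own rate. The hypothetical sketch below is not the contents of bin/benchmark (whose output format resembles the benchmark-ips gem); it reproduces that arithmetic for the first simple-string run and adds a crude standard-library rate counter for trying other inputs. `iterations_per_second` and `comparison` are illustrative names.

```ruby
# Hypothetical sketch, not bin/benchmark: reproduces the "x slower" column
# and offers a crude iterations-per-second counter using only the stdlib.

# Runs the block repeatedly for roughly `duration` seconds and returns i/s.
def iterations_per_second(duration: 0.2)
  clock = Process::CLOCK_MONOTONIC
  start = Process.clock_gettime(clock)
  iterations = 0
  while Process.clock_gettime(clock) - start < duration
    yield
    iterations += 1
  end
  iterations / (Process.clock_gettime(clock) - start)
end

# Returns [name, slowdown] pairs, fastest first; slowdown = fastest i/s / own i/s.
def comparison(rates)
  fastest = rates.values.max
  rates.sort_by { |_name, ips| -ips }
       .map { |name, ips| [name, (fastest / ips).round(2)] }
end

# i/s figures from the first "simple string" Comparison block above.
reported = {
  "UNF" => 1_172_663.8,
  "Unicode" => 404_502.1,
  "Ruby" => 191_562.4,
  "UnicodeUtils" => 132_477.3,
  "ActiveSupport" => 75_010.6
}

comparison(reported).each { |name, factor| puts "#{name}: #{factor}x" }

# Measuring the stdlib normalizer yourself (results vary by machine):
ips = iterations_per_second { "800ー12345".unicode_normalize(:nfkc) }
puts format("stdlib NFKC: %.1f i/s", ips)
```

The computed factors for the slower entries (2.9, 6.12, 8.85, and 15.63) match the reported Comparison block.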
## Installation
Add this line to your application's Gemfile:
```ruby
gem "active_normalizer"
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install active_normalizer
## Dependencies
Active Normalizer provides several normalizers. Except for `ActiveNormalizer::Normalizers::Ruby`, which uses only the standard library, their gem dependencies are not bundled with Active Normalizer: add the gem for the normalizer you want to use to your Gemfile.
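Because only the standard-library normalizer works out of the box, an application that prefers a faster backend when one is installed could choose it at load time. The sketch below is a hypothetical pattern, not an Active Normalizer API: `build_nfkc_normalizer` is an illustrative name. It tries the unf gem (through `UNF::Normalizer.normalize`) and rescues `LoadError` to fall back to `String#unicode_normalize`.

```ruby
# Hypothetical helper (not part of Active Normalizer): prefer the unf gem when
# it is installed, otherwise fall back to the standard library.
def build_nfkc_normalizer
  require "unf"
  # UNF::Normalizer.normalize(string, form) is the unf gem's normalization entry point.
  ->(text) { UNF::Normalizer.normalize(text, :nfkc) }
rescue LoadError
  # unf is not installed: String#unicode_normalize ships with Ruby 2.2+.
  ->(text) { text.unicode_normalize(:nfkc) }
end

normalize = build_nfkc_normalizer
puts normalize.call("\uFF76\uFF9E") # halfwidth ｶﾞ
```

Either branch turns halfwidth ｶﾞ into ガ; the benchmark results suggest the unf path is the faster of the two.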
#### ActiveNormalizer::Normalizers::Ruby
```ruby
# no dependency required, standard library
require "active_normalizer/normalizers/ruby"
```

#### ActiveNormalizer::Normalizers::UNF - unf
```ruby
# Gemfile
gem "unf"

# application code
require "active_normalizer/normalizers/unf"
```

#### ActiveNormalizer::Normalizers::Unicode - unicode
```ruby
# Gemfile
gem "unicode"

# application code
require "active_normalizer/normalizers/unicode"
```

#### ActiveNormalizer::Normalizers::UnicodeUtils - unicode_utils
```ruby
# Gemfile
gem "unicode_utils"

# application code
require "active_normalizer/normalizers/unicode_utils"
```

#### ActiveNormalizer::Normalizers::ActiveSupportMultibyte - active_support
```ruby
# Gemfile (the Active Support gem is published as "activesupport")
gem "activesupport"

# application code
require "active_normalizer/normalizers/active_support"
```

## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/hack` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/JuanitoFatas/active_normalizer.
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
[unicode-nf]: http://unicode.org/reports/tr15/#Norm_Forms