https://github.com/yoshoku/sentencepiece.rb
sentencepiece.rb provides Ruby bindings for SentencePiece
- Host: GitHub
- URL: https://github.com/yoshoku/sentencepiece.rb
- Owner: yoshoku
- License: apache-2.0
- Created: 2023-03-20T23:05:51.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-01T12:16:13.000Z (9 months ago)
- Last Synced: 2025-04-19T07:50:51.869Z (6 months ago)
- Language: C++
- Homepage: https://rubygems.org/gems/sentencepiece
- Size: 293 KB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
README
# sentencepiece.rb
sentencepiece.rb provides Ruby bindings for [SentencePiece](https://github.com/google/sentencepiece),
an unsupervised text tokenizer and detokenizer for neural network-based text generation.

## Installation
Install SentencePiece using your OS package manager:
macOS:
$ brew install sentencepiece
Ubuntu:
$ sudo apt-get install sentencepiece libsentencepiece-dev
Install the gem and add it to the application's Gemfile by executing:
$ bundle add sentencepiece
If bundler is not being used to manage dependencies, install the gem by executing:
$ gem install sentencepiece
If you use Homebrew on an Apple M1/M2 Mac, specify the Homebrew installation directory:
$ gem install sentencepiece -- --with-opt-dir=/opt/homebrew
## Usage
```ruby
require 'sentencepiece'

sp = SentencePiece::SentencePieceProcessor.new(model_file: '/path/to/model_file.model')

sp.encode('This is a test')
# => [17522, 2852, 29, 2002]

sp.encode(['This is a test', 'Hello world'])
# => [[17522, 2852, 29, 2002], [9770, 18905]]

sp.encode('This is a test', out_type: 'str')
# => ["▁This", "▁is", "▁a", "▁test"]

sp.decode([17522, 2852, 29, 2002])
# => "This is a test"

sp.decode([[17522, 2852, 29, 2002], [9770, 18905]])
# => ["This is a test", "Hello world"]
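
# The "▁" (U+2581) prefix on the string pieces above is SentencePiece's whitespace marker.
# A minimal sketch, assuming that convention: pieces can be joined back into text without
# loading a model. `join_pieces` is a hypothetical helper for illustration only and is not
# part of this gem's API; prefer sp.decode on IDs when a model is available.
def join_pieces(pieces)
  # Concatenate the pieces, turn each marker into a space, and drop the leading space.
  pieces.join.tr("\u2581", ' ').lstrip
end

join_pieces(["▁This", "▁is", "▁a", "▁test"])
# => "This is a test"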
```

## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/sentencepiece.rb/.
This project is intended to be a safe, welcoming space for collaboration,
and contributors are expected to adhere to the [code of conduct](https://github.com/yoshoku/sentencepiece.rb/blob/main/CODE_OF_CONDUCT.md).

## Code of Conduct
Everyone interacting in the SentencePiece project's codebases, issue trackers,
chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/yoshoku/sentencepiece.rb/blob/main/CODE_OF_CONDUCT.md).