https://github.com/kojix2/blingfire-crystal
crystal tokenizers
- Host: GitHub
- URL: https://github.com/kojix2/blingfire-crystal
- Owner: kojix2
- License: mit
- Created: 2023-04-02T01:48:46.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2026-01-27T03:28:38.000Z (5 months ago)
- Last Synced: 2026-01-27T15:53:28.608Z (5 months ago)
- Topics: crystal, tokenizers
- Language: Crystal
- Homepage: https://kojix2.github.io/blingfire-crystal/
- Size: 60.5 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# BlingFire for Crystal
[![build](https://github.com/kojix2/blingfire-crystal/actions/workflows/build.yml/badge.svg)](https://github.com/kojix2/blingfire-crystal/actions/workflows/build.yml)
This is a Crystal port of [blingfire-ruby](https://github.com/ankane/blingfire-ruby). It aims to bring the power of [BlingFire](https://github.com/microsoft/BlingFire) tokenizers to Crystalists. The library lets you run GPT-2 tokenization compatible with [ChatGPT](https://chat.openai.com/).
## Installation
```sh
git clone https://github.com/kojix2/blingfire-crystal
crystal run downloader.cr
crystal spec
```
`downloader.cr` downloads precompiled libraries from [ankane/ml-builds](https://github.com/ankane/ml-builds), along with [some models](https://github.com/microsoft/BlingFire/tree/master/dist-pypi/blingfire) from the official BlingFire repository.
## Example
See `gpt2.cr` in the `example` directory.
```crystal
require "../src/blingfire"
# Load the tokenization model (text -> token IDs)
model = BlingFire::Model.new("gpt2.bin")
# Text to tokenize
text = "Intelligence is an accident of evolution, and not necessarily an advantage."
# Tokenize the text
tokens = model.text_to_ids(text)
# Print the token IDs
puts tokens
# Load the inverse model (token IDs -> text)
model = BlingFire::Model.new("gpt2.i2w")
# Convert the token IDs back to text
text = model.ids_to_text(tokens)
# Print the decoded text
puts text
```
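The example above loads two separate model files, one for encoding (`gpt2.bin`) and one for decoding (`gpt2.i2w`). The following is a minimal sketch of how the pair could be wrapped in one object for round-trip checks. `Codec` and `ByteModel` are hypothetical names that are not part of this library. `ByteModel` is only a stand-in so the sketch runs without the native library, and its `Array(Int32)` ID type is an assumption about what `text_to_ids` returns.

```crystal
# Hypothetical wrapper (not part of blingfire-crystal) pairing an encoding
# model with a decoding model. E and D can be any types that respond to
# text_to_ids / ids_to_text, for example two BlingFire::Model instances.
struct Codec(E, D)
  def initialize(@encoder : E, @decoder : D)
  end

  # Text -> token IDs, delegated to the encoding model.
  def encode(text : String)
    @encoder.text_to_ids(text)
  end

  # Token IDs -> text, delegated to the decoding model.
  def decode(ids)
    @decoder.ids_to_text(ids)
  end

  # True when decoding the encoded text reproduces the input exactly.
  def round_trip?(text : String) : Bool
    decode(encode(text)) == text
  end
end

# Hypothetical stand-in model: maps each byte to an ID and back.
# It only mimics the two method names used in the example above.
class ByteModel
  def text_to_ids(text : String) : Array(Int32)
    text.bytes.map(&.to_i32)
  end

  def ids_to_text(ids : Array(Int32)) : String
    String.new(Bytes.new(ids.size) { |i| ids[i].to_u8 })
  end
end

stub = ByteModel.new
codec = Codec.new(stub, stub)
puts codec.round_trip?("Intelligence is an accident of evolution.")
```

With the real library, the two arguments would presumably be `BlingFire::Model.new("gpt2.bin")` and `BlingFire::Model.new("gpt2.i2w")`, as in the example above. Whether a round trip reproduces the input exactly depends on the model's normalization.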
## Documentation
- [blingfire-crystal](https://kojix2.github.io/blingfire-crystal/)
## Development
This port was written quickly, based on ankane/blingfire-ruby. It passes basic tests, but undiscovered bugs may remain. Please use it with care and report any issues you find. Pull requests and forks are much appreciated.
## License
This project is licensed under the MIT License. Please see the [LICENSE](LICENSE) file for more information.