{"id":13878402,"url":"https://github.com/ankane/blingfire-ruby","last_synced_at":"2025-11-17T14:08:26.357Z","repository":{"id":59151148,"uuid":"242851424","full_name":"ankane/blingfire-ruby","owner":"ankane","description":"High speed text tokenization for Ruby","archived":false,"fork":false,"pushed_at":"2025-05-05T03:35:04.000Z","size":64,"stargazers_count":70,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-11-10T18:33:51.816Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-24T21:50:25.000Z","updated_at":"2025-09-04T08:35:40.000Z","dependencies_parsed_at":"2023-01-31T01:00:48.454Z","dependency_job_id":"c58ad629-f6d9-46a4-9bbb-42510b8ceb6a","html_url":"https://github.com/ankane/blingfire-ruby","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/blingfire-ruby","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fblingfire-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fblingfire-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fblingfire-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fblingfire-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/blingfire-ruby/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fblingfire-ruby/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284654282,"owners_count":27041748,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-16T02:00:05.974Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-06T08:01:48.578Z","updated_at":"2025-11-17T14:08:26.341Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"# Bling Fire Ruby\n\n[Bling Fire](https://github.com/microsoft/BlingFire) - high speed text tokenization - for Ruby\n\n[![Build Status](https://github.com/ankane/blingfire-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/blingfire-ruby/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"blingfire\"\n```\n\n## Getting Started\n\nCreate a model\n\n```ruby\nmodel = BlingFire::Model.new\n```\n\nTokenize words\n\n```ruby\nmodel.text_to_words(text)\n```\n\nTokenize sentences\n\n```ruby\nmodel.text_to_sentences(text)\n```\n\nGet offsets for words\n\n```ruby\nwords, start_offsets, end_offsets = model.text_to_words_with_offsets(text)\n```\n\nGet offsets for sentences\n\n```ruby\nsentences, start_offsets, end_offsets = model.text_to_sentences_with_offsets(text)\n```\n\n## Pre-trained Models\n\nBling Fire comes with a default model that follows the tokenization logic of NLTK with a few changes. You can also download other models:\n\n- [BERT Base](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_base_tok.bin), [BERT Base Cased](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_base_cased_tok.bin), [BERT Chinese](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_chinese.bin), [BERT Multilingual Cased](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_multi_cased.bin)\n- [GPT-2](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/gpt2.bin)\n- [Laser 100k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/laser100k.bin), [Laser 250k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/laser250k.bin), [Laser 500k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/laser500k.bin)\n- [RoBERTa](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/roberta.bin)\n- [Syllab](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/syllab.bin)\n- [URI 100k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/uri100k.bin), [URI 250k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/uri250k.bin), [URI 500k](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/uri500k.bin)\n- [XLM-RoBERTa](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/xlm_roberta_base.bin)\n- [XLNet](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/xlnet.bin), [XLNet No Norm](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/xlnet_nonorm.bin)\n- [WBD](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/wbd_chuni.bin)\n\nLoad a model\n\n```ruby\nmodel = BlingFire.load_model(\"bert_base_tok.bin\")\n```\n\nConvert text to ids\n\n```ruby\nmodel.text_to_ids(text)\n```\n\nGet offsets for ids\n\n```ruby\nids, start_offsets, end_offsets = model.text_to_ids_with_offsets(text)\n```\n\nDisable prefix space\n\n```ruby\nmodel = BlingFire.load_model(\"roberta.bin\", prefix: false)\n```\n\n## Ids to Text\n\nLoad a model\n\n```ruby\nmodel = BlingFire.load_model(\"bert_base_tok.i2w\")\n```\n\nConvert ids to text\n\n```ruby\nmodel.ids_to_text(ids)\n```\n\n## History\n\nView the [changelog](https://github.com/ankane/blingfire-ruby/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/blingfire-ruby/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/blingfire-ruby/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/blingfire-ruby.git\ncd blingfire-ruby\nbundle install\nbundle exec rake vendor:all download:models\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fblingfire-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Fblingfire-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fblingfire-ruby/lists"}