{"id":15631099,"url":"https://github.com/ankane/fasttext-ruby","last_synced_at":"2025-11-17T14:15:28.930Z","repository":{"id":46077673,"uuid":"217708469","full_name":"ankane/fastText-ruby","owner":"ankane","description":"Efficient text classification and representation learning for Ruby","archived":false,"fork":false,"pushed_at":"2024-12-31T02:01:06.000Z","size":75,"stargazers_count":216,"open_issues_count":0,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-05-08T08:06:07.855Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-26T12:53:12.000Z","updated_at":"2025-04-27T15:07:42.000Z","dependencies_parsed_at":"2024-01-13T20:57:12.299Z","dependency_job_id":"29e2d26a-fcd2-4c75-9354-186b9bd1ace6","html_url":"https://github.com/ankane/fastText-ruby","commit_stats":{"total_commits":100,"total_committers":2,"mean_commits":50.0,"dds":"0.010000000000000009","last_synced_commit":"5adaf416ae1f6a2f603c90abdc8346afc46aabb6"},"previous_names":["ankane/fasttext"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2FfastText-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2FfastText-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2FfastText-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2FfastText-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/fastText-ruby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374484,"owners_count":22060612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T10:39:09.580Z","updated_at":"2025-10-19T16:33:14.289Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fastText Ruby\n\n[fastText](https://fasttext.cc) - efficient text classification and representation learning - for Ruby\n\n[![Build Status](https://github.com/ankane/fastText-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/fastText-ruby/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"fasttext\"\n```\n\n## Getting Started\n\nfastText has two primary use cases:\n\n- [text classification](#text-classification)\n- [word representations](#word-representations)\n\n## Text Classification\n\nPrep your data\n\n```ruby\n# documents\nx = [\n  \"text from document one\",\n  \"text from document two\",\n  \"text from document three\"\n]\n\n# labels\ny = [\"ham\", \"ham\", \"spam\"]\n```\n\n\u003e Use an array if a document has multiple labels\n\nTrain a model\n\n```ruby\nmodel = FastText::Classifier.new\nmodel.fit(x, y)\n```\n\nGet predictions\n\n```ruby\nmodel.predict(x)\n```\n\nSave the model to a file\n\n```ruby\nmodel.save_model(\"model.bin\")\n```\n\nLoad the model from a file\n\n```ruby\nmodel = FastText.load_model(\"model.bin\")\n```\n\nEvaluate the model\n\n```ruby\nmodel.test(x_test, y_test)\n```\n\nGet words and labels\n\n```ruby\nmodel.words\nmodel.labels\n```\n\n\u003e Use `include_freq: true` to get their frequency\n\nSearch for the best hyperparameters\n\n```ruby\nmodel.fit(x, y, autotune_set: [x_valid, y_valid])\n```\n\nCompress the model - significantly reduces size but sacrifices a little performance\n\n```ruby\nmodel.quantize\nmodel.save_model(\"model.ftz\")\n```\n\n## Word Representations\n\nPrep your data\n\n```ruby\nx = [\n  \"text from document one\",\n  \"text from document two\",\n  \"text from document three\"\n]\n```\n\nTrain a model\n\n```ruby\nmodel = FastText::Vectorizer.new\nmodel.fit(x)\n```\n\nGet nearest neighbors\n\n```ruby\nmodel.nearest_neighbors(\"asparagus\")\n```\n\nGet analogies\n\n```ruby\nmodel.analogies(\"berlin\", \"germany\", \"france\")\n```\n\nGet a word vector\n\n```ruby\nmodel.word_vector(\"carrot\")\n```\n\nGet a sentence vector\n\n```ruby\nmodel.sentence_vector(\"sentence text\")\n```\n\nGet words\n\n```ruby\nmodel.words\n```\n\nSave the model to a file\n\n```ruby\nmodel.save_model(\"model.bin\")\n```\n\nLoad the model from a file\n\n```ruby\nmodel = FastText.load_model(\"model.bin\")\n```\n\nUse continuous bag-of-words\n\n```ruby\nmodel = FastText::Vectorizer.new(model: \"cbow\")\n```\n\n## Parameters\n\nText classification\n\n```ruby\nFastText::Classifier.new(\n  lr: 0.1,                    # learning rate\n  dim: 100,                   # size of word vectors\n  ws: 5,                      # size of the context window\n  epoch: 5,                   # number of epochs\n  min_count: 1,               # minimal number of word occurrences\n  min_count_label: 1,         # minimal number of label occurrences\n  minn: 0,                    # min length of char ngram\n  maxn: 0,                    # max length of char ngram\n  neg: 5,                     # number of negatives sampled\n  word_ngrams: 1,             # max length of word ngram\n  loss: \"softmax\",            # loss function {ns, hs, softmax, ova}\n  bucket: 2000000,            # number of buckets\n  thread: 3,                  # number of threads\n  lr_update_rate: 100,        # change the rate of updates for the learning rate\n  t: 0.0001,                  # sampling threshold\n  label_prefix: \"__label__\",  # label prefix\n  verbose: 2,                 # verbose\n  pretrained_vectors: nil,    # pretrained word vectors (.vec file)\n  autotune_metric: \"f1\",      # autotune optimization metric\n  autotune_predictions: 1,    # autotune predictions\n  autotune_duration: 300,     # autotune search time in seconds\n  autotune_model_size: nil    # autotune model size, like 2M\n)\n```\n\nWord representations\n\n```ruby\nFastText::Vectorizer.new(\n  model: \"skipgram\",          # unsupervised fasttext model {cbow, skipgram}\n  lr: 0.05,                   # learning rate\n  dim: 100,                   # size of word vectors\n  ws: 5,                      # size of the context window\n  epoch: 5,                   # number of epochs\n  min_count: 5,               # minimal number of word occurrences\n  minn: 3,                    # min length of char ngram\n  maxn: 6,                    # max length of char ngram\n  neg: 5,                     # number of negatives sampled\n  word_ngrams: 1,             # max length of word ngram\n  loss: \"ns\",                 # loss function {ns, hs, softmax, ova}\n  bucket: 2000000,            # number of buckets\n  thread: 3,                  # number of threads\n  lr_update_rate: 100,        # change the rate of updates for the learning rate\n  t: 0.0001,                  # sampling threshold\n  verbose: 2                  # verbose\n)\n```\n\n## Input Files\n\nInput can be read directly from files\n\n```ruby\nmodel.fit(\"train.txt\", autotune_set: \"valid.txt\")\nmodel.test(\"test.txt\")\n```\n\nEach line should be a document\n\n```txt\ntext from document one\ntext from document two\ntext from document three\n```\n\nFor text classification, lines should start with a list of labels prefixed with `__label__`\n\n```txt\n__label__ham text from document one\n__label__ham text from document two\n__label__spam text from document three\n```\n\n## Pretrained Models\n\nThere are a number of [pretrained models](https://fasttext.cc/docs/en/supervised-models.html) you can download\n\n### Language Identification\n\nDownload one of the [pretrained models](https://fasttext.cc/docs/en/language-identification.html) and load it\n\n```ruby\nmodel = FastText.load_model(\"lid.176.ftz\")\n```\n\nGet language predictions\n\n```ruby\nmodel.predict(\"bon appétit\")\n```\n\n## History\n\nView the [changelog](https://github.com/ankane/fastText-ruby/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/fastText-ruby/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/fastText-ruby/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone --recursive https://github.com/ankane/fastText-ruby.git\ncd fastText-ruby\nbundle install\nbundle exec rake compile\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Ffasttext-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Ffasttext-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Ffasttext-ruby/lists"}