{"id":13513542,"url":"https://github.com/ankane/tomoto-ruby","last_synced_at":"2025-11-17T14:11:45.105Z","repository":{"id":60693971,"uuid":"302761833","full_name":"ankane/tomoto-ruby","owner":"ankane","description":"High performance topic modeling for Ruby","archived":false,"fork":false,"pushed_at":"2025-10-27T00:13:17.000Z","size":150,"stargazers_count":65,"open_issues_count":2,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-02T21:05:09.394Z","etag":null,"topics":["latent-dirichlet-allocation","lda","topic-modeling"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-09T22:00:01.000Z","updated_at":"2025-10-27T00:11:43.000Z","dependencies_parsed_at":"2024-07-22T17:51:15.580Z","dependency_job_id":"de54f0eb-d597-4f64-89bd-1bc37f13827d","html_url":"https://github.com/ankane/tomoto-ruby","commit_stats":{"total_commits":166,"total_committers":3,"mean_commits":"55.333333333333336","dds":0.4879518072289156,"last_synced_commit":"44e5dee410758d21afa76790bd38d7ed9f5abdd1"},"previous_names":["ankane/tomoto"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/tomoto-ruby","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Ftomoto-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Ftomoto-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Ftomoto-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Ftomoto-ruby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/tomoto-ruby/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Ftomoto-ruby/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284894125,"owners_count":27080648,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-17T02:00:06.431Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["latent-dirichlet-allocation","lda","topic-modeling"],"created_at":"2024-08-01T05:00:30.651Z","updated_at":"2025-11-17T14:11:45.089Z","avatar_url":"https://github.com/ankane.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"# tomoto.rb\n\n:tomato: [tomoto](https://github.com/bab2min/tomotopy) - high performance topic modeling - for Ruby\n\n[![Build Status](https://github.com/ankane/tomoto-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/tomoto-ruby/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"tomoto\"\n```\n\n## Getting Started\n\nTrain a model\n\n```ruby\nmodel = Tomoto::LDA.new(k: 2)\nmodel.add_doc([\"tokens\", \"from\", \"document\", \"one\"])\nmodel.add_doc([\"tokens\", \"from\", \"document\", \"two\"])\nmodel.add_doc([\"tokens\", \"from\", \"document\", \"three\"])\nmodel.train(100) # iterations\n```\n\nGet the summary\n\n```ruby\nmodel.summary\n```\n\nGet topic words\n\n```ruby\nmodel.topic_words\n```\n\nSave the model to a file\n\n```ruby\nmodel.save(\"model.bin\")\n```\n\nLoad the model from a file\n\n```ruby\nmodel = Tomoto::LDA.load(\"model.bin\")\n```\n\nGet topic probabilities for a document\n\n```ruby\ndoc = model.docs[0]\ndoc.topics\n```\n\nGet the number of words for each topic\n\n```ruby\nmodel.count_by_topics\n```\n\nGet the vocab\n\n```ruby\nmodel.vocabs\n```\n\nGet the log likelihood per word\n\n```ruby\nmodel.ll_per_word\n```\n\nPerform inference for unseen documents\n\n```ruby\ndoc = model.make_doc([\"unseen\", \"doc\"])\ntopic_dist, ll = model.infer(doc)\n```\n\n## Models\n\nSupports:\n\n- Latent Dirichlet Allocation (`LDA`)\n- Labeled LDA (`LLDA`)\n- Partially Labeled LDA (`PLDA`)\n- Supervised LDA (`SLDA`)\n- Dirichlet Multinomial Regression (`DMR`)\n- Generalized Dirichlet Multinomial Regression (`GDMR`)\n- Hierarchical Dirichlet Process (`HDP`)\n- Hierarchical LDA (`HLDA`)\n- Multi Grain LDA (`MGLDA`)\n- Pachinko Allocation (`PA`)\n- Hierarchical PA (`HPA`)\n- Correlated Topic Model (`CT`)\n- Dynamic Topic Model (`DT`)\n\n## API\n\nThis library follows the [tomotopy API](https://bab2min.github.io/tomotopy/v0.9.0/en/). There are a few changes to make it more Ruby-like:\n\n- The `get_` prefix has been removed from methods (`topic_words` instead of `get_topic_words`)\n- Methods that return booleans use `?` instead of `is_`  (`live_topic?` instead of `is_live_topic`)\n\nIf a method or option you need isn’t supported, feel free to open an issue.\n\n## Examples\n\n- [LDA](examples/lda_basic.rb)\n- [HDP](examples/hdp_basic.rb)\n\n## Performance\n\ntomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support it. Check which instruction set architecture it’s using with:\n\n```ruby\nTomoto.isa\n```\n\n## Parallelism\n\nChoose a [parallelism algorithm](https://bab2min.github.io/tomotopy/v0.9.0/en/#parallel-sampling-algorithms) with:\n\n```ruby\nmodel.train(parallel: :partition)\n```\n\nSupported values are `:default`, `:none`, `:copy_merge`, and `:partition`.\n\n## History\n\nView the [changelog](https://github.com/ankane/tomoto-ruby/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/tomoto-ruby/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/tomoto-ruby/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone --recursive https://github.com/ankane/tomoto-ruby.git\ncd tomoto-ruby\nbundle install\nbundle exec rake compile\nbundle exec rake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Ftomoto-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Ftomoto-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Ftomoto-ruby/lists"}