https://github.com/ankane/tomoto

High performance topic modeling for Ruby
latent-dirichlet-allocation lda topic-modeling


Host: GitHub
URL: https://github.com/ankane/tomoto
Owner: ankane
License: mit
Created: 2020-10-09T22:00:01.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2024-12-30T02:40:36.000Z (4 months ago)
Last Synced: 2025-01-25T11:05:11.675Z (3 months ago)
Topics: latent-dirichlet-allocation, lda, topic-modeling
Language: C++
Homepage:
Size: 135 KB
Stars: 64
Watchers: 4
Forks: 3
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt

Awesome Lists containing this project

awesome-topic-models - tomoto - Ruby extension for the Gibbs-sampling-based *tomoto*, which is written in C++ (Libraries & Toolkits)

README

# tomoto.rb

:tomato: [tomoto](https://github.com/bab2min/tomotopy) - high performance topic modeling - for Ruby

[![Build Status](https://github.com/ankane/tomoto-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/tomoto-ruby/actions)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem "tomoto"
```

## Getting Started

Train a model

```ruby
model = Tomoto::LDA.new(k: 2) # k is the number of topics
model.add_doc(["tokens", "from", "document", "one"])
model.add_doc(["tokens", "from", "document", "two"])
model.add_doc(["tokens", "from", "document", "three"])
model.train(100) # iterations
```
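
Each document is passed to `add_doc` as an array of string tokens, so raw text has to be tokenized first. The following is a minimal sketch of one way to do that in plain Ruby. The `tokenize` helper and its stopword list are hypothetical and not part of tomoto; real projects will likely want a proper tokenizer.

```ruby
# Hypothetical helper (not part of tomoto): turns raw text into the
# array of string tokens that `add_doc` expects.
# The default stopword list is illustrative only.
def tokenize(text, stopwords: %w[a an and from of the to])
  text.downcase
      .scan(/[a-z0-9]+/)                    # keep runs of ASCII letters and digits
      .reject { |token| stopwords.include?(token) }
end

texts = [
  "Tokens from document one.",
  "Tokens from document two!"
]
docs = texts.map { |text| tokenize(text) }

# With the gem loaded, each token array can then be added to a model:
#   docs.each { |tokens| model.add_doc(tokens) }
```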

Get the summary

```ruby
model.summary
```

Get topic words

```ruby
model.topic_words
```

Save the model to a file

```ruby
model.save("model.bin")
```

Load the model from a file

```ruby
model = Tomoto::LDA.load("model.bin")
```

Get topic probabilities for a document

```ruby
doc = model.docs[0]
doc.topics
```

Get the number of words for each topic

```ruby
model.count_by_topics
```

Get the vocab

```ruby
model.vocabs
```

Get the log likelihood per word

```ruby
model.ll_per_word
```
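
The log likelihood per word is often easier to compare across runs as perplexity, which tomotopy defines as `exp(-ll_per_word)`. Here is a small sketch of that conversion. The `perplexity` helper is hypothetical, and the input is a stand-in value rather than real model output.

```ruby
# Hypothetical helper: converts a per-word log likelihood (natural log)
# into perplexity. Lower perplexity means a better fit.
def perplexity(ll_per_word)
  Math.exp(-ll_per_word)
end

# Stand-in value for `model.ll_per_word`, chosen for illustration.
ll = -Math.log(8.0)
puts perplexity(ll).round(3)
```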

Perform inference for unseen documents

```ruby
doc = model.make_doc(["unseen", "doc"])
topic_dist, ll = model.infer(doc)
```
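
A common next step is to pick the most probable topics from the inferred distribution. The sketch below assumes `topic_dist` is an array of floats indexed by topic id; that shape is an assumption and should be checked against the actual return value. The `top_topics` helper is hypothetical and the sample values are made up.

```ruby
# Hypothetical helper: returns [topic_id, probability] pairs for the
# n most probable topics, highest probability first.
# Assumes `topic_dist` is an Array of Floats indexed by topic id.
def top_topics(topic_dist, n = 1)
  topic_dist.each_with_index
            .map { |prob, id| [id, prob] }
            .max_by(n) { |_id, prob| prob }
end

topic_dist = [0.1, 0.7, 0.2] # made-up values, not real model output
best_id, best_prob = top_topics(topic_dist).first
puts "topic #{best_id} (p=#{best_prob})"
```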

## Models

Supports:

- Latent Dirichlet Allocation (`LDA`)

- Labeled LDA (`LLDA`)

- Partially Labeled LDA (`PLDA`)

- Supervised LDA (`SLDA`)

- Dirichlet Multinomial Regression (`DMR`)

- Generalized Dirichlet Multinomial Regression (`GDMR`)

- Hierarchical Dirichlet Process (`HDP`)

- Hierarchical LDA (`HLDA`)

- Multi Grain LDA (`MGLDA`)

- Pachinko Allocation (`PA`)

- Hierarchical PA (`HPA`)

- Correlated Topic Model (`CT`)

- Dynamic Topic Model (`DT`)

## API

This library follows the [tomotopy API](https://bab2min.github.io/tomotopy/v0.9.0/en/). There are a few changes to make it more Ruby-like:

- The `get_` prefix has been removed from methods (`topic_words` instead of `get_topic_words`)

- Methods that return booleans use `?` instead of `is_` (`live_topic?` instead of `is_live_topic`)
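
When reading the tomotopy documentation, the two renaming rules above can be applied mechanically. The following sketch shows the mapping. The `ruby_method_name` helper is hypothetical, and it only rewrites the name; it does not check that the resulting method is actually implemented in this library.

```ruby
# Hypothetical helper: maps a tomotopy (Python) method name to the
# Ruby-style name suggested by the two rules above.
def ruby_method_name(python_name)
  name = python_name.to_s
  if name.start_with?("get_")
    name.delete_prefix("get_")        # get_topic_words -> topic_words
  elsif name.start_with?("is_")
    "#{name.delete_prefix('is_')}?"   # is_live_topic -> live_topic?
  else
    name                              # other names are left unchanged
  end
end

%w[get_topic_words is_live_topic train].each do |name|
  puts "#{name} -> #{ruby_method_name(name)}"
end
```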

If a method or option you need isn’t supported, feel free to open an issue.

## Examples

- [LDA](examples/lda_basic.rb)

- [HDP](examples/hdp_basic.rb)

## Performance

tomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support them. Check which instruction set architecture it’s using with:

```ruby
Tomoto.isa
```

## Parallelism

Choose a [parallelism algorithm](https://bab2min.github.io/tomotopy/v0.9.0/en/#parallel-sampling-algorithms) with:

```ruby
model.train(parallel: :partition)
```

Supported values are `:default`, `:none`, `:copy_merge`, and `:partition`.

## History

View the [changelog](https://github.com/ankane/tomoto-ruby/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/tomoto-ruby/issues)

- Fix bugs and [submit pull requests](https://github.com/ankane/tomoto-ruby/pulls)

- Write, clarify, or fix documentation

- Suggest or add new features

To get started with development:

```sh
git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test
```
