# LIBMF Ruby

[LIBMF](https://github.com/cjlin1/libmf) - large-scale sparse matrix factorization - for Ruby

Check out [Disco](https://github.com/ankane/disco) for higher-level collaborative filtering

[![Build Status](https://github.com/ankane/libmf-ruby/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/libmf-ruby/actions)

## Installation

Add this line to your application’s Gemfile:

```ruby
gem "libmf"
```

## Getting Started

Prep your data in the format `row_index, column_index, value`

```ruby
data = Libmf::Matrix.new
data.push(0, 0, 5.0)
data.push(0, 2, 3.5)
data.push(1, 1, 4.0)
```

Create a model

```ruby
model = Libmf::Model.new
model.fit(data)
```

Make predictions

```ruby
model.predict(row_index, column_index)
```
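
For example, with the matrix built above, the call below asks for the approximated value at row 0, column 2 (a minimal sketch, assuming the model was fit on that data):

```ruby
# predicted value for row 0, column 2 of the matrix built in "Getting Started"
puts model.predict(0, 2)
```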

Get the latent factors (these approximate the training matrix)

```ruby
model.p_factors
model.q_factors
```

Get the bias (average of all elements in the training matrix)

```ruby
model.bias
```

Save the model to a file

```ruby
model.save("model.txt")
```

Load the model from a file

```ruby
model = Libmf::Model.load("model.txt")
```

Pass a validation set

```ruby
model.fit(data, eval_set: eval_set)
```
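
Here `eval_set` is another `Libmf::Matrix` built the same way as the training data; a sketch, assuming a held-out portion of the same ratings:

```ruby
# build a held-out validation matrix in the same row, column, value format
eval_set = Libmf::Matrix.new
eval_set.push(1, 0, 2.5)
eval_set.push(2, 2, 4.5)

model.fit(data, eval_set: eval_set)
```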

## Cross-Validation

Perform cross-validation

```ruby
model.cv(data)
```

Specify the number of folds

```ruby
model.cv(data, folds: 5)
```

## Parameters

Pass parameters - default values below

```ruby
Libmf::Model.new(
  loss: :real_l2,     # loss function
  factors: 8,         # number of latent factors
  threads: 12,        # number of threads used
  bins: 25,           # number of bins
  iterations: 20,     # number of iterations
  lambda_p1: 0,       # coefficient of L1-norm regularization on P
  lambda_p2: 0.1,     # coefficient of L2-norm regularization on P
  lambda_q1: 0,       # coefficient of L1-norm regularization on Q
  lambda_q2: 0.1,     # coefficient of L2-norm regularization on Q
  learning_rate: 0.1, # learning rate
  alpha: 1,           # importance of negative entries
  c: 0.0001,          # desired value of negative entries
  nmf: false,         # perform non-negative MF (NMF)
  quiet: false        # no outputs to stdout
)
```
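
For instance, a sketch that overrides a few of these defaults (the specific values here are illustrative, not recommendations):

```ruby
model = Libmf::Model.new(factors: 20, iterations: 30, quiet: true)
model.fit(data)
```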

### Loss Functions

For real-valued matrix factorization

- `:real_l2` - squared error (L2-norm)
- `:real_l1` - absolute error (L1-norm)
- `:real_kl` - generalized KL-divergence

For binary matrix factorization

- `:binary_log` - logarithmic error
- `:binary_l2` - squared hinge loss
- `:binary_l1` - hinge loss

For one-class matrix factorization

- `:one_class_row` - row-oriented pair-wise logarithmic loss
- `:one_class_col` - column-oriented pair-wise logarithmic loss
- `:one_class_l2` - squared error (L2-norm)
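
A loss is selected through the `loss` option shown in the parameters above; for example, a sketch for binary matrix factorization (the training data format itself is unchanged):

```ruby
model = Libmf::Model.new(loss: :binary_log)
model.fit(data)
```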

## Metrics

Calculate RMSE (for real-valued MF)

```ruby
model.rmse(data)
```

Calculate MAE (for real-valued MF)

```ruby
model.mae(data)
```

Calculate generalized KL-divergence (for non-negative real-valued MF)

```ruby
model.gkl(data)
```

Calculate logarithmic loss (for binary MF)

```ruby
model.logloss(data)
```

Calculate accuracy (for binary MF)

```ruby
model.accuracy(data)
```

Calculate MPR (for one-class MF)

```ruby
model.mpr(data, transpose)
```

Calculate AUC (for one-class MF)

```ruby
model.auc(data, transpose)
```
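
In both of the one-class metrics, `transpose` is a placeholder; a minimal sketch, assuming it is a boolean that selects whether the matrix is evaluated transposed (not spelled out by this README):

```ruby
model.mpr(data, false)
model.auc(data, false)
```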

## Example

Download the [MovieLens 100K dataset](https://grouplens.org/datasets/movielens/100k/) and use:

```ruby
require "csv"

train_set = Libmf::Matrix.new
valid_set = Libmf::Matrix.new

CSV.foreach("u.data", col_sep: "\t").with_index do |row, i|
  data = i < 80000 ? train_set : valid_set
  data.push(row[0].to_i, row[1].to_i, row[2].to_f)
end

model = Libmf::Model.new(factors: 20)
model.fit(train_set, eval_set: valid_set)

puts model.rmse(valid_set)
```
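
After training, predictions use the same IDs as the input file; for instance (1 and 100 are within the user and movie ID ranges of MovieLens 100K):

```ruby
# predicted rating of movie 100 by user 1
puts model.predict(1, 100)
```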

## Performance

For performance, read data directly from files

```ruby
model.fit("train.txt", eval_set: "validate.txt")
model.cv("train.txt")
```

Data should be in the format `row_index column_index value`:

```txt
0 0 5.0
0 2 3.5
1 1 4.0
```
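
If your data starts in Ruby, one hedged way to produce a file in that whitespace-separated format:

```ruby
# write row/column/value triples as space-separated lines
File.open("train.txt", "w") do |f|
  [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0]].each do |row, col, value|
    f.puts "#{row} #{col} #{value}"
  end
end
```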

## Numo

Get latent factors as Numo arrays

```ruby
model.p_factors(format: :numo)
model.q_factors(format: :numo)
```
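
Since these are Numo arrays, the low-rank approximation of the training matrix can be rebuilt with a matrix product (a sketch, assuming the predicted values are the inner products of the factor rows):

```ruby
p = model.p_factors(format: :numo)
q = model.q_factors(format: :numo)

# rows of P times columns of Q^T approximate the training matrix
approx = p.dot(q.transpose)
```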

## Resources

- [LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf_open_source.pdf)

## History

View the [changelog](https://github.com/ankane/libmf-ruby/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/libmf-ruby/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/libmf-ruby/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/libmf-ruby.git
cd libmf-ruby
bundle install
bundle exec rake vendor:all
bundle exec rake test
```