{"id":20141879,"url":"https://github.com/yixuan/recosystem","last_synced_at":"2025-10-13T00:27:56.672Z","repository":{"id":18754690,"uuid":"21966996","full_name":"yixuan/recosystem","owner":"yixuan","description":"Recommender System Using Parallel Matrix Factorization","archived":false,"fork":false,"pushed_at":"2023-05-05T10:09:39.000Z","size":409,"stargazers_count":84,"open_issues_count":3,"forks_count":26,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-04T22:46:56.853Z","etag":null,"topics":["matrix-factorization","recommender-system"],"latest_commit_sha":null,"homepage":"http://cran.r-project.org/package=recosystem","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yixuan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2014-07-18T03:53:49.000Z","updated_at":"2025-01-20T10:33:18.000Z","dependencies_parsed_at":"2022-08-30T09:11:46.772Z","dependency_job_id":"c416d4ee-0931-4e3d-973b-4d642d89acba","html_url":"https://github.com/yixuan/recosystem","commit_stats":{"total_commits":262,"total_committers":4,"mean_commits":65.5,"dds":"0.16030534351145043","last_synced_commit":"4081e718af2ac7acf827a6f3ddb394e4ffb48c6a"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixuan%2Frecosystem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixuan%2Frecosystem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixuan%2Frecosystem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixua
n%2Frecosystem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yixuan","download_url":"https://codeload.github.com/yixuan/recosystem/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247713258,"owners_count":20983683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matrix-factorization","recommender-system"],"created_at":"2024-11-13T21:59:50.985Z","updated_at":"2025-10-13T00:27:51.636Z","avatar_url":"https://github.com/yixuan.png","language":"C++","readme":"### IMPORTANT NOTES\n\n\u003e The API of this package has changed since version 0.4, due\n\u003e to the API change of LIBMF 2.01 and some other design improvement.\n\n- The `cost` option in `$train()` and `$tune()` has been expanded to and replaced\n  by `costp_l1`, `costp_l2`, `costq_l1`, and `costq_l2`, to allow for more\n  flexibility of the model.\n- A new `loss` parameter in `$train()` and `$tune()` to specify loss function.\n- Data input and output are now managed in a unified way via functions\n  `data_file()`, `data_memory()`, `out_file()`, `out_memory()`, and\n  `out_nothing()`. 
See section **Data Input and Output** below.\n- As a result, a number of arguments in functions `$tune()`, `$train()`,\n  `$output()`, and `$predict()` should now be objects returned by these\n  input/output functions.\n\n## Recommender System with the recosystem Package\n\n### About This Package\n\n`recosystem` is an R wrapper of the `LIBMF` library developed by\nYu-Chin Juan, Wei-Sheng Chin, Yong Zhuang, Bo-Wen Yuan, Meng-Yuan Yang,\nand Chih-Jen Lin (https://www.csie.ntu.edu.tw/~cjlin/libmf/),\nan open source library for recommender systems using parallel matrix\nfactorization.\n\n### Highlights of LIBMF and recosystem\n\n`LIBMF` is a high-performance C++ library for large scale matrix factorization.\n`LIBMF` itself is a parallelized library, meaning that\nusers can take advantage of multicore CPUs to speed up the computation.\nIt also uses advanced CPU features (such as the SSE3 and AVX instruction sets)\nto further improve performance.\n\n`recosystem` is a wrapper of `LIBMF`, hence it inherits most of the features\nof `LIBMF`, and additionally provides a number of user-friendly R functions to\nsimplify data processing and model building. Also, unlike most other R packages\nfor statistical modeling that store the whole dataset and model object in\nmemory, `LIBMF` (and hence `recosystem`) can significantly reduce memory use.\nFor instance, the constructed model that contains the information needed for\nprediction can be stored on the hard disk, and output can be written directly\nto a file rather than kept in memory.\n\n### A Quick View of Recommender System\n\nThe main task of a recommender system is to predict unknown entries in the\nrating matrix based on observed values, as shown in the table below:\n\n|        | item_1 | item_2 | item_3 | ... | item_n |\n|--------|--------|--------|--------|-----|--------|\n| user_1 | 2      | 3      | ??     | ... | 5      |\n| user_2 | ??     | 4      | 3      | ... | ??     |\n| user_3 | 3      | 2      | ??     | ... | 3      |\n| ...    | ...    
| ...    | ...    | ... | ...    |\n| user_m | 1      | ??     | 5      | ... | 4      |\n\nEach cell with a number in it is the rating given by a user to a specific\nitem, while cells marked with question marks are unknown ratings that need\nto be predicted. In the literature, this problem is also known as\ncollaborative filtering, matrix completion, or matrix recovery.\n\nIn `recosystem`, we provide convenient functions for model training, parameter\ntuning, model exporting, and model prediction.\n\n### Data Input and Output\n\nEach step in the recommender system involves data input and output, as the\ntable below shows:\n\n| Step             | Input             | Output                           |\n|------------------|-------------------|----------------------------------|\n| Model training   | Training data set | --                               |\n| Parameter tuning | Training data set | --                               |\n| Exporting model  | --                | User matrix `P`, item matrix `Q` |\n| Prediction       | Testing data set  | Predicted values                 |\n\nData may have different formats and types of storage. For example, the input\ndata set may be saved in a file or stored as R objects, and users may want\nthe output to be written directly to a file or returned as R\nobjects for further processing. 
In `recosystem`, we use two classes,\n`DataSource` and `Output`, to handle data input and output in a unified way.\n\nAn object of class `DataSource` specifies the source of a data set (either\ntraining or testing), which can be created by the following three functions:\n\n- `data_file()`: Specifies a data set from a file on the hard disk\n- `data_memory()`: Specifies a data set from R objects\n- `data_matrix()`: Specifies a data set from a sparse matrix\n\nAn object of class `Output` describes how the result should be output,\nand is typically returned by one of the functions below:\n\n- `out_file()`: Result should be saved to a file\n- `out_memory()`: Result should be returned as R objects\n- `out_nothing()`: Nothing should be output\n\nMore data source formats and output options may be supported in the future\nalong with the development of this package.\n\n### Data Format\n\nThe data file for the training set needs to be arranged in\nsparse matrix triplet form, i.e., each line in the file contains three\nnumbers\n\n```\nuser_index item_index rating\n```\n\nUser index and item index may start with either 0 or 1, and this can be\nspecified by the `index1` parameter in `data_file()` and `data_memory()`.\nFor example, with `index1 = FALSE`, the training data file for the rating matrix\nat the beginning of this article may look like\n\n```\n0 0 2\n0 1 3\n1 1 4\n1 2 3\n2 0 3\n2 1 2\n...\n```\n\nFrom version 0.4, `recosystem` supports two special types of matrix factorization:\nbinary matrix factorization (BMF) and one-class matrix factorization (OCMF).\nBMF requires ratings to take values from `{-1, 1}`, and OCMF requires all the ratings to be positive.\n\nA testing data file is similar to a training data file, but since the ratings in\ntesting data are usually unknown, the `rating` entry in a testing data file\ncan be omitted, or can be replaced by any placeholder such as `0` or `?`.\n\nThe testing data file for the same rating matrix would be\n\n```\n0 2\n1 0\n2 2\n...\n```\n\nExample 
data files are contained in the `\u003crecosystem\u003e/dat`\n(or `\u003crecosystem\u003e/inst/dat`, for source package) directory.\n\n### Usage of recosystem\n\nThe usage of `recosystem` is quite simple, mainly consisting of the following steps:\n\n1. Create a model object (a Reference Class object in R) by calling `Reco()`.\n2. (Optionally) call the `$tune()` method to select best tuning parameters\nalong a set of candidate values.\n3. Train the model by calling the `$train()` method. A number of parameters\ncan be set inside the function, possibly coming from the result of `$tune()`.\n4. (Optionally) export the model via `$output()`, i.e. write the factorization matrices\n`P` and `Q` into files or return them as R objects.\n5. Use the `$predict()` method to compute predicted values.\n\nBelow is an example on some simulated data:\n\n```r\nlibrary(recosystem)\nset.seed(123) # This is a randomized algorithm\ntrain_set = data_file(system.file(\"dat\", \"smalltrain.txt\", package = \"recosystem\"))\ntest_set  = data_file(system.file(\"dat\", \"smalltest.txt\",  package = \"recosystem\"))\nr = Reco()\nopts = r$tune(train_set, opts = list(dim = c(10, 20, 30), lrate = c(0.1, 0.2),\n                                     costp_l1 = 0, costq_l1 = 0,\n                                     nthread = 1, niter = 10))\nopts\n```\n\n```\n$min\n$min$dim\n[1] 20\n\n$min$costp_l1\n[1] 0\n\n$min$costp_l2\n[1] 0.1\n\n$min$costq_l1\n[1] 0\n\n$min$costq_l2\n[1] 0.01\n\n$min$lrate\n[1] 0.1\n\n$min$loss_fun\n[1] 0.9804937\n\n\n$res\n   dim costp_l1 costp_l2 costq_l1 costq_l2 lrate  loss_fun\n1   10        0     0.01        0     0.01   0.1 0.9996368\n2   20        0     0.01        0     0.01   0.1 1.0040111\n3   30        0     0.01        0     0.01   0.1 0.9967101\n4   10        0     0.10        0     0.01   0.1 0.9930384\n5   20        0     0.10        0     0.01   0.1 0.9804937\n6   30        0     0.10        0     0.01   0.1 0.9921565\n7   10        0     0.01        0     0.10   
0.1 0.9857116\n8   20        0     0.01        0     0.10   0.1 1.0006225\n9   30        0     0.01        0     0.10   0.1 0.9891277\n10  10        0     0.10        0     0.10   0.1 0.9826748\n11  20        0     0.10        0     0.10   0.1 0.9807865\n12  30        0     0.10        0     0.10   0.1 0.9863404\n13  10        0     0.01        0     0.01   0.2 1.1022376\n14  20        0     0.01        0     0.01   0.2 1.0266608\n15  30        0     0.01        0     0.01   0.2 1.0039170\n16  10        0     0.10        0     0.01   0.2 1.0734307\n17  20        0     0.10        0     0.01   0.2 1.0393326\n18  30        0     0.10        0     0.01   0.2 1.0003177\n19  10        0     0.01        0     0.10   0.2 1.0769594\n20  20        0     0.01        0     0.10   0.2 1.0323938\n21  30        0     0.01        0     0.10   0.2 1.0061849\n22  10        0     0.10        0     0.10   0.2 1.0365456\n23  20        0     0.10        0     0.10   0.2 1.0023265\n24  30        0     0.10        0     0.10   0.2 1.0044131\n```\n\n```r\nr$train(train_set, opts = c(opts$min, nthread = 1, niter = 10))\n```\n\n```\niter      tr_rmse          obj\n   0       2.2673   5.3765e+04\n   1       1.0267   1.3667e+04\n   2       0.8372   1.0147e+04\n   3       0.7977   9.4773e+03\n   4       0.7703   9.0439e+03\n   5       0.7402   8.5967e+03\n   6       0.7048   8.1202e+03\n   7       0.6609   7.5638e+03\n   8       0.6133   7.0246e+03\n   9       0.5614   6.4770e+03\n```\n\n```r\n## Write predictions to file\npred_file = tempfile()\nr$predict(test_set, out_file(pred_file))\nprint(scan(pred_file, n = 10))\n```\n\n```\n [1] 3.92323 3.05510 2.98484 3.42607 2.53514 2.88135 2.93226 3.11718 2.40406 3.46282\n```\n\n```r\n## Or, directly return an R vector\npred_rvec = r$predict(test_set, out_memory())\nhead(pred_rvec, 10)\n```\n\n```\n [1] 3.923234 3.055096 2.984840 3.426066 2.535142 2.881347 2.932261 3.117176 2.404063\n[10] 3.462822\n```\n\nDetailed help document for each function is 
available in topics\n`?recosystem::Reco`, `?recosystem::tune`, `?recosystem::train`,\n`?recosystem::output` and `?recosystem::predict`.\n\n### Performance Improvement with Extra Installation Options\n\nTo build `recosystem` from source, one needs a C++ compiler that supports\nthe C++11 standard.\n\nAlso, there are some flags in the file `src/Makevars`\n(`src/Makevars.win` on Windows) that may have a significant\neffect on performance. It is strongly recommended to set flags appropriate\nfor your type of CPU before compiling the package, in order to\nachieve the best performance:\n\n1. The default `Makevars` provides generic options that should apply to most\nCPUs.\n2. If your CPU supports SSE3\n([a list of supported CPUs](https://en.wikipedia.org/wiki/SSE3)), add\n```\nPKG_CPPFLAGS += -DUSESSE\nPKG_CXXFLAGS += -msse3\n```\n3. If your CPU supports AVX in addition to SSE3\n([a list of supported CPUs](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)), add\n```\nPKG_CPPFLAGS += -DUSEAVX\nPKG_CXXFLAGS += -mavx\n```\n\nAfter editing the `Makevars` file, run `R CMD INSTALL recosystem` on\nthe package source directory to install `recosystem`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyixuan%2Frecosystem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyixuan%2Frecosystem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyixuan%2Frecosystem/lists"}