{"id":3968423,"url":"https://github.com/dselivanov/rsparse","last_synced_at":"2025-05-15T15:06:59.057Z","repository":{"id":39342202,"uuid":"90836821","full_name":"dselivanov/rsparse","owner":"dselivanov","description":"Fast and accurate machine learning on sparse matrices - matrix factorizations, regression, classification, top-N recommendations.","archived":false,"fork":false,"pushed_at":"2025-02-17T01:00:08.000Z","size":1146,"stargazers_count":174,"open_issues_count":4,"forks_count":31,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-05-12T07:52:51.271Z","etag":null,"topics":["collaborative-filtering","factorization-machines","matrix-completion","matrix-factorization","r","recommender-system","sparse-matrices","svd"],"latest_commit_sha":null,"homepage":"https://www.slideshare.net/DmitriySelivanov/matrix-factorizations-for-recommender-systems","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dselivanov.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"custom":["www.rexy.ai"]}},"created_at":"2017-05-10T07:57:16.000Z","updated_at":"2025-05-09T08:43:58.000Z","dependencies_parsed_at":"2023-01-28T17:01:14.953Z","dependency_job_id":"743b5661-626a-4a0b-875d-eb5aaf891cd9","html_url":"https://github.com/dselivanov/rsparse","commit_stats":{"total_commits":294,"total_committers":6,"mean_commits":49.0,"dds":"0.15986394557823125","last_synced_commit":"aa9cf58c915da24ba26eca7319c6c6503eb229f2"},"previous_names":["dselivanov/rsparse","rexyai/rsparse"],"tags_count":9,"template":false,"template_full_name":null,"repos
itory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Frsparse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Frsparse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Frsparse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Frsparse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dselivanov","download_url":"https://codeload.github.com/dselivanov/rsparse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254364270,"owners_count":22058878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collaborative-filtering","factorization-machines","matrix-completion","matrix-factorization","r","recommender-system","sparse-matrices","svd"],"created_at":"2024-02-08T23:38:11.253Z","updated_at":"2025-05-15T15:06:55.026Z","avatar_url":"https://github.com/dselivanov.png","language":"R","readme":"# rsparse \u003cimg src='man/figures/logo.png' align=\"right\" height=\"128\" /\u003e\n\u003c!-- badges: start --\u003e\n[![R build status](https://github.com/rexyai/rsparse/workflows/R-CMD-check/badge.svg)](https://github.com/dselivanov/rsparse/actions)\n[![codecov](https://codecov.io/gh/rexyai/rsparse/branch/master/graph/badge.svg)](https://app.codecov.io/gh/rexyai/rsparse/branch/master)\n[![License](https://eddelbuettel.github.io/badges/GPL2+.svg)](http://www.gnu.org/licenses/gpl-2.0.html)\n[![Project 
Status](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)\n\u003c!-- badges: end --\u003e\n\n`rsparse` is an R package for statistical learning primarily on **sparse matrices** -  **matrix factorizations, factorization machines, out-of-core regression**. Many of the implemented algorithms are particularly useful for **recommender systems** and **NLP**. \n\nWe've paid some attention to the implementation details - we try to avoid data copies, utilize multiple threads via OpenMP and use SIMD where appropriate. The package **can handle datasets with millions of rows and millions of columns**.\n\n# Features\n\n### Classification/Regression\n\n1. [Follow the proximally-regularized leader](http://proceedings.mlr.press/v15/mcmahan11b/mcmahan11b.pdf), which makes it possible to solve **very large linear/logistic regression** problems with elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates (so it can be used for online learning - it is not necessary to load all data into RAM). See [Ad Click Prediction: a View from the Trenches](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf) for more examples.\n    - Only logistic regression is implemented at the moment\n    - The native format for matrices is CSR - `Matrix::RsparseMatrix`. However, the common R `Matrix::CsparseMatrix` (`dgCMatrix`) will be converted automatically.\n1. [Factorization Machines](https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Rendle2010FM.pdf) - a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.  \n\n### Matrix Factorizations\n\n1. Vanilla **Maximum Margin Matrix Factorization** - a classic approach for \"rating\" prediction. See `WRMF` class and constructor option `feedback = \"explicit\"`. 
The original paper which introduced MMMF can be found [here](https://ttic.uchicago.edu/~nati/Publications/MMMFnips04.pdf).\n    * \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/MMMF.png\" width=\"400\"\u003e\n1. **Weighted Regularized Matrix Factorization (WRMF)** from [Collaborative Filtering for Implicit Feedback Datasets](http://yifanhu.net/PUB/cf.pdf). See `WRMF` class and constructor option `feedback = \"implicit\"`. \nWe provide 2 solvers:\n    1. Exact, based on Cholesky factorization\n    1. Approximate, based on a fixed number of steps of **Conjugate Gradient**.\nSee details in [Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering](https://dl.acm.org/doi/10.1145/2043932.2043987) and [Faster Implicit Matrix Factorization](http://www.benfrederickson.com/fast-implicit-matrix-factorization/).\n    * \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/WRMF.png\" width=\"400\"\u003e\n1. **Linear-Flow** from [Practical Linear Models for Large-Scale One-Class Collaborative Filtering](http://www.bkveton.com/docs/ijcai2016.pdf). The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to [SLIM](https://ieeexplore.ieee.org/document/6137254)).\n    * \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/LinearFlow.png\" width=\"300\"\u003e\n1. Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](http://arxiv.org/pdf/1410.2596). Works for both sparse and dense matrices. Works on [float](https://github.com/wrathematics/float) matrices as well! For certain problems it may be even faster than the [irlba](https://github.com/bwlewis/irlba) package.\n    * \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/soft-svd.png\" width=\"600\"\u003e\n1. 
**Soft-Impute** via fast Alternating Least Squares as described in [Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares](https://arxiv.org/pdf/1410.2596).\n    * \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/soft-impute.png\" width=\"400\"\u003e\n    * with a solution in SVD form \u003cimg src=\"https://raw.githubusercontent.com/rexyai/rsparse/master/docs/img/soft-impute-svd-form.png\" width=\"150\"\u003e\n1. **GloVe** as described in [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/pubs/glove.pdf).\n    * This is usually used to train word embeddings, but it is also very useful for recommender systems.\n1. Matrix scaling as described in [EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations](http://arxiv.org/pdf/1511.06033)\n\n*********************\n\n_Note: the optimized matrix operations which `rsparse` used to offer have been moved to a [separate package](https://github.com/david-cortes/MatrixExtra)_\n\n# Installation \n\nMost of the algorithms benefit from OpenMP and many of them can utilize high-performance implementations of BLAS. If you want to get the most out of this package, please read the section below carefully.\n\nIt is recommended to:\n\n1. Use high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).\n1. Add proper compiler optimizations in your `~/.R/Makevars`. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:\n\n```\nCXX11FLAGS += -O3 -march=native -fopenmp\nCXXFLAGS   += -O3 -march=native -fopenmp\n```\n\n### Mac OS\n\nIf you are on **Mac**, follow the instructions at [https://mac.r-project.org/openmp/](https://mac.r-project.org/openmp/). After `clang` configuration, additionally put a `PKG_CXXFLAGS += -DARMA_USE_OPENMP` line in your `~/.R/Makevars`. After that, install `rsparse` in the usual way. 
\n\nWe also recommend using [vecLib](https://developer.apple.com/documentation/accelerate/veclib) - Apple’s implementation of BLAS.\n\n```sh\nln -sf  /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib\n```\n\n### Linux\n\nOn Linux, it's enough to just create this file if it doesn't exist (`~/.R/Makevars`).\n\nIf using OpenBLAS, it is highly recommended to use the `openmp` variant rather than the `pthreads` variant. On Linux, it is usually available as a separate package in typical distribution package managers (e.g. for Debian, it can be obtained by installing `libopenblas-openmp-dev`, which is not the default version), and if there are multiple BLASes installed, it can be set as the default through the [Debian alternatives system](https://wiki.debian.org/DebianScience/LinearAlgebraLibraries) - which can also be used [for MKL](https://stackoverflow.com/a/49842944/5941695).\n\n### Windows\n\nBy default, R for Windows comes with unoptimized BLAS and LAPACK libraries, and `rsparse` will prefer using Armadillo's replacements instead. In order to use BLAS, **install `rsparse` from source** (not from CRAN), removing the option `-DARMA_DONT_USE_BLAS` from `src/Makevars.win` and ideally adding `-march=native` (under `PKG_CXXFLAGS`). See [this tutorial](https://github.com/david-cortes/R-openblas-in-windows) for instructions on getting R for Windows to use OpenBLAS. Alternatively, Microsoft's MRAN distribution for Windows comes with MKL.\n\n# Materials\n\n**Note that the syntax in these posts/slides is not up to date since the package was under active development**\n\n1. 
[Slides from DataFest Tbilisi (2017-11-16)](https://www.slideshare.net/DmitriySelivanov/matrix-factorizations-for-recommender-systems)\n\nHere is an example of `rsparse::WRMF` on the [lastfm360k](https://www.upf.edu/web/mtg/lastfm360k) dataset in comparison with other good implementations:\n\n\u003cimg src=\"https://github.com/dselivanov/bench-wals/raw/master/img/wals-bench-cg.png\" width=\"600\"\u003e\n\n\n# API\n\nWe follow [mlapi](https://github.com/dselivanov/mlapi) conventions.\n\n# Release and configure\n\n## Making release\n\nDon't forget to add `-DARMA_NO_DEBUG` to `PKG_CXXFLAGS` to skip bounds checks (this has a significant impact on the NNLS solver)\n\n```\nPKG_CXXFLAGS = ... -DARMA_NO_DEBUG\n```\n\n## Configure\n\nGenerate configure:\n\n```sh\nautoconf configure.ac \u003e configure \u0026\u0026 chmod +x configure\n```\n","funding_links":["www.rexy.ai"],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdselivanov%2Frsparse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdselivanov%2Frsparse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdselivanov%2Frsparse/lists"}