https://github.com/mamba413/eimpute

Efficiently Impute Large Scale Incomplete Matrix

Host: GitHub
URL: https://github.com/mamba413/eimpute
Owner: Mamba413
License: other
Created: 2020-03-11T12:34:23.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-01-16T16:20:21.000Z (over 2 years ago)
Last Synced: 2025-01-22T09:52:23.006Z (over 1 year ago)
Language: C++
Homepage:
Size: 4.09 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE


# eimpute: Efficiently IMPUTE Large Scale Incomplete Matrix

Introduction
----------
Matrix completion is a procedure for imputing the missing elements in matrices by using the information of observed elements. This procedure can be visualized as:

![](./vignettes/matrixcom.jpg)

Matrix completion has attracted a lot of attention and is widely applied in:
- tabular data imputation: recover the missing elements in a data table;
- recommender systems: estimate users' potential preferences for items they have not yet purchased;
- image inpainting: fill in the missing elements of digital images.

Software
----------
**eimpute** is a computationally efficient R package for matrix completion.

### Installation
Install the stable version from CRAN:
```R
install.packages("eimpute")
```

### Advantage
In **eimpute**, the matrix completion problem is solved by iteratively performing low-rank approximation and data calibration, which offers two advantages:
- unbiased low-rank approximation for incomplete matrices;
- less time consumption via truncated SVD.

Moreover, **eimpute** also supports flexible data standardization.
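
The alternation between low-rank approximation and data calibration can be illustrated with a minimal base-R sketch. It is not the **eimpute** implementation: the function name `impute_lowrank`, the mean initialization, and the stopping rule are assumptions made for illustration, and it uses a full `svd()` call rather than a truncated one.

```R
# Illustrative sketch only; impute_lowrank is a hypothetical helper,
# not part of eimpute.
impute_lowrank <- function(x, r, maxit = 100, thresh = 1e-5) {
  miss <- is.na(x)                      # positions to impute
  z <- x
  z[miss] <- mean(x, na.rm = TRUE)      # assumed starting values
  for (iter in seq_len(maxit)) {
    # Low-rank approximation: keep the r leading singular triplets.
    s <- svd(z, nu = r, nv = r)
    low_rank <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    # Data calibration: observed entries keep their known values,
    # missing entries take the low-rank fit.
    z_new <- x
    z_new[miss] <- low_rank[miss]
    change <- sqrt(sum((z_new - z)^2) / sum(z^2))
    z <- z_new
    if (change < thresh) break
  }
  list(x.imp = z, iter = iter)
}

# Usage on a small rank-2 matrix with 25% of its entries hidden.
set.seed(1)
a <- matrix(rnorm(40 * 2), 40, 2) %*% matrix(rnorm(2 * 30), 2, 30)
x <- a
x[sample(length(x), 300)] <- NA
fit <- impute_lowrank(x, r = 2)
```

Resetting the observed entries after every approximation step is what keeps the fit anchored to the data; only the missing positions are ever overwritten.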

We compare **eimpute** and **softimpute** on synthetic datasets $X_{m \times m}$ with a proportion $p$ of missing observations:

- $m$ is chosen as 1000, 2000, 3000, 4000
- $p$ is chosen as 0.1, 0.5, 0.9.
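
A dataset of this shape could be generated along the following lines. This is a sketch under stated assumptions: the README does not say how the benchmark data were simulated, so the rank, the noise level, the helper names `make_incomplete` and `test_error`, and the root-mean-square definition of test error are all hypothetical.

```R
# Hypothetical generator: an m x m low-rank matrix plus noise, with a
# proportion p of entries set to NA. Rank and noise level are assumptions.
make_incomplete <- function(m, p, rank = 10, noise_sd = 0.1) {
  u <- matrix(rnorm(m * rank), m, rank)
  v <- matrix(rnorm(m * rank), m, rank)
  full <- u %*% t(v) + matrix(rnorm(m * m, sd = noise_sd), m, m)
  miss <- sample(m * m, size = round(p * m * m))   # indices to hide
  x <- full
  x[miss] <- NA
  list(x = x, full = full, miss = miss)
}

# Assumed error measure: RMSE on the hidden entries only.
test_error <- function(x_imp, dat) {
  sqrt(mean((x_imp[dat$miss] - dat$full[dat$miss])^2))
}

set.seed(123)
dat <- make_incomplete(m = 200, p = 0.5)
mean(is.na(dat$x))   # proportion of missing entries
```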

![](./vignettes/time3.png)
![](./vignettes/error3.png)

In the high-dimensional case, the als method in **softimpute** is slightly faster than **eimpute** when the proportion of missing observations is low. As the proportion of missing observations increases, the rsvd method in **eimpute** outperforms **softimpute** in both time cost and test error. Comparing the two methods in **eimpute**, rsvd has a lower time cost than tsvd.
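
To give a sense of why a randomized SVD can be cheaper than a deterministic truncated SVD, here is a minimal base-R sketch of the randomized range-finder approach described by Halko et al. (2011). It assumes that "rsvd" refers to a method of this kind and is not taken from the **eimpute** source; the function name `rsvd_sketch` and its `oversample` and `n_iter` parameters are hypothetical.

```R
# Illustrative randomized SVD; rsvd_sketch is a hypothetical helper,
# not the eimpute implementation.
rsvd_sketch <- function(a, r, oversample = 10, n_iter = 2) {
  k <- min(r + oversample, nrow(a), ncol(a))
  omega <- matrix(rnorm(ncol(a) * k), ncol(a), k)  # random test matrix
  q <- qr.Q(qr(a %*% omega))          # orthonormal basis for the sketch
  for (i in seq_len(n_iter)) {        # power iterations sharpen the basis
    q <- qr.Q(qr(crossprod(a, q)))    # t(a) %*% q
    q <- qr.Q(qr(a %*% q))
  }
  b <- crossprod(q, a)                # small k x ncol(a) matrix
  s <- svd(b, nu = r, nv = r)         # SVD of the small matrix only
  list(d = s$d[seq_len(r)], u = q %*% s$u, v = s$v)
}

# Usage: compare the leading singular values with a full SVD.
set.seed(42)
a <- matrix(rnorm(300 * 5), 300, 5) %*% matrix(rnorm(5 * 200), 5, 200) +
  matrix(rnorm(300 * 200, sd = 0.01), 300, 200)
fit <- rsvd_sketch(a, r = 5)
exact <- svd(a)$d[1:5]
max(abs(fit$d - exact)) / exact[1]    # relative discrepancy
```

The expensive decomposition is applied to a matrix with only `r + oversample` rows, so the cost is dominated by a few matrix products with `a`.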

References
----------
- Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices, Journal of Machine Learning Research 11, 2287-2322

- Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp (2011) Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review 53(2), 217-288

Bug report
----------
Send an email to Zhe Gao at gaozh8@mail2.sysu.edu.cn
