https://github.com/markvanderloo/simputation
Making imputation easy
- Host: GitHub
- URL: https://github.com/markvanderloo/simputation
- Owner: markvanderloo
- License: gpl-3.0
- Created: 2016-07-20T14:58:27.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-08-02T12:08:55.000Z (6 months ago)
- Last Synced: 2024-10-03T07:12:52.327Z (4 months ago)
- Topics: data-science, imputation, officialstatistics, r, rstats
- Language: R
- Homepage:
- Size: 738 KB
- Stars: 89
- Watchers: 4
- Forks: 11
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - markvanderloo/simputation - Making imputation easy (R)
README
[![CRAN](http://www.r-pkg.org/badges/version/simputation)](https://CRAN.R-project.org/package=simputation)[![status](https://tinyverse.netlify.app/badge/simputation)](https://CRAN.R-project.org/package=simputation)
[![Downloads](http://cranlogs.r-pkg.org/badges/simputation)](https://CRAN.R-project.org/package=simputation)[![Mentioned in Awesome Official Statistics](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

# simputation
An R package to make imputation simple. Currently supported methods include:

- Model based (optionally add a [non-]parametric random residual)
  - linear regression
  - robust linear regression (M-estimation)
  - ridge/elastic net/lasso regression (from version >= 0.2.1)
  - CART models
  - random forest
- Model based, multivariate
  - imputation based on EM-estimated parameters (from version >= 0.2.1)
  - [missForest](https://CRAN.R-project.org/package=missForest) (from version >= 0.2.1)
- Donor imputation (including various donor pool specifications)
  - k-nearest neighbour (based on [gower](https://cran.r-project.org/package=gower)'s distance)
  - sequential hotdeck (LOCF: last observation carried forward; NOCB: next observation carried backward)
  - random hotdeck
  - predictive mean matching
- Other
  - (groupwise) median imputation (optional random residual)
  - proxy imputation (copy from another variable)

### Installation
To install simputation together with all packages needed to support the various imputation
models, run:

```r
install.packages("simputation", dependencies=TRUE)
```

To install the development version from GitHub:

```bash
git clone https://github.com/markvanderloo/simputation
cd simputation
make install
```

### Example usage
Create some data with missing values:
```r
library(simputation) # load the package
dat <- iris
# set a few fields to missing (NA)
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA
head(dat,10)
```
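Before imputing, it can help to confirm where the gaps are. The snippet below is a plain base-R check (it does not use simputation) that recreates the same `dat` and counts the missing values per column and the number of incomplete rows:

```r
# Recreate the example data; 'iris' ships with base R (datasets package)
dat <- iris
dat[1:3, 1] <- dat[3:7, 2] <- dat[8:10, 5] <- NA

# Number of NA values in each column
na_counts <- colSums(is.na(dat))
na_counts
# Sepal.Length 3, Sepal.Width 5, Petal.Length 0, Petal.Width 0, Species 3

# Number of rows with at least one NA (rows 1 to 10 here)
n_incomplete <- sum(!complete.cases(dat))
n_incomplete
```

Row 3 is missing both `Sepal.Length` and `Sepal.Width`, so the 11 missing cells are spread over 10 incomplete rows.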
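Now impute `Sepal.Length` and `Sepal.Width` by linear regression on `Petal.Length` and `Species`, and impute `Species` with a CART model that uses all other variables as predictors (in this case including the just-imputed `Sepal.Length` and `Sepal.Width`). The example uses R's native pipe `|>`, which requires R 4.1.0 or later.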
```r
dat |>
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) |>
  impute_cart(Species ~ .) |> # use all variables except 'Species' as predictors
  head(10)
```

### Materials
- The introductory [vignette](https://cran.r-project.org/web/packages/simputation/vignettes/intro.html)
- [slides](https://markvanderloo.eu/files/share/loo2017easy.pdf) from my [useR2017](https://user2017.brussels/) talk.