https://github.com/ncordon/smartdata
R package for data preprocessing
- Host: GitHub
- URL: https://github.com/ncordon/smartdata
- Owner: ncordon
- License: gpl-2.0
- Created: 2017-09-13T16:05:12.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-12-18T02:37:24.000Z (almost 5 years ago)
- Last Synced: 2024-10-10T16:29:37.824Z (28 days ago)
- Language: R
- Homepage: https://ncordon.github.io/smartdata
- Size: 220 KB
- Stars: 13
- Watchers: 6
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[![Build Status](https://travis-ci.com/ncordon/smartdata.svg?branch=master)](https://travis-ci.com/ncordon/smartdata)
[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/smartdata)](https://cran.r-project.org/package=smartdata)
[![packageversion](https://img.shields.io/badge/Package%20version-1.0.2-orange.svg?style=flat-square)](https://github.com/ncordon/smartdata/commits/master)

# smartdata
Package that integrates preprocessing algorithms for oversampling, instance/feature selection, normalization, discretization, space transformation, and outliers/missing values/noise cleaning.
## Installation
You can install the latest stable release of smartdata from CRAN with:
```{r gh-installation, eval = FALSE}
# This sets both CRAN and Bioconductor as repositories to resolve dependencies
setRepositories(ind = 1:2)
install.packages("smartdata")
```

and load it into an R session with:
```{r results='hide', message=FALSE, warning=FALSE}
library("smartdata")
```

## Examples
`smartdata` provides the following wrappers:
* `oversample`
* `instance_selection`
* `feature_selection`
* `normalize`
* `discretize`
* `space_transformation`
* `clean_outliers`
* `impute_missing`
* `clean_noise`

To get the possible methods available for a certain wrapper, we can do:
```{r options}
which_options("instance_selection")
```

To get information about the parameters available for a method:
```{r options_method}
which_options("instance_selection", "multiedit")
```

First, let's load a few datasets:
```{r data_load, results = "hide"}
data(iris0, package = "imbalance")
data(ecoli1, package = "imbalance")
data(nhanes, package = "mice")
```
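
Before preprocessing, it can help to look at what the loaded datasets contain. The chunk below is a minimal sketch using only base R. It assumes `ecoli1` has a `Class` column (the column this README passes as `class_attr` for `ecoli1`); the variable names `class_counts` and `missing_per_column` are illustrative.

```{r data_inspect}
# Reload the datasets so this chunk runs on its own
data(ecoli1, package = "imbalance")
data(nhanes, package = "mice")

# Class distribution of ecoli1 (assumes the class column is named "Class")
class_counts <- table(ecoli1$Class)
class_counts

# Number of missing values in each column of nhanes
missing_per_column <- colSums(is.na(nhanes))
missing_per_column
```

An uneven class distribution is the kind of situation oversampling targets, and non-zero missing-value counts are what imputation is meant to address.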
#### Oversampling

```{r oversample, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris0 %>% oversample(method = "MWMOTE", ratio = 0.8, filtering = TRUE)
```

#### Instance selection
```{r instance_selection, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris %>% instance_selection("multiedit", k = 3, num_folds = 2,
                                          null_passes = 10, class_attr = "Species")
```

#### Feature selection
```{r feature_selection, results = "hide", message = FALSE, warning = FALSE}
super_ecoli <- ecoli1 %>% feature_selection("Boruta", class_attr = "Class")
```

#### Normalization
```{r normalize, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris %>% normalize("min_max", exclude = c("Sepal.Length", "Species"))
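
# Sketch for comparison only; this is NOT smartdata's implementation.
# It shows what min-max scaling conventionally means, written in base R:
# each non-excluded column is mapped to [0, 1] via (x - min) / (max - min).
# `min_max_by_hand`, `scaled_cols` and `manual_iris` are illustrative names,
# not part of the package API.
min_max_by_hand <- function(x) (x - min(x)) / (max(x) - min(x))

# Leave out the same columns that were passed to `exclude` above
scaled_cols <- setdiff(names(iris), c("Sepal.Length", "Species"))

manual_iris <- iris
manual_iris[scaled_cols] <- lapply(iris[scaled_cols], min_max_by_hand)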
```

#### Discretization
```{r discretize, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris %>% discretize("ameva", class_attr = "Species")
```

#### Space transformation
```{r space_transformation, results = "hide", message = FALSE, warning = FALSE}
super_ecoli <- ecoli1 %>% space_transformation("lle_knn", k = 3, num_features = 2)
```

#### Outliers
```{r clean_outliers, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris %>% clean_outliers("multivariate", type = "adj")
```

#### Missing values
```{r impute_missing, results = "hide", message = FALSE, warning = FALSE}
super_nhanes <- nhanes %>% impute_missing("gibbs_sampling")
```

#### Noise
```{r clean_noise, results = "hide", message = FALSE, warning = FALSE}
super_iris <- iris %>% clean_noise("hybrid", class_attr = "Species",
                                   consensus = FALSE, action = "repair")
```