{"id":15676774,"url":"https://github.com/ncordon/smartdata","last_synced_at":"2025-05-07T00:28:25.354Z","repository":{"id":73916563,"uuid":"103421253","full_name":"ncordon/smartdata","owner":"ncordon","description":"R package for data preprocessing","archived":false,"fork":false,"pushed_at":"2019-12-18T02:37:24.000Z","size":225,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-31T04:31:54.414Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ncordon.github.io/smartdata","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ncordon.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-13T16:05:12.000Z","updated_at":"2023-01-16T12:52:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"3e8b0f52-6cae-43d6-840c-fc8daa9291b5","html_url":"https://github.com/ncordon/smartdata","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncordon%2Fsmartdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncordon%2Fsmartdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncordon%2Fsmartdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncordon%2Fsmartdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ncordon","download_url":"https://codeload.github.com/ncordon/smartdata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252790272,"owners_count":21804573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T16:05:20.653Z","updated_at":"2025-05-07T00:28:25.316Z","avatar_url":"https://github.com/ncordon.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"README-\"\n)\n```\n[![Build Status](https://travis-ci.com/ncordon/smartdata.svg?branch=master)](https://travis-ci.com/ncordon/smartdata)\n[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/)\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/smartdata)](https://cran.r-project.org/package=smartdata)\n[![packageversion](https://img.shields.io/badge/Package%20version-1.0.2-orange.svg?style=flat-square)](https://github.com/ncordon/smartdata/commits/master)\n\n\n# smartdata\n\nPackage that integrates preprocessing algorithms for oversampling, instance/feature selection, normalization, discretization, space transformation, and outliers/missing values/noise cleaning.\n\n## Installation\n\nYou can install the latest smartdata stable release from CRAN with:\n\n```{r gh-installation, eval = FALSE}\n# This sets both CRAN and Bioconductor as repositories to resolve dependencies\nsetRepositories(ind = 1:2)\ninstall.packages(\"smartdata\")\n```\n\nand load it into an R session with:\n\n```{r results='hide', message=FALSE, warning=FALSE}\nlibrary(\"smartdata\")\n```\n\n## Examples\n\n`smartdata` provides the following wrappers: \n\n* `instance_selection`\n* `feature_selection`\n* `normalize`\n* `discretize`\n* `space_transformation`\n* `clean_outliers`\n* `impute_missing`\n* `clean_noise`\n\nTo get the possible methods available for a certain wrapper, we can do:\n\n```{r options}\nwhich_options(\"instance_selection\")\n```\n\nTo get information about the parameters available for a method:\n\n```{r options_method}\nwhich_options(\"instance_selection\", \"multiedit\")\n```\n\nFirst let's load a bunch of datasets:\n\n```{r data_load, results = \"hide\"}\ndata(iris0,  package = \"imbalance\")\ndata(ecoli1, package = \"imbalance\")\ndata(nhanes, package = \"mice\")\n```\n#### Oversampling\n\n```{r oversample, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris0 %\u003e% oversample(method = \"MWMOTE\", ratio = 0.8, filtering = TRUE)\n```\n\n#### Instance selection\n\n```{r instance_selection, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris %\u003e% instance_selection(\"multiedit\", k = 3, num_folds = 2, \n                                          null_passes = 10, class_attr = \"Species\")\n```\n\n#### Feature selection\n\n```{r feature_selection, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_ecoli \u003c- ecoli1 %\u003e% feature_selection(\"Boruta\", class_attr = \"Class\")\n```\n\n#### Normalization\n\n```{r normalize, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris %\u003e% normalize(\"min_max\", exclude = c(\"Sepal.Length\", \"Species\"))\n```\n\n#### Discretization\n\n```{r discretize, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris %\u003e% discretize(\"ameva\", class_attr = \"Species\")\n```\n\n#### Space transformation\n\n```{r space_transformation, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_ecoli \u003c- ecoli1 %\u003e% space_transformation(\"lle_knn\", k = 3, num_features = 2)\n```\n\n#### Outliers\n\n```{r clean_outliers, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris %\u003e% clean_outliers(\"multivariate\", type = \"adj\")\n```\n\n#### Missing values\n\n```{r impute_missing, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_nhanes \u003c- nhanes %\u003e% impute_missing(\"gibbs_sampling\")\n```\n\n#### Noise\n\n```{r clean_noise, results = \"hide\", message = FALSE, warning = FALSE}\nsuper_iris \u003c- iris %\u003e% clean_noise(\"hybrid\", class_attr = \"Species\", \n                                   consensus = FALSE, action = \"repair\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncordon%2Fsmartdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncordon%2Fsmartdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncordon%2Fsmartdata/lists"}