{"id":15655205,"url":"https://github.com/mayer79/outforest","last_synced_at":"2025-08-02T20:05:17.107Z","repository":{"id":44674045,"uuid":"229068718","full_name":"mayer79/outForest","owner":"mayer79","description":"Outlier detection based on random forest models","archived":false,"fork":false,"pushed_at":"2025-04-06T09:50:11.000Z","size":2761,"stargazers_count":13,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-13T02:06:23.262Z","etag":null,"topics":["machine-learning","outlier","outlier-analysis","outlier-detection","random-forest","rstats"],"latest_commit_sha":null,"homepage":"https://mayer79.github.io/outForest/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mayer79.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-19T14:14:12.000Z","updated_at":"2025-04-06T09:47:42.000Z","dependencies_parsed_at":"2022-09-07T03:03:58.059Z","dependency_job_id":"4c1e0c8f-406b-43ba-b849-ed65e3e47440","html_url":"https://github.com/mayer79/outForest","commit_stats":{"total_commits":70,"total_committers":2,"mean_commits":35.0,"dds":0.02857142857142858,"last_synced_commit":"26ad60bb7955870827866f9d3d126c2bc4c09699"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/mayer79/outForest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2FoutForest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2FoutForest/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2FoutForest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2FoutForest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mayer79","download_url":"https://codeload.github.com/mayer79/outForest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2FoutForest/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268448176,"owners_count":24251994,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","outlier","outlier-analysis","outlier-detection","random-forest","rstats"],"created_at":"2024-10-03T12:57:00.856Z","updated_at":"2025-08-02T20:05:17.075Z","avatar_url":"https://github.com/mayer79.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# {outForest} \u003ca href='https://github.com/mayer79/outForest'\u003e\u003cimg src='man/figures/logo.png' align=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n\n[![R-CMD-check](https://github.com/mayer79/outForest/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mayer79/outForest/actions/workflows/R-CMD-check.yaml)\n[![Codecov test 
coverage](https://codecov.io/gh/mayer79/outForest/graph/badge.svg)](https://app.codecov.io/gh/mayer79/outForest)\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/outForest)](https://cran.r-project.org/package=outForest)\n\n[![](https://cranlogs.r-pkg.org/badges/outForest)](https://cran.r-project.org/package=outForest) \n[![](https://cranlogs.r-pkg.org/badges/grand-total/outForest?color=orange)](https://cran.r-project.org/package=outForest)\n\n\u003c!-- badges: end --\u003e\n\n## Overview\n\n{outForest} is a multivariate anomaly detection method. Each numeric variable is regressed onto all other variables using a random forest. If the scaled absolute difference between the observed value and the out-of-bag prediction is larger than a prespecified threshold, then the value is considered an outlier. After identification of outliers, they can be replaced, e.g., by predictive mean matching from the non-outliers.\n\nThe method can be viewed as a multivariate extension of a basic univariate outlier detection method, in which a value is considered an outlier if it deviates from the mean by more than, say, three times the standard deviation. In the multivariate case, a value is compared with the *conditional mean* rather than the *overall mean*. {outForest} estimates this conditional mean by a random forest.\n\nOnce the method is trained on a reference data set, it can be applied to new data.\n\n## Installation\n\n```r\n# From CRAN\ninstall.packages(\"outForest\")\n\n# Development version\ndevtools::install_github(\"mayer79/outForest\")\n```\n\n## Usage\n\nWe first generate a data set with about 2% outlier values in each numeric column. 
Then, we try to identify them.\n\n``` r\nlibrary(outForest)\nset.seed(3)\n\n# Generate data with outliers in numeric columns\nhead(irisWithOutliers \u003c- generateOutliers(iris, p = 0.02))\n\n# Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n#          5.1    3.500000          1.4         0.2  setosa\n#          4.9    3.000000          1.4         0.2  setosa\n#          4.7    3.200000          1.3         0.2  setosa\n#          4.6    3.100000          1.5         0.2  setosa\n#          5.0   -3.744405          1.4         0.2  setosa\n#          5.4    3.900000          1.7         0.4  setosa\n \n# Find outliers by random forest regressions and replace them by predictive mean matching\n(out \u003c- outForest(irisWithOutliers, allow_predictions = TRUE))\n\n# Plot the number of outliers per numeric variable\nplot(out)\n\n# Information on outliers\nhead(outliers(out))\n\n# row          col  observed predicted      rmse     score threshold replacement\n#   5  Sepal.Width -3.744405  3.298493 0.7810172 -9.017596         3         2.8\n#  20 Sepal.Length 10.164017  5.141093 0.6750468  7.440852         3         5.4\n# 138  Petal.Width  4.721186  2.113464 0.3712539  7.024092         3         2.1\n#  68  Petal.Width -1.188913  1.305339 0.3712539 -6.718452         3         1.2\n# 137  Sepal.Width  8.054524  2.861445 0.7810172  6.649122         3         2.9\n#  15 Petal.Length  6.885277  1.875646 0.7767877  6.449163         3         1.3\n\n# Resulting data set with replaced outliers\nhead(Data(out))\n\n# Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n#          5.1         3.5          1.4         0.2  setosa\n#          4.9         3.0          1.4         0.2  setosa\n#          4.7         3.2          1.3         0.2  setosa\n#          4.6         3.1          1.5         0.2  setosa\n#          5.0         2.8          1.4         0.2  setosa\n#          5.4         3.9          1.7         0.4  setosa\n\n# Out-of-sample 
application\niris1 \u003c- iris[1, ]\niris1$Sepal.Length \u003c- -1\npred \u003c- predict(out, newdata = iris1)\n\n# Did we find the outlier?\noutliers(pred)\n\n# row          col observed predicted      rmse    score threshold replacement\n#   1 Sepal.Length       -1  4.960069 0.6750468 -8.82912         3         6.4\n\n# Fixed data\nData(pred)\n\n# Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n#          6.4         3.5          1.4         0.2  setosa\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmayer79%2Foutforest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmayer79%2Foutforest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmayer79%2Foutforest/lists"}