{"id":14069180,"url":"https://github.com/data-cleaning/errorlocate","last_synced_at":"2026-02-22T19:04:52.899Z","repository":{"id":41176487,"uuid":"38886469","full_name":"data-cleaning/errorlocate","owner":"data-cleaning","description":"Find and replace erroneous fields in data using validation rules","archived":false,"fork":false,"pushed_at":"2025-12-10T12:52:59.000Z","size":10267,"stargazers_count":22,"open_issues_count":13,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-01-28T04:16:08.618Z","etag":null,"topics":["data-cleaning","errors","invalidation","r"],"latest_commit_sha":null,"homepage":"http://data-cleaning.github.io/errorlocate/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/data-cleaning.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2015-07-10T15:06:17.000Z","updated_at":"2025-12-10T12:50:47.000Z","dependencies_parsed_at":"2024-06-09T13:29:02.951Z","dependency_job_id":"c460d76c-35cb-41a7-b2c7-7a5c53a14a24","html_url":"https://github.com/data-cleaning/errorlocate","commit_stats":{"total_commits":270,"total_committers":2,"mean_commits":135.0,"dds":"0.0037037037037036535","last_synced_commit":"44a7bbcae6f5c5f6ca5bcc9d78c2af7e351fa5d2"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/data-cleaning/errorlocate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/data-cleaning%2Ferrorlocate","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/data-cleaning%2Ferrorlocate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/data-cleaning%2Ferrorlocate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/data-cleaning%2Ferrorlocate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/data-cleaning","download_url":"https://codeload.github.com/data-cleaning/errorlocate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/data-cleaning%2Ferrorlocate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29723574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T15:10:41.462Z","status":"ssl_error","status_checked_at":"2026-02-22T15:10:04.636Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","errors","invalidation","r"],"created_at":"2024-08-13T07:06:41.517Z","updated_at":"2026-02-22T19:04:52.894Z","avatar_url":"https://github.com/data-cleaning.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n[![R build status](https://github.com/data-cleaning/errorlocate/workflows/R-CMD-check/badge.svg)](https://github.com/data-cleaning/errorlocate/actions)\n[![CRAN](http://www.r-pkg.org/badges/version/errorlocate)](https://CRAN.R-project.org/package=errorlocate)\n[![Downloads](http://cranlogs.r-pkg.org/badges/errorlocate)](http://www.r-pkg.org/pkg/errorlocate) \n[![Codecov test coverage](https://codecov.io/gh/data-cleaning/errorlocate/branch/master/graph/badge.svg)](https://codecov.io/gh/data-cleaning/errorlocate?branch=master)\n[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)\n\n# Error localization\n\nFind errors in data given a set of validation rules.\nThe `errorlocate` package helps to identify obvious errors in raw datasets.\n\nIt works in tandem with the package `validate`.\nWith `validate` you formulate data validation rules to which the data must comply.\n\nFor example:\n\n- \"age cannot be negative\": `age \u003e= 0`.\n- \"if a person is married, he must be older than 16 years\": `if (married == TRUE) age \u003e 16`.\n- \"Profit is turnover minus cost\": `profit == turnover - cost`.\n\nWhile `validate` can check if a record is valid or not, it does not identify\nwhich of the variables are responsible for the invalidation. This may seem a simple task,\nbut it is actually quite tricky: a set of validation rules forms a web\nof dependent variables: changing a value in an invalid record to repair rule 1 may invalidate\nthe record for rule 2.\n\n`errorlocate` provides a small framework for record-based error detection and implements the Fellegi-Holt\nalgorithm. This algorithm assumes there is no information available other than the values of a record\nand a set of validation rules. 
The algorithm minimizes the (weighted) number of values that need\nto be adjusted to remove the invalidation.\n\n# Installation\n\n`errorlocate` can be installed from CRAN:\n\n```r\ninstall.packages(\"errorlocate\")\n```\n\nBeta versions can be installed with `drat`:\n\n```r\ndrat::addRepo(\"data-cleaning\")\ninstall.packages(\"errorlocate\")\n```\n\nThe latest development version of `errorlocate` can be installed from GitHub with `devtools`:\n\n```r\ndevtools::install_github(\"data-cleaning/errorlocate\")\n```\n\n# Usage\n\n```{r}\nlibrary(errorlocate)\nrules \u003c- validator( profit == turnover - cost\n                  , cost \u003e= 0.6 * turnover\n                  , turnover \u003e= 0\n                  , cost \u003e= 0 # is implied\n)\n\ndata \u003c- data.frame(profit=750, cost=125, turnover=200)\n\ndata_no_error \u003c- replace_errors(data, rules)\n\n# faulty data was replaced with NA\nprint(data_no_error)\n\ner \u003c- errors_removed(data_no_error)\n\nprint(er)\n\nsummary(er)\n\ner$errors\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdata-cleaning%2Ferrorlocate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdata-cleaning%2Ferrorlocate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdata-cleaning%2Ferrorlocate/lists"}