https://github.com/paulhendricks/detector
Detect data containing Personally Identifiable Information (PII) in R
- Host: GitHub
- URL: https://github.com/paulhendricks/detector
- Owner: paulhendricks
- License: other
- Created: 2015-08-21T15:37:02.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2017-07-01T19:57:12.000Z (about 8 years ago)
- Last Synced: 2025-06-09T12:50:01.806Z (about 1 month ago)
- Language: R
- Homepage:
- Size: 45.9 KB
- Stars: 15
- Watchers: 1
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output:
github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# detector

`detector` makes detecting data containing [Personally Identifiable Information](https://en.wikipedia.org/wiki/Personally_identifiable_information) (PII) quick, easy, and scalable. It provides high-level functions that can take vectors and data.frames and return important summary statistics in a convenient data.frame. Once complete, `detector` will be able to detect the following types of PII:

* Full name
* Home address
* E-mail address
* National identification number
* Passport number
* Social Security number
* IP address
* Vehicle registration plate number
* Driver's license number
* Credit card number
* Date of birth
* Birthplace
* Telephone number
* Latitude and longitude

## State of the Union

### Complete!
* E-mail address
* Telephone number
* National identification number

### Needs more work...

* Credit card number
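
One common building block for credit card detection is the Luhn checksum, which rejects most digit strings that merely look like card numbers. The sketch below is a base-R illustration only, not `detector`'s implementation; `luhn_valid` is a hypothetical helper name, and it assumes a single string as input.

```R
# Hypothetical helper, not part of detector: Luhn checksum for one string.
luhn_valid <- function(x) {
  # Keep digits only, so spaces and dashes in the input are ignored.
  digits <- as.integer(strsplit(gsub("[^0-9]", "", x), "")[[1]])
  n <- length(digits)
  if (n < 2) return(FALSE)
  # Count from the rightmost digit and double every second digit.
  digits <- rev(digits)
  idx <- seq(2, n, by = 2)
  doubled <- digits[idx] * 2
  # A doubled value above 9 contributes the sum of its two digits (value - 9).
  digits[idx] <- ifelse(doubled > 9, doubled - 9, doubled)
  sum(digits) %% 10 == 0
}

luhn_valid("4111 1111 1111 1111")
#> [1] TRUE
luhn_valid("4111 1111 1111 1112")
#> [1] FALSE
```

A passing checksum only shows that a value is well formed; it says nothing about whether the number belongs to a real account, so a check like this would normally be combined with length and prefix rules.
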
### Haven't even started :(
* Full name
* Date of birth
* Home address
* IP address
* Vehicle registration plate number
* Driver's license number
* Birthplace
* Latitude and longitude

## Installation

You can install the latest released version from CRAN:

```R
install.packages("detector")
```

Or from GitHub with:

```R
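# Install devtools first if it is missing or older than 1.6
if (!requireNamespace("devtools", quietly = TRUE) ||
    packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}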
devtools::install_github("paulhendricks/detector")
```

If you encounter a clear bug, please file a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on [GitHub](https://github.com/paulhendricks/detector/issues).

## API
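
To make the idea of "take a data.frame, return summary statistics in a data.frame" concrete, here is a minimal base-R sketch. It is not `detector`'s implementation: `looks_like_email` and `summarise_emails` are hypothetical names, the regular expression is deliberately loose, and the output columns are assumptions chosen for illustration.

```R
# Hypothetical sketch, not detector's code: count values in each column
# that look like e-mail addresses.
looks_like_email <- function(x) {
  # Loose pattern: something@something.tld, no spaces, exactly one "@".
  grepl("^[^@[:space:]]+@[^@[:space:]]+\\.[[:alpha:]]{2,}$", x)
}

summarise_emails <- function(.data) {
  # One integer count per column.
  hits <- vapply(.data,
                 function(col) sum(looks_like_email(as.character(col))),
                 integer(1))
  data.frame(column_name = names(.data),
             n_email = unname(hits),
             has_email = unname(hits > 0),
             stringsAsFactors = FALSE)
}

toy <- data.frame(contact = c("a@example.com", "not an email"),
                  city = c("Austin", "Boston"),
                  stringsAsFactors = FALSE)
summarise_emails(toy)
```

Returning one row per input column keeps the result easy to filter or join, whatever the width of the original data.
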
### Generate data containing fake PII
```{r}
library(dplyr, warn.conflicts = FALSE)
library(generator)
n <- 6
set.seed(1)
ashley_madison <-
  data.frame(name = r_full_names(n),
             snn = r_national_identification_numbers(n),
             dob = r_date_of_births(n),
             email = r_email_addresses(n),
             ip = r_ipv4_addresses(n),
             phone = r_phone_numbers(n),
             credit_card = r_credit_card_numbers(n),
             lat = r_latitudes(n),
             lon = r_longitudes(n),
             stringsAsFactors = FALSE)
knitr::kable(ashley_madison, format = "markdown")
```

### Detect data containing PII

```{r}
library(detector)
ashley_madison %>%
  detect %>%
  knitr::kable(format = "markdown")
```

## Citation

To cite package ‘detector’ in publications use:
```
Paul Hendricks (2015). detector: Detect Data Containing Personally Identifiable Information. R package version 0.1.0. https://CRAN.R-project.org/package=detector
```

A BibTeX entry for LaTeX users is

```
@Manual{,
title = {detector: Detect Data Containing Personally Identifiable Information},
author = {Paul Hendricks},
year = {2015},
note = {R package version 0.1.0},
url = {https://CRAN.R-project.org/package=detector},
}
```