https://github.com/paulhendricks/anonymizer
Anonymize data containing Personally Identifiable Information (PII) in R
- Host: GitHub
- URL: https://github.com/paulhendricks/anonymizer
- Owner: paulhendricks
- License: other
- Created: 2015-08-21T13:57:13.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-01T18:24:32.000Z (over 7 years ago)
- Last Synced: 2024-08-06T03:04:15.737Z (3 months ago)
- Language: R
- Homepage:
- Size: 164 KB
- Stars: 71
- Watchers: 3
- Forks: 9
- Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output:
  github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# anonymizer

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/anonymizer)](http://cran.r-project.org/package=anonymizer)
[![Downloads from the RStudio CRAN mirror](http://cranlogs.r-pkg.org/badges/anonymizer)](http://cran.rstudio.com/package=anonymizer)
[![Build Status](https://travis-ci.org/paulhendricks/anonymizer.png?branch=master)](https://travis-ci.org/paulhendricks/anonymizer)
[![Build status](https://ci.appveyor.com/api/projects/status/qu5j8q9wvit2i3pe/branch/master?svg=true)](https://ci.appveyor.com/project/paulhendricks/anonymizer/branch/master)
[![codecov.io](http://codecov.io/github/paulhendricks/anonymizer/coverage.svg?branch=master)](http://codecov.io/github/paulhendricks/anonymizer?branch=master)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/0.1.0/active.svg)](http://www.repostatus.org/#active)

`anonymizer` [anonymizes](https://en.wikipedia.org/wiki/Data_anonymization) data containing [Personally Identifiable Information](https://en.wikipedia.org/wiki/Personally_identifiable_information) (PII) using a combination of [salting](https://en.wikipedia.org/wiki/Salt_%28cryptography%29) and [hashing](https://en.wikipedia.org/wiki/Hash_function). You can find quality examples of data anonymization in R [here](http://jangorecki.github.io/blog/2014-11-07/Data-Anonymization-in-R.html), [here](http://stackoverflow.com/questions/10454973/how-to-create-example-data-set-from-private-data-replacing-variable-names-and-l), and [here](http://4dpiecharts.com/2011/08/23/anonymising-data/).

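To make the salt-then-hash idea concrete, here is a minimal sketch of the general technique. It is not `anonymizer`'s implementation: `sketch_anonymize()` and its `salt` argument are hypothetical names introduced for illustration, and the sketch assumes the `digest` package is installed.

```r
# Illustrative sketch only: a hand-rolled salt-then-hash,
# not the code anonymizer runs internally.
library(digest)

# Hypothetical helper: prepend a secret salt to each value, then hash it.
sketch_anonymize <- function(x, salt, algo = "sha256") {
  salted <- paste0(salt, x)
  # serialize = FALSE hashes the raw string rather than R's serialized object
  vapply(salted, digest::digest, character(1),
         algo = algo, serialize = FALSE, USE.NAMES = FALSE)
}

ids <- c("alice@example.com", "bob@example.com", "alice@example.com")

tokens       <- sketch_anonymize(ids, salt = "s3cr3t")
other_tokens <- sketch_anonymize(ids, salt = "different")

tokens[1] == tokens[3]        # equal inputs give equal tokens under one salt
tokens[1] == tokens[2]        # different inputs give different tokens
tokens[1] == other_tokens[1]  # changing the salt changes every token
```

The salt is what keeps someone from hashing a list of known e-mail addresses and matching them against the tokens, so it has to stay secret for the anonymization to hold.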
## Installation
You can install the latest released version from CRAN:
```R
install.packages("anonymizer")
```

Or install the development version from GitHub with:

```R
# Install devtools if it is missing or older than 1.6
if (!requireNamespace("devtools", quietly = TRUE) ||
    packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}
devtools::install_github("paulhendricks/anonymizer")
```

If you encounter a clear bug, please file a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on [GitHub](https://github.com/paulhendricks/anonymizer/issues).

## API
`anonymizer` provides four convenience functions: `salt`, `unsalt`, `hash`, and `anonymize`.

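One practical use of the `.seed` argument is keeping tables joinable after anonymization. The sketch below assumes that a given `.seed` yields the same salt on every call (the `salt(.seed = 1)` / `unsalt(.seed = 1)` round trip in this README suggests so) and that `"sha256"` is an accepted `.algo` value. The `customers` and `orders` data frames are made up for illustration.

```r
library(anonymizer)

# Made-up example tables that share an e-mail key
customers <- data.frame(email = c("a@example.com", "b@example.com"),
                        plan  = c("free", "pro"),
                        stringsAsFactors = FALSE)
orders <- data.frame(email = c("b@example.com", "a@example.com", "b@example.com"),
                     total = c(10, 25, 5),
                     stringsAsFactors = FALSE)

# Anonymize the key in both tables with the same algorithm and seed
# (assumption: the same seed produces the same salt on each call)
customers$email <- anonymize(customers$email, .algo = "sha256", .seed = 42)
orders$email    <- anonymize(orders$email,    .algo = "sha256", .seed = 42)

# The anonymized keys should still line up, so the join keeps working
merged <- merge(customers, orders, by = "email")
nrow(merged)
```

Using a different seed per table would break the join, which can be useful when the tables must not be linkable.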
```{r}
library(dplyr, warn.conflicts = FALSE)
library(anonymizer)
letters %>% head
letters %>% head %>% salt(.seed = 1)
letters %>% head %>% salt(.seed = 1) %>% unsalt(.seed = 1)
letters %>% head %>% hash(.algo = "crc32")
letters %>% head %>% salt(.seed = 1) %>% hash(.algo = "crc32")
letters %>% head %>% anonymize(.algo = "crc32", .seed = 1)
```

### Generate data containing fake PII

```{r}
library(generator)
n <- 6
set.seed(1)
ashley_madison <-
  data.frame(name = r_full_names(n),
             ssn = r_national_identification_numbers(n),
             dob = r_date_of_births(n),
             email = r_email_addresses(n),
             ip = r_ipv4_addresses(n),
             phone = r_phone_numbers(n),
             credit_card = r_credit_card_numbers(n),
             lat = r_latitudes(n),
             lon = r_longitudes(n),
             stringsAsFactors = FALSE)
knitr::kable(ashley_madison, format = "markdown")
```

### Detect data containing PII

```{r}
library(detector)
ashley_madison %>%
  detect %>%
  knitr::kable(format = "markdown")
```

### Anonymize data containing PII

```{r}
ashley_madison[] <- lapply(ashley_madison, anonymize, .algo = "crc32")
ashley_madison %>%
  knitr::kable(format = "markdown")
```

## Citation

To cite package ‘anonymizer’ in publications use:
```
Paul Hendricks (2015). anonymizer: Anonymize Data Containing Personally Identifiable Information. R package version 0.2.0. https://github.com/paulhendricks/anonymizer
```

A BibTeX entry for LaTeX users is:

```
@Manual{,
  title = {anonymizer: Anonymize Data Containing Personally Identifiable Information},
  author = {Paul Hendricks},
  year = {2015},
  note = {R package version 0.2.0},
  url = {https://github.com/paulhendricks/anonymizer},
}
```