{"id":15051481,"url":"https://github.com/teebusch/noah","last_synced_at":"2025-04-10T02:56:23.734Z","repository":{"id":55100844,"uuid":"302384112","full_name":"Teebusch/noah","owner":"Teebusch","description":"An R package for generating pseudonyms that are delightful and easy to remember. It creates adorable anonymous animals like the Likeable Leech and the Proud Chickadee.","archived":false,"fork":false,"pushed_at":"2021-01-19T10:13:41.000Z","size":5624,"stargazers_count":7,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-24T04:23:50.106Z","etag":null,"topics":["package","pseudonymisation","r","rstats"],"latest_commit_sha":null,"homepage":"https://teebusch.github.io/noah/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Teebusch.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-08T15:27:17.000Z","updated_at":"2023-03-09T14:40:38.000Z","dependencies_parsed_at":"2022-08-14T12:00:40.791Z","dependency_job_id":null,"html_url":"https://github.com/Teebusch/noah","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teebusch%2Fnoah","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teebusch%2Fnoah/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teebusch%2Fnoah/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Teebusch%2Fnoah/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/Teebusch","download_url":"https://codeload.github.com/Teebusch/noah/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248147680,"owners_count":21055545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["package","pseudonymisation","r","rstats"],"created_at":"2024-09-24T21:35:52.432Z","updated_at":"2025-04-10T02:56:23.706Z","avatar_url":"https://github.com/Teebusch.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n```{r include=FALSE}\nset.seed(122020)\n```\n\n# noah \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"139\"/\u003e\n\n\u003c!-- badges: start --\u003e\n\n[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing) [![CRAN status](https://www.r-pkg.org/badges/version/noah)](https://CRAN.R-project.org/package=noah) [![R build status](https://github.com/Teebusch/noah/workflows/R-CMD-check/badge.svg)](https://github.com/Teebusch/noah/actions) [![Codecov test coverage](https://codecov.io/gh/Teebusch/noah/branch/master/graph/badge.svg)](https://codecov.io/gh/Teebusch/noah?branch=master)\n\n\u003c!-- badges: end --\u003e\n\nnoah (*no animals were harmed*) generates pseudonyms that are delightful and easy to remember. It creates adorable anonymous animals like the *Likable Leech* and the *Proud Chickadee*.\n\n## Installation\n\nInstall from CRAN with:\n\n```{r eval=FALSE}\ninstall.packages(\"noah\")\n```\n\nOr install the development version from [GitHub](https://github.com/teebusch/noah) with:\n\n```{r eval=FALSE}\n# install.packages(\"remotes\")\nremotes::install_github(\"teebusch/noah\")\n```\n\n## Usage\n\n### Generate pseudonyms\n\nUse `pseudonymize()` to generate a unique pseudonym for every unique element or row in a vector or data frame. 
`pseudonymize()` accepts multiple vectors and data frames as arguments and pseudonymizes them row by row, so each unique combination of values across the arguments gets its own pseudonym.\n\n```{r}\nlibrary(noah)\n\npseudonymize(1:9)\n\npseudonymize(\n  c(\"🐰\", \"🐰\", \"🐰\"), \n  c(\"🥕\", \"🥕\", \"🍰\")\n)\n```\n\nFor extra delight, we can ask noah to generate only alliterations:\n\n```{r}\npseudonymize(1:9, .alliterate = TRUE)\n```\n\n### Add pseudonyms to a data frame\n\nYou can use `pseudonymize()` with `dplyr::mutate()` to add a column with pseudonyms to a data frame. In this example, we use the diabetic retinopathy dataset from the `survival` package and add a new column with a pseudonym for each unique `id`. We also use `dplyr::relocate()` to move the pseudonyms to the first column:\n\n```{r warning=FALSE, message=FALSE}\nlibrary(dplyr)\ndiabetic \u003c- as_tibble(survival::diabetic)\n\ndiabetic %\u003e% \n  mutate(pseudonym = pseudonymize(id)) %\u003e% \n  relocate(pseudonym)\n```\n\nFor your convenience, noah also provides `add_pseudonyms()`, which wraps `mutate()` and `relocate()` and supports [tidyselect](https://tidyselect.r-lib.org/reference/language.html) syntax for selecting the key columns:\n\n```{r}\ndiabetic %\u003e% \n  add_pseudonyms(id, where(is.factor))\n```\n\n### Keeping track of pseudonyms with an Ark\n\nTo make sure that all pseudonyms are unique and consistent, `pseudonymize()` and `add_pseudonyms()` use an object of class `Ark` (a pseudonym archive). By default, a new `Ark` is created for each function call, but you can also provide an `Ark` yourself. 
This allows you to keep track of the pseudonyms that have been used and make sure that the same keys always get assigned the same pseudonym:\n\n```{r example-ark}\nark \u003c- Ark$new()\n\n# split the dataset by eye and pseudonymize each half with the same ark\ndiabetic_left \u003c- diabetic %\u003e% \n  filter(eye == \"left\") %\u003e% \n  add_pseudonyms(id, .ark = ark)\n\ndiabetic_right \u003c- diabetic %\u003e% \n  filter(eye == \"right\") %\u003e% \n  add_pseudonyms(id, .ark = ark)\n\n# recombine the halves: each id has the same pseudonym in both\nbind_rows(diabetic_left, diabetic_right) %\u003e% \n  arrange(id)\n```\n\nThe ark now contains `r length(ark)` pseudonyms, one for each unique `id` in the dataset:\n\n```{r}\nlength(unique(diabetic$id))\nlength(ark)\n```\n\n### Customizing an Ark\n\nBuilding your own Ark allows you to customize the name parts that are used to create pseudonyms (by default, adjectives and animals). It also allows you to use names with more than two parts:\n\n```{r}\nark \u003c- Ark$new(parts = list(\n  c(\"Charles\", \"Louis\", \"Henry\", \"George\"),\n  c(\"I\", \"II\", \"III\", \"IV\"),\n  c(\"The Good\", \"The Wise\", \"The Brave\", \"The Mad\", \"The Beloved\")\n))\n\npseudonymize(1:8, .ark = ark)\n```\n\nYou can also configure an `Ark` so that it generates only alliterations. You can still override this for a single call by passing `.alliterate = FALSE` to `pseudonymize()`.\n\n```{r}\nark \u003c- Ark$new(alliterate = TRUE)\n\npseudonymize(1:12, .ark = ark)\n```\n\n## Gotchas\n\nNoah treats numerically identical whole numbers of type `double` and `integer` as different keys and gives them different pseudonyms. This can cause some unexpected behavior. 
Consider this example:\n\n```{r echo=TRUE, results='hide', message=FALSE}\nark \u003c- Ark$new()\n\npseudonymize(1:2, .ark = ark)  # 1:2 is the integer vector c(1L, 2L)\npseudonymize(1, .ark = ark)    # 1 is a double, not the integer 1L\n```\n\nYou might expect the Ark to hold only two pseudonyms, because the second call to `pseudonymize()` requests a pseudonym for the number `1`, which is already in the Ark. Instead, it holds three:\n\n```{r}\nlength(ark)\n```\n\nNoah warns you when it suspects this mistake, but it may not always catch it. To avoid the problem, make the types consistent explicitly, for example with `as.double()` or `as.integer()`, or by writing integer literals such as `1L`.\n\n## Related R packages\n\nThere are multiple R packages that generate fake data, including fake names, phone numbers, addresses, credit card numbers, gene sequences and more:\n\n-   [`charlatan`](https://docs.ropensci.org/charlatan/)\n-   [`randomNames`](https://centerforassessment.github.io/randomNames/)\n-   [`randNames`](https://github.com/karthik/randNames)\n-   [`generator`](https://github.com/paulhendricks/generator)\n\nIf you need watertight anonymization, check out these packages for anonymizing personally identifiable information in data sets:\n\n-   [`sdcMicro`](http://sdctools.github.io/sdcMicro/index.html)\n-   [`sdcTable`](https://sdctools.github.io/sdcTable/index.html)\n-   [`anonymizer`](http://paulhendricks.io/anonymizer/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteebusch%2Fnoah","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteebusch%2Fnoah","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteebusch%2Fnoah/lists"}