{"id":14384110,"url":"https://github.com/smithoss/gonymizer","last_synced_at":"2026-01-14T17:03:13.807Z","repository":{"id":35002954,"uuid":"174196630","full_name":"smithoss/gonymizer","owner":"smithoss","description":"Gonymizer: A Tool to Anonymize Sensitive PostgreSQL Data Tables  for Use in QA and Testing","archived":false,"fork":false,"pushed_at":"2025-03-27T00:26:01.000Z","size":21690,"stargazers_count":158,"open_issues_count":31,"forks_count":37,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-08-23T19:45:46.156Z","etag":null,"topics":["anonymized-database","anonymizer","database-administrators","golang","hipaa","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smithoss.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-06T18:15:31.000Z","updated_at":"2025-07-24T15:28:15.000Z","dependencies_parsed_at":"2023-11-14T16:41:17.823Z","dependency_job_id":"bb1d94cc-a188-49dc-ab9e-1a7e14b02cd0","html_url":"https://github.com/smithoss/gonymizer","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/smithoss/gonymizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithoss%2Fgonymizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithoss%2Fgonymizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithoss%2Fgonymizer/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithoss%2Fgonymizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smithoss","download_url":"https://codeload.github.com/smithoss/gonymizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smithoss%2Fgonymizer/sbom","scorecard":{"id":833569,"data":{"date":"2025-08-11","repo":{"name":"github.com/smithoss/gonymizer","commit":"b4e62e9f66596d93e196e2b4cd0a3a4d309ca792"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":5.3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":9,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Info: jobLevel 'actions' permission set to 'read': .github/workflows/codeql.yml:28","Info: jobLevel 'contents' permission set to 'read': .github/workflows/codeql.yml:29","Warn: no topLevel permission defined: .github/workflows/codeql.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":10,"reason":"all changesets reviewed","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is 
not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/smithoss/gonymizer/codeql.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql.yml:47: update your workflow using https://app.stepsecurity.io/secureworkflow/smithoss/gonymizer/codeql.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql.yml:61: update your workflow using https://app.stepsecurity.io/secureworkflow/smithoss/gonymizer/codeql.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql.yml:74: update your workflow using https://app.stepsecurity.io/secureworkflow/smithoss/gonymizer/codeql.yml/master?enable=pin","Warn: containerImage not pinned by hash: Dockerfile:4","Warn: containerImage not pinned by hash: Dockerfile:17","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   2 containerImage dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a 
license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":6,"reason":"4 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2022-0635","Warn: Project is vulnerable to: GO-2022-0646","Warn: Project is vulnerable to: GO-2025-3787 / GHSA-fv92-fjc5-jj9h","Warn: Project is vulnerable to: GO-2025-3487 / GHSA-hcg3-q754-cr77"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":8,"reason":"SAST tool detected but not run on all commits","details":["Info: SAST configuration detected: CodeQL","Warn: 14 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-23T18:23:17.637Z","repository_id":35002954,"created_at":"2025-08-23T18:23:17.638Z","updated_at":"2025-08-23T18:23:17.638Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28427183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T16:38:47.836Z","status":"ssl_error","status_checked_at":"2026-01-14T16:34:59.695Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymized-database","anonymizer","database-administrators","golang","hipaa","postgresql"],"created_at":"2024-08-28T18:01:07.748Z","updated_at":"2026-01-14T17:03:13.789Z","avatar_url":"https://github.com/smithoss.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# Gonymizer\n![GonymizerLogo.png](https://github.com/smithoss/gonymizer/blob/master/docs/images/gonymize_small.png?raw=true)\n\n\n\n\n-----\n\n[![CircleCI](https://circleci.com/gh/smithoss/gonymizer.svg?style=svg)](https://circleci.com/gh/smithoss/gonymizer)[![Slack](https://slackin.junkert.now.sh/badge.svg)](https://slackin.junkert.now.sh)[![Coverage 
Status](https://coveralls.io/repos/github/smithoss/gonymizer/badge.svg?branch=master)](https://coveralls.io/github/smithoss/gonymizer?branch=master)[![Go Report Card](https://goreportcard.com/badge/github.com/smithoss/gonymizer)](https://goreportcard.com/report/github.com/smithoss/gonymizer)[![GoDoc](https://godoc.org/github.com/smithoss/gonymizer?status.svg)](https://godoc.org/github.com/smithoss/gonymizer)\n\n- [Gonymizer](#gonymizer)\n  - [Weird name, what does it do?](#weird-name-what-does-it-do)\n  - [Supported RDBMS](#supported-rdbms)\n  - [Abbreviations and Definitions](#abbreviations-and-definitions)\n  - [Getting Started](#getting-started)\n    - [OSX](#osx)\n    - [Debian 9.x / Ubuntu 18.04](#debian-9x--ubuntu-1804)\n  - [Configuration](#configuration)\n    - [CLI Configuration](#cli-configuration)\n    - [Map File Configuration](#map-file-configuration)\n      - [Available Fakers and Scramblers](#available-fakers-and-scramblers)\n      - [Inclusive Map Files](#inclusive-map-files)\n      - [Exclusive Map Files](#exclusive-map-files)\n      - [Relationship Mapping](#relationship-mapping)\n      - [Grouping and Schema Prefix Matching (sharding)](#grouping-and-schema-prefix-matching-sharding)\n  - [Running Gonymizer](#running-gonymizer)\n    - [TL;DR Steps to anonymization (that's a word right?)](#tldr-steps-to-anonymization-thats-a-word-right)\n    - [Detailed Steps](#detailed-steps)\n  - [Creating Tests](#creating-tests)\n    - [Test Example](#test-example)\n  - [Notices and License](#notices-and-license)\n    - [Go Logo and Graphics](#go-logo-and-graphics)\n\n## Weird name, what does it do?\nThe Gonymizer project (Go + Anonymizer) was built at [SmithRx](https://www.smithrx.com) in the hope of simplifying the QA process. 
Gonymizer is\nwritten in Golang and is meant to help database administrators and infrastructure folks easily anonymize production\ndatabase dumps before loading this data into a QA environment.\n\nWe have built-in support, and examples, for:\n* Kubernetes CRONJOB scheduling\n* AWS-S3 Storage processing and loading\n\nWe plan to have built-in:\n* CRONJOB BASH scripts to use local disk as storage (see tasks, we need help!)\n* AWS-Lambda Job scheduling (see tasks, we need help!)\n\nOur API is easy to follow, and we encourage others to join in by trying Gonymizer with their own\ndevelopment and staging environments, either directly through the CLI or through the API. Our\ndocumentation includes example configurations, best practices, Kubernetes CRONJOB examples, examples for AWS-Lambda, and other\ninfrastructure tools. Please see the docs directory in this repository for a full how-to guide and for where\nto get started.\n\n## Supported RDBMS\n\nCurrently Gonymizer only supports **PostgreSQL 9.x-13.x**. Note that we have not yet tested Gonymizer on versions 12+,\nbut we plan to in the near future. If you would like to help by adding support for other database management systems or new\nprocessors, or if you have general questions, please see the CONTRIBUTING.md file in this repository.\n\n## Abbreviations and Definitions\n\n- **HIPAA**: Health Insurance Portability and Accountability Act of 1996\n- **PCI DSS**: Payment Card Industry Data Security Standard\n- **PHI**: Protected Health Information\n- **PII**: Personally Identifiable Information\n\nIn this document and codebase, we use PHI and PII interchangeably.\n\n\n## Getting Started\nIf you are a seasoned Go veteran or already have an environment which contains Go \u003e= 1.11, then you can skip to\nthe next section.\n\n\n### OSX\n\nGonymizer requires a complete install of Go \u003e= 1.11. 
To install Go on OSX you can run the following:\n```\nbrew install go\n```\n\nOnce this is complete, we need to make sure our Go paths are set correctly in our BASH profile. **NOTE**: You may\nneed to change the directories below to match your setup.\n```\necho \"\nexport GOPATH=~/go\nexport GOROOT=/usr/local/Cellar/go/1.11.2/libexec\nexport GO111MODULE=on\n\" \u003e\u003e ~/.profile\n```\n\nIt is recommended to put all Go source code under ~/go. Once this is complete we can attempt to build the application:\n\n```\ncd ~/go/src/github.com/smithoss/gonymizer/scripts\n./build.sh\n```\n\nThe build script builds two binaries: one for macOS on the amd64 architecture and one for Linux on amd64. These\nbinaries are stored under the Gonymizer/bin directory. Now that we have a built binary, we can attempt to create a\nSQL dump file using our JSON configuration:\n\n```\n./gonymizer-darwin -c ~/conf/gonymizer-config-file.json dump\n```\n\n### Debian 9.x / Ubuntu 18.04\nUse the following steps to get up and running. Commands should be similar for Debian 9.x and Ubuntu 18.04.\n1. Install Golang and Git\n```\nsudo apt-get install golang git\n```\n\n2. Add the Go path to your profile\n```\necho \"\nexport GOPATH=~/go\nexport GO111MODULE=on\n\" \u003e\u003e ~/.bashrc\n```\n\n3. Git checkout\n```\nmkdir -p ~/go/src/github.com/smithoss/\ncd ~/go/src/github.com/smithoss/\ngit clone https://github.com/smithoss/Gonymizer.git gonymizer\n```\n\n4. Build the project\n```\ncd gonymizer/cmd/\ngo build -o ../bin/gonymizer .\n```\n\n5. 
Run the binary\n```\ncd ../bin\n./gonymizer --help\n```\n\n## Configuration\n\nGonymizer has many different configuration settings that can be enabled or disabled using command line options.\nIt is recommended to run `gonymizer --help` or `gonymizer CMD --help`, where CMD is one of the commands, to see\nwhich options are available at any given time.\n\nBelow we give examples of both the CLI configuration and how to create your map file.\n\n### CLI Configuration\nGonymizer was built using the Cobra + Viper Golang libraries to allow for easy configuration however you like it. We\nrecommend using a JSON, YAML, or TOML file to configure Gonymizer. Below we will go over an example configuration for\nrunning Gonymizer.\n\nFor an example of how to set up a CLI configuration, check our Dell Store 2 example in\n`docs/demo/dellstore2/gonymizer_config.json`:\n\n```\n{\n    \"comment\": \"This example is viewable under docs/demo/dellstore2\",\n    \"num-workers\": 2,\n    \"dump\":     {\n        \"database\":             \"store\",\n        \"disable-ssl\":          true,\n        \"dump-file\":            \"phi_dump.sql\",\n        \"exclude-schema\":      [\n            \"pg*\",\n            \"information_schema\"\n        ],\n        \"host\":                 \"localhost\",\n        \"port\":                 5432,\n        \"schema\":               [\"public\"],\n        \"row-count-file\":       \"row-counts.csv\",\n        \"username\":             \"levi\"\n    }\n}\n```\n\n`comment`: is used to leave comments for the reader and is not used by the application.\n\n`log-level`: is the level the application uses to decide what should be displayed on the screen. Choices are: FATAL,\nERROR, WARN, INFO, DEBUG. 
We use the Logrus Golang library for logging, so please read the documentation\n[here](https://github.com/sirupsen/logrus) for more information.\n\n`database`: is the name of the master database with PHI and PII from which the SQL dump file is created.\n\n`host`: is the hostname of the master database with PHI and PII from which the SQL dump file is created.\n\n`port`: is the host port that will be used to connect to the master database with PHI and PII.\n\n`username`: is the username that will be used to connect to the master database with PHI and PII.\n\n`password`: is the password that will be used to connect to the master database with PHI and PII.\n\n`disable-ssl`: disables SSL when connecting to the master database with PHI and PII.\n\n`dump-file`: is where Gonymizer will store the SQL statements from the `dump` command.\n\n`map-file`: is the file that Gonymizer uses to map out which columns need to be anonymized and how. When the\n`map` command is used in conjunction with `--map-file`, or with the configuration above, Gonymizer writes a new file named after the\n`map-file`, but with `skeleton` added to the name. More on this below in the map section.\n\n`exclude-table`: is a list of tables that are not to be included during the pg_dump step of the extraction process.\nThis allows us to focus only on tables that are needed for our base application to work. Using this option minimizes\nthe size of our dump file and in turn decreases the amount of time needed for dumping, processing, and\nreloading. This option operates in the same fashion as pg_dump's `--exclude-table` option.\n\n`exclude-table-data`: is a list of tables to include in the pg_dump process without including any of their\ndata (table schema only). 
The usage and advantages are the same as for the `exclude-table`\nfeature explained above, and it is identical to pg_dump's `--exclude-table-data` option.\n\n`schema`: is a list of schemas that Gonymizer should dump from the master database. This option must be in the form\nof a list if you are using the configuration methods mentioned above.\n\n`exclude-schema`: is a list of system-level schemas that Gonymizer should ignore when adding CREATE SCHEMA statements\nto the dump file. These schemas may still be included in the `--schema` option, for example the `public` schema.\n\n`schema-prefix`: is the prefix shared by a group of schemas in the database, as in a sharded architecture design. Sharding\nis outside the scope of this document; if you are unfamiliar with this design paradigm, it is recommended to read\n[here](https://en.wikipedia.org/wiki/Shard_(database_architecture)).\nFor example: *[company_1, company_2, company_..., company_n-1, company_n]* would be\n`--schema-prefix=company_ --schemas=company`\n\n`--oids`: allows you to pass the `--oids` option to older versions of pg_dump (prior to version 12).\n\n*NOTE:* Some arguments are not included here. It is recommended to use `gonymizer --help` and\n`gonymizer [COMMAND] --help` for more information and configuration options.\n\n### Map File Configuration\nOnce you have created a skeleton map file, it is recommended to create a new *true* map file, which will be used to let\nGonymizer know which columns in the database need to be anonymized and which do not. Gonymizer map files can work in two\nways: inclusive and exclusive.\n\n**NOTE:** Currently SmithRx is using an *exclusive map file*, which can be found under `map_files/prod_map.json`\n\n#### Available Fakers and Scramblers\nBelow is a list of fake data creators and scramblers. 
This table may not be up to date so please make sure to check\n`processor.go` for a full list.\n\n| Processor Name | Use |\n| -------------- |:----|\n| AlphaNumericScrambler | Scrambles strings. If a number is in the string it will replace it with another random number\n| EmptyJson | Replaces a JSON with an empty one (`{}`)\n| FakeStreetAddress | Used to replace a real US address with a fake one\n| FakeCity | Used to replace a city column\n| FakeLatitude | Used to replace a latitude column\n| FakeLongitude | Used to replace a longitude column\n| FakeCompanyName | Used to replace a company name\n| FakeParagraph | Used to generate a random paragraph\n| FakeUserAgent | Used to replace user agent with fake one\n| FakeEmailAddress | Used to replace e-mail with a fake one\n| FakeGender | Used to replace gender with a fake one\n| FakeFirstName | Used to replace a person's first name with a fake first name (non-gender specific)\n| FakeIPv4 | Used to replace an IPv4 with a fake one\n| FakeIPv6 | Used to replace IPv6 with a fake one\n| FakeCurrency | Used to replace currency with a fake one\n| FakeLastName | Used to replace a person's last name with a fake last name\n| ProcessorFullName | Used to replace a person's full name with fake one\n| ProcessorLanguage | Used to replace a person's language with fake one\n| FakePhoneNumber | Used to replace a person's phone number with fake phone number\n| FakeState | Used to replace a state (full state name, non-abbreviated)\n| FakeStateAbbrev | Used to replace a state abbreviation\n| FakeUsername | Used to replace a username with a fake one\n| FakeZip | Used to replace a real zip code with another zip code\n| Identity | Used to notify Gonymizer **not** to anonymize the column (same as leaving the column out of the map file)\n| RandomBoolean | Randomizes boolean fields\n| RandomDate | Randomizes Day and Month, but keeps year the same (HIPAA only requires month and day be changed)\n| RandomDigits | Randomizes a string of digit(s), but 
keeps the same length\n| RandomUUID | Randomizes a UUID string, but keeps a mapping from each old UUID to its new UUID. If the old UUID is found elsewhere in the database, the mapped new UUID will be used instead of creating another one. Useful for UUID primary key mapping (relationships).\n| ScrubString | Replaces a string with \\*'s. Useful for password hashes.\n| UniqueAlphaNumericScrambler | Similar to AlphaNumericScrambler, but ensures that all scrambled strings in the table column are unique.\n\n#### Inclusive Map Files\nAn *inclusive* map file is a map file which includes every column in every table contained in a list of schemas\nthat is configurable using the `--schemas` option. If you are using a sharded/grouped configuration, only one copy of\nthe column will be added to the file. An example map file can be found in `map_files/example_db_map.json`.\n\nOnce there is an up-to-date skeleton file, one can walk through the file and modify the \"Processors\".\"Name\" field\nfor any column that needs to be anonymized. This can be done by simply replacing the \"Identity\" processor with one\nlisted in the table above. For example, to generate a fake first name for a column named `first_name`, one would put\n`FakeFirstName` in the \"Processors\".\"Name\" field like so:\n\n```\n{\n    \"TableSchema\": \"public\",\n    \"TableName\": \"users\",\n    \"ColumnName\": \"first_name\",\n    \"DataType\": \"character varying\",\n    \"ParentSchema\": \"\",\n    \"ParentTable\": \"\",\n    \"ParentColumn\": \"\",\n    \"OrdinalPosition\": 6,\n    \"IsNullable\": false,\n    \"Processors\": [\n        {\n            \"Name\": \"FakeFirstName\",\n            \"Max\": 0,\n            \"Min\": 0,\n            \"Variance\": 0,\n            \"Comment\": \"\"\n        }\n    ],\n    \"Comment\": \"\"\n}\n```\n\n#### Exclusive Map Files\nAn **exclusive** map file is a map file that contains only the columns that need to be anonymized. 
This is the only\ndifference from the **inclusive** map file method, and it should make map files smaller and simpler to navigate, since they\nwill not contain any columns using the \"Identity\" processor. **All columns that are not listed in\nthe map file are assumed to be OK to add to the dump file WITHOUT any scrambling or anonymization.** This means that\nthe user must add column definitions for every schema change that requires anonymization.\n\n**Pro Tip:** An easy way to handle schema changes is to run the `map` command to create a new map file and copy/paste\nthe new columns into your map file while adding the proper processors at the same time.\n\n#### Relationship Mapping\nRelationship mapping allows the user to define columns that should remain congruent during the processing/anonymization\nstep. For example, if a user is identified by a unique UUID that is used across multiple tables in the database, one may\nselect the `RandomUUID` processor, which keeps a global hash map of `OLD-UUID =\u003e NEW-UUID`. The\nglobal hash map can then be used by the processor and can also be stored to disk for back-tracing values to\ndebug the application. The only way to enable this type of logging is to edit the generator.go file and add a\ncall to the *writeDebugMap* function. Adding this to your run-time is outside the scope of this documentation,\nand it is recommended to **NEVER** use this option when working with real PHI and PII data. 
If this file is compromised\nand stolen, an attacker will gain full access to the mapping of `(PHI, PII) =\u003e (Non-PHI, Non-PII)`.\n\nCurrently we only allow for global mapping of the following processors (more may be added later):\n* AlphaNumericScrambler\n* UniqueAlphaNumericScrambler\n* RandomUUID\n\nThe global maps they use can be found in the processor.go file:\n```\nvar UUIDMap = map[uuid.UUID]uuid.UUID{}\nvar AlphaNumericMap = map[string]map[string]string{}\n```\n\nThere are plans to add more globally aware processors in the future, but at this time only the processors listed above are available.\n\nTo map a relationship, notify Gonymizer of the parent table and column that the column should be\nmapped to. Below is an example where we identify the parent schema, table, and column:\n\n```\n{\n    \"TableSchema\": \"public\",\n    \"TableName\": \"credit_scores\",\n    \"ColumnName\": \"ssn\",\n    \"DataType\": \"integer\",\n    \"ParentSchema\": \"public\",\n    \"ParentTable\": \"user\",\n    \"ParentColumn\": \"ssn\",\n    \"OrdinalPosition\": 6,\n    \"IsNullable\": false,\n    \"Processors\": [\n        {\n            \"Name\": \"AlphaNumericScrambler\",\n            \"Max\": 0,\n            \"Min\": 0,\n            \"Variance\": 0,\n            \"Comment\": \"\"\n        }\n    ],\n    \"Comment\": \"\"\n}\n```\n\nIn the example above we are mapping the social security number (SSN) from the `credit_scores` table to the `user`\ntable by simply notifying Gonymizer that the ssn column is tied to the `user.ssn` table and column.\nGonymizer will see this and look the value up in the global **AlphaNumericMap** variable mentioned earlier. 
If the\noriginal SSN key does not exist in the map, Gonymizer will automatically scramble the SSN and add an entry to the\nmap such that:\n\n`map[\"old SSN\"]: \"new value (new SSN)\"`\n\nEvery time Gonymizer checks a value in the SSN column, it will look up this value and replace it with the previously\nanonymized SSN. This allows us to map keys between tables.\n\nAlso make sure to add the parent column itself as its own parent when creating a relationship mapping. For the example\nabove, this would be:\n\n```\n{\n    \"TableSchema\": \"public\",\n    \"TableName\": \"user\",\n    \"ColumnName\": \"ssn\",\n    \"DataType\": \"integer\",\n    \"ParentSchema\": \"public\",\n    \"ParentTable\": \"user\",\n    \"ParentColumn\": \"ssn\",\n    \"OrdinalPosition\": 6,\n    \"IsNullable\": false,\n    \"Processors\": [\n        {\n            \"Name\": \"AlphaNumericScrambler\",\n            \"Max\": 0,\n            \"Min\": 0,\n            \"Variance\": 0,\n            \"Comment\": \"\"\n        }\n    ],\n    \"Comment\": \"\"\n}\n```\nNotice that we added the column as a parent of itself. If this step is missing, all other columns will be mapped to the\ncorrect value, but the parent column will not use the same hash map, so it will contain different values\nthan expected.\n\n**Note 1:** Multiple tables can link back to the user table by simply adding the schema, table, and column names to the\nparent fields in the map file for the specified column.\n\n\n#### Grouping and Schema Prefix Matching (sharding)\nSharding is a type of database partitioning that separates very large databases into smaller, faster, more easily\nmanaged parts called data shards. The word shard means a small part of a whole. 
A full explanation is outside the scope of\nthis README; more information can be found in this\n[Wikipedia article](https://en.wikipedia.org/wiki/Shard_(database_architecture\\)).\n\n**NOTE:** When working with a database that contains many schemas matching the schema-prefix (shards), one will need to\nmake sure that all tables and columns are **identical** across each schema. Managing the DDL for each schema is outside\nthe scope of the Gonymizer project and should be done with external database administration tools.\n\n## Running Gonymizer\n\n### TL;DR Steps to anonymization (that's a word right?)\n\n1. Create a map file: `gonymizer -c config/production-conf.json map`\n2. Edit the map file to define which columns need to be anonymized.\n3. Create a PII-encumbered dump file: `gonymizer -c config/prod-conf.json dump`\n4. Use the Process command to anonymize the PII dump file: `gonymizer -c config/prod-conf.json process`\n5. Use the Load command to load the anonymized database file into the database: `gonymizer -c config/staging.json load`\n\nAlso check out our slides from [Percona Live 2019](https://www.percona.com/live/19/) [here](https://github.com/smithoss/gonymizer/tree/master/docs/conferences/PerconaLive2019).\n\n\n### Detailed Steps\n\n- Step 1: Generate a Map Skeleton (only needed the first time or after schema changes)\n\n    This will generate a new skeleton (defined, but empty) map file from scratch:\n\n        ./gonymizer -c config/prod-conf.json map\n\n    If you already have a map file and just need to update it due to migrations, schema changes, etc. (2nd -\u003e nth runs),\n    change the path to point to the real map file. The map command will NOT overwrite your map file; instead it will create a new\n    file with _\"skeleton\"_ in the name. 
This will also append new columns to the bottom:\n\n        ./gonymizer -c config/prod-conf.json --map-file=db_mapper.prod_map.json map\n\n    This will output a file named:\n\n        db_mapper.prod_map.json.skeleton.json\n\n\n- Step 2: Copy the newly created skeleton file to a new production map file\n\n    **Pro Tip:** It is recommended to leave OUT of your map file any column definitions that Gonymizer should skip.\n    This keeps the map file simple and clean. Gonymizer will skip any column that is not in the map file and continue\n    on. The purpose of the skeleton file is to serve as a baseline from which you copy/paste your anonymized columns\n    into your true map file. This map file will be used in the processing step later. See Map Configuration above for\n    more information.\n\n        mv db_mapper.prod_map.json.skeleton.json db_mapper.prod_map.json\n\n    Edit every field (removing unneeded columns if following the Pro Tip). Add processors or Min/Max values as necessary.\n\n- Step 3: Generate a PHI- and PII-encumbered dump file\n\n    **CAUTION!!** This dump file will contain PII! Only do this on secure machines with encrypted block devices!\n\n        ./gonymizer -c config/prod-conf.json dump --dump-file=dump-pii.sql\n\n- Step 4: Generate altered data using the dump file built in Step 3\n\n    If you've correctly configured db_mapper.j\n\n        ./gonymizer -c config/prod-conf.json --map-file=db_mapper.prod_map.json \\\n         --dump-file=dump-pii.sql --s3-file-path=s3://my-bucket-name.s3.us-west-2.amazonaws.com/db-dump-processed.sql process\n\n- Step 5: 
Use the Load command to load the data into the database and verify that it is correctly scrambled\n\n    The processed SQL file can also simply be imported using `psql`.\n\n        ./gonymizer -c config/staging-conf.json --load-file=s3://my-bucket-name.s3.us-west-2.amazonaws.com/db-dump-processed.sql load\n\n\n## Creating Tests\nTesting for Gonymizer differs from testing in typical projects. When adding a test to the project, make sure the test\nis called from the `main_test.go` test harness file in the root directory of the project.\n\nAll tests should be added to the `seqUnitTests` function in the proper position in the test sequence. This sequence\ncreates, imports, modifies, and drops the local test database.\n\nTo run the tests, use the following command from the root directory of the project:\n\n```\ngo test -v -run TestStart\n```\n\nTo specify the username and database, you can use the following environment variables:\n```\nPGUSER\nPGDATABASE\n```\n\n### Test Example\nLet's assume we created a new processor function for anonymizing IP addresses, as seen in #64. In this case we write the\ntest as usual, but also need to register it in `main_test.go` by adding the following line:\n\n```\nt.Run(\"ProcessorIPV4\", TestProcessorIPv4)\n```\n\n## Notices and License\n\nPlease make sure to read our license agreement here: [LICENSE.txt](https://github.com/smithoss/gonymizer/blob/master/LICENSE.txt). We may state throughout our documentation that we are using this\napplication to anonymize data for HIPAA requirements, but this is in our own environment and we give NO guarantee this\nwill be the same for others' uses. 
Considering that everyone's data set is completely different and the configuration of\nthis application is very involved, we cannot guarantee that this application will provide compliance of any type.\nIt is the application user's responsibility to verify with counsel that the dataset processed by the\napplication is indeed HIPAA/PCI/PHI/PII compliant.\n\n**THERE IS ABSOLUTELY NO GUARANTEE THAT USING THIS SOFTWARE WILL COMPLETE A CORRECT ANONYMIZATION OF YOUR DATA SET\nFOR COMPLIANCE PURPOSES. PLEASE SEE LICENSE.txt FOR MORE INFORMATION.**\n\n### Go Logo and Graphics\n\nAll graphics used in this project are released under the [Creative Commons Attribution 3.0 License](https://creativecommons.org/licenses/by/3.0/us/).\n\nThe Gonymizer Gophers logo was created by [Levi Junkert](https://github.com/junkert) and uses the Go Gopher that [Takuya Ueda](https://twitter.com/tenntenn) made from the original design of the Go Gopher, which was created by [Renee French](http://reneefrench.blogspot.com/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithoss%2Fgonymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmithoss%2Fgonymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithoss%2Fgonymizer/lists"}