{"id":13463376,"url":"https://github.com/sunitparekh/data-anonymization","last_synced_at":"2025-03-25T06:31:59.586Z","repository":{"id":41162000,"uuid":"5254362","full_name":"sunitparekh/data-anonymization","owner":"sunitparekh","description":"Want to use production data for testing, data-anonymization can help you.","archived":false,"fork":false,"pushed_at":"2024-03-20T11:29:53.000Z","size":1016,"stargazers_count":454,"open_issues_count":33,"forks_count":94,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-08-31T13:18:09.101Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunitparekh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-08-01T03:03:11.000Z","updated_at":"2024-08-27T02:13:45.000Z","dependencies_parsed_at":"2024-01-13T17:55:25.239Z","dependency_job_id":"24b11b76-f7ba-4e93-8eb3-766fa79921f5","html_url":"https://github.com/sunitparekh/data-anonymization","commit_stats":{"total_commits":357,"total_committers":22,"mean_commits":"16.227272727272727","dds":"0.30532212885154064","last_synced_commit":"84a6683cecc33bfc6bf0fb91c5ded3a71e277976"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunitparekh%2Fdata-anonymization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunitparekh%2Fdata-anonymization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunitpa
rekh%2Fdata-anonymization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunitparekh%2Fdata-anonymization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunitparekh","download_url":"https://codeload.github.com/sunitparekh/data-anonymization/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222045608,"owners_count":16921981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T13:00:52.311Z","updated_at":"2025-03-25T06:31:59.576Z","avatar_url":"https://github.com/sunitparekh.png","language":"Ruby","readme":"# Data::Anonymization\nAfraid of using production data due to privacy issues? Data Anonymization is a tool that helps you build anonymized production data dumps which you can use for performance testing, security testing, debugging and development.\n\n## Java/Kotlin version\n\nA Java/Kotlin version of the tool, supporting RDBMS databases, is available with a similar, easy-to-use DSL. 
\n* [Kotlin/Java Data Anonymization Tool](https://github.com/dataanon/data-anon)\n* [Kotlin Maven Sample Project](https://github.com/dataanon/dataanon-kotlin-sample)\n* [Java Maven Sample Project](https://github.com/dataanon/dataanon-java-sample)\n\n\n----------------------\n\n\n[\u003cimg src=\"https://secure.travis-ci.org/sunitparekh/data-anonymization.png?branch=master\"\u003e](http://travis-ci.org/sunitparekh/data-anonymization)\n[\u003cimg src=\"https://gemnasium.com/sunitparekh/data-anonymization.png?travis\"\u003e](https://gemnasium.com/sunitparekh/data-anonymization)\n[\u003cimg src=\"https://codeclimate.com/badge.png\"\u003e](https://codeclimate.com/github/sunitparekh/data-anonymization)\n[![Coverage Status](https://coveralls.io/repos/sunitparekh/data-anonymization/badge.png?branch=master)](https://coveralls.io/r/sunitparekh/data-anonymization?branch=master)\n[![Gem Version](https://badge.fury.io/rb/data-anonymization.svg)](http://badge.fury.io/rb/data-anonymization)\n\n## Getting started\n\nInstall the gem using:\n\n    $ gem install data-anonymization\n\nInstall the database adapter library required by ActiveRecord (SQLite in this example):\n\n    $ gem install sqlite3\n\nCreate a Ruby program using the data-anonymization DSL, for example `my_dsl.rb`:\n\n```ruby\nrequire 'data-anonymization'\n\ndatabase 'DatabaseName' do\n  strategy DataAnon::Strategy::Blacklist  # whitelist (default) or blacklist\n\n  # database config as ActiveRecord connection hash\n  source_db :adapter =\u003e 'sqlite3', :database =\u003e 'sample-data/chinook-empty.sqlite'\n\n  # User -\u003e table name (case sensitive)\n  table 'User' do\n    # id, DateOfBirth, FirstName, LastName, UserName, Password -\u003e table column names (case sensitive)\n    primary_key 'id' # composite key is also supported\n    anonymize 'DateOfBirth','FirstName','LastName' # uses default anonymization based on data types\n    anonymize('UserName').using FieldStrategy::StringTemplate.new('user#{row_number}')\n    anonymize('Password') { 
|field| \"password\" }\n  end\n\n  ...\n\nend\n```\n\nRun it using:\n\n    $ ruby my_dsl.rb\n\n## Examples\n\nSQLite database\n\n1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/examples/whitelist_dsl.rb)\n2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/examples/blacklist_dsl.rb)\n\nMongoDB\n\n1. [Whitelist](https://github.com/sunitparekh/data-anonymization/blob/master/examples/mongodb_whitelist_dsl.rb)\n2. [Blacklist](https://github.com/sunitparekh/data-anonymization/blob/master/examples/mongodb_blacklist_dsl.rb)\n\nPostgreSQL database with a **composite primary key**\n\n1. [Whitelist](https://github.com/sunitparekh/test-anonymization/blob/master/dell_whitelist.rb)\n2. [Blacklist](https://github.com/sunitparekh/test-anonymization/blob/master/dell_blacklist.rb)\n\n\n## Changelog\n\n#### 0.9.0 (Jan 29, 2025)\n1. Fixed issue with connection pool\n\n#### 0.9.0 (Dec 24, 2024)\n1. Upgraded to Rails 8.0\n\n#### 0.8.10 (Mar 20, 2024)\n1. Upgraded to Rails 7.1\n\n#### 0.8.7 (Jan 14, 2022)\n1. Upgraded to Rails 7.x\n\n#### 0.8.5 (May 28, 2020)\n1. Upgraded to Rails 6.x\n\n#### 0.8.1 (Aug 19, 2017)\n1. Multi-threading support added by [stanislav-tyutin](https://github.com/stanislav-tyutin) via pull request.\n2. Fixed compatibility with Ruby 2.4.x (issue with Integer data type)\n\n#### 0.8.0 (Oct 31, 2016)\n1. Upgraded to Rails 5.x\n\n#### 0.7.4 (Oct 29, 2016)\n1. Continued support for Rails 4.x. Minor changes based on feedback.\n\n#### 0.8.0.rc1 (Sep 5, 2016)\n1. Upgraded to Rails 5.0; please report any issue or use case that does not work.\n\n#### 0.7.3 (Feb 5, 2016)\n1. Fixed issue with batchsize. 
Thanks to [Jan Raasch](https://github.com/janraasch) for sending the pull request.\n\n#### 0.7.2 (Sep 26, 2015)\n1. Upgraded the MongoDB driver gem to version 2.1.0 and tested with MongoDB 3.x.\n2. Upgraded gems to their latest versions\n3. Added limit functionality (merged pull request #27 from yanismydj/master)\n\n#### 0.7.1 (Jun 13, 2015)\n1. Fixed issues with empty array data for MongoDB\n2. Added a feature to skip and continue records during anonymization; this is useful for applying different strategies to different types of records.\n\n\n#### 0.7.0 (Mar 9, 2015)\n1. Removed downcasing of field names, since it caused issues with upper-case field names. For databases where case matters, field name case must now be preserved.\n2. Upgraded gems to their latest versions\n\n\n#### 0.6.7 (Jan 17, 2015)\n1. Upgraded gems to their latest versions, including ActiveRecord 4.2. Please try it out and provide feedback.\n\n\n#### 0.6.6 (Oct 31, 2014)\n1. Upgraded gems to their latest versions.\n\n\n#### 0.6.5 (Jun 02, 2014)\n1. Upgraded most gems to their latest versions. The major change is upgrading the Rails ActiveRecord gem to 4.1.1; please provide feedback.\n\n#### 0.6.0 (Dec 09, 2013)\n1. Upgraded the Rails ActiveRecord gem to 4.0.2; please provide feedback.\n\n#### 0.5.5 (Dec 4, 2013)\n1. Upgraded gems to their latest versions\n\n\n#### 0.5.2  (Jan 29, 2013)\n\n1. Fixed [issue #17](https://github.com/sunitparekh/data-anonymization/issues/17)\n2. Upgraded Thor dependency to the latest version\n\n\n#### 0.5.2  (Jan 20, 2013)\n\n1. Upgraded all gems to their latest versions, including Rails ActiveRecord and ActiveSupport.\n\n#### 0.5.1  (Oct 26, 2012)\n\n1. Minor fixes release; no major functionality or features added.\n\nPlease see the [GitHub 0.5.1 milestone page](https://github.com/sunitparekh/data-anonymization/issues?milestone=3\u0026state=open) for more details on changes/fixes in release 0.5.1\n\n#### 0.5.0  (Sep 28, 2012)\n\nMajor changes:\n\n1. MongoDB support\n2. 
Command line utility to generate whitelist DSL for RDBMS \u0026 MongoDB (reduces the pain of writing whitelist DSL)\n3. Added support for reporting fields with missing mappings when using whitelist\n4. Errors are reported at the end of the process. The job doesn't fail on a single error; it fails if more than 100 records fail during anonymization.\n\n\nPlease see the [GitHub 0.5.0 milestone page](https://github.com/sunitparekh/data-anonymization/issues?milestone=2\u0026state=open) for more details on changes/fixes in release 0.5.0\n\n#### 0.3.0 (Sep 4, 2012)\n\nMajor changes:\n\n1. Added support for parallel table execution\n2. Changed the default String strategy from LoremIpsum to RandomString based on end-user feedback.\n3. Fixed issue with table column name 'type', which is the default column name for STI in ActiveRecord.\n\nPlease see the [GitHub 0.3.0 milestone page](https://github.com/sunitparekh/data-anonymization/issues?milestone=1\u0026state=closed) for more details on changes/fixes in release 0.3.0\n\n#### 0.2.0 (August 16, 2012)\n\n1. Added a progress bar using the 'powerbar' gem, which also shows the ETA for each table.\n2. Added more strategies\n3. Fixed default anonymization strategies for boolean and integer values\n4. Added support for composite primary keys\n\n#### 0.1.2 (August 14, 2012)\n\n1. Initial release\n\n## Roadmap\n\nMVP done. Fixing defects and supporting queries, suggestions and enhancements logged in GitHub issues :-)\n\n## Share feedback\n\nPlease use GitHub [issues](https://github.com/sunitparekh/data-anonymization/issues) to share feedback and feature suggestions, and to report issues.\n\n## What is data anonymization?\n\nAlmost all projects need a production data dump in order to run performance tests, rehearse production releases and debug production issues.\nHowever, getting and using production data is not feasible for multiple reasons, the primary one being privacy concerns for user data. 
Hence the need for data anonymization.\nThis tool helps you get an anonymized production data dump using either the Blacklist or the Whitelist strategy.\n\nRead more about [data anonymization here](http://sunitspace.blogspot.in/2012/09/data-anonymization.html)\n\n## Anonymization Strategies\n\n### Blacklist\nThis approach essentially leaves all fields unchanged, with the exception of those specified by the user, which are scrambled/anonymized (hence the name blacklist).\nFor `Blacklist`, create a copy of the production database; the fields chosen in the user specification, e.g. username, password, email, name and geo location, are then anonymized. Most fields have their own rules, e.g. password should be set to the same value for all users and email needs to remain valid.\n\nThe problem with this approach is that newly added fields are not anonymized by default. Human error in omitting users' personal data could be damaging.\n\n```ruby\ndatabase 'DatabaseName' do\n  strategy DataAnon::Strategy::Blacklist\n  source_db :adapter =\u003e 'sqlite3', :database =\u003e 'sample-data/chinook-empty.sqlite'\n  ...\nend\n```\n\n### Whitelist\nThis approach, by default, scrambles/anonymizes all fields except a list of fields which are allowed to be copied as is, hence the name whitelist.\nBy default all data needs to be anonymized, so data from the production database is sanitized record by record and inserted as anonymized data into the destination database. The source database needs to be read-only.\nAll fields are anonymized using the default anonymization strategy for their data type, unless a special anonymization strategy is specified. For instance, special strategies could be used for emails, passwords, usernames, etc.\nA whitelisted field implies that it's okay to copy the data as is and anonymization isn't required.\nThis way any new field is anonymized by default, and if you need it as is, you add it to the whitelist explicitly. 
This guards against human error and protects sensitive information.\n\n```ruby\ndatabase 'DatabaseName' do\n  strategy DataAnon::Strategy::Whitelist\n  source_db :adapter =\u003e 'sqlite3', :database =\u003e 'sample-data/chinook.sqlite'\n  destination_db :adapter =\u003e 'sqlite3', :database =\u003e 'sample-data/chinook-empty.sqlite'\n  ...\nend\n```\n\nRead more about [blacklist and whitelist here](http://sunitspace.blogspot.in/2012/09/data-anonymization-blacklist-whitelist.html)\n\n\n## Tips\n\n1. In the Whitelist approach, make the source database connection READONLY.\n2. Change the [default field strategies](#default-field-strategies) to avoid repeating the same strategy throughout your DSL.\n3. To run anonymization in parallel at the table level, provided there are no FK constraints between tables, use the DataAnon::Parallel::Table strategy.\n4. To load large tables in batches, set 'batch_size'; it will use RoR's batch-mode processing. Check out the [example](https://github.com/sunitparekh/data-anonymization/blob/master/examples/whitelist_dsl.rb) on how to use batch processing.\n5. Make sure to use the correct case for field and table names.\n6. Use skip and continue to apply different strategies to different records.\n7. Use 'limit' to limit the number of rows imported with whitelist.\n8. For RDBMS databases that use schemas, specify them via `schema_search_path`: `source_db { ... schema_search_path: 'public,my_special_schema' }`\n\n## DSL Generation\n\nWe provide a command line tool to generate whitelist scripts for RDBMS and NoSQL databases. The user supplies the database connection details, and a script is generated by analyzing the schema. Below are examples of how to use the tool to generate scripts for RDBMS and NoSQL datastores.\n\nWhen you install the data-anonymization tool, the **datanon** command becomes available on the terminal. 
If you run **datanon --help**, you should see the following:\n\n```\nTasks:\n\ndatanon generate_mongo_dsl -d, --database=DATABASE -h, --host=HOST                        # Generates a base anonymization script(whitelist strategy) for a Mongo DB using the database schema\ndatanon generate_rdbms_dsl -a, --adapter=ADAPTER -d, --database=DATABASE -h, --host=HOST  # Generates a base anonymization script(whitelist strategy) for a RDBMS database using the database schema\ndatanon help [TASK]                                                                       # Describe available tasks or one specific task\n\n```\n\n### RDBMS whitelist generation\n\nThe gem uses the ActiveRecord (AR) abstraction to connect to relational databases. You can generate a whitelist script in seconds for any relational database supported by ActiveRecord. To do so, use the following command:\n\n```\ndatanon generate_rdbms_dsl [options]\n\n```\n\nThe available options are:\n\n1. adapter(-a)  : The ActiveRecord adapter used to connect to the database (e.g. mysql2, postgresql)\n2. host(-h)     : DB host name or IP address\n3. database(-d) : The name of the database to generate the whitelist script for\n4. username(-u) : Username for DB authentication\n5. password(-w) : Password for DB authentication\n6. port(-p)     : The port the database service is running on. The default port provided by AR is used if nothing is specified.\n\nThe adapter, host and database options are mandatory. The others are optional.\n\nA few examples of the command are shown below:\n\n```\ndatanon generate_rdbms_dsl -a mysql2 -h db.host.com -p 3306 -d production_db -u root -w password\n\ndatanon generate_rdbms_dsl -a postgresql -h 123.456.7.8 -d production_db\n\n```\n\nThe relevant DB gems must be installed so that AR has the adapters required to connect to the databases. 
The script generates a file named **rdbms_whitelist_generated.rb** in the same location as the project.\n\n### MongoDB whitelist generation\n\nSimilar to relational databases, a whitelist script for MongoDB can be generated by analyzing the database structure:\n\n```\ndatanon generate_mongo_dsl [options]\n\n```\n\nThe available options are:\n\n1. host(-h)     : DB host name or IP address\n2. database(-d) : The name of the database to generate the whitelist script for\n3. username(-u) : Username for DB authentication\n4. password(-w) : Password for DB authentication\n5. port(-p)     : The port the database service is running on.\n6. whitelist patterns(-r): A regular expression used to match records in the database that should be listed as whitelisted fields in the generated script.\n\nThe host and database options are mandatory. The others are optional.\n\nA few examples of the command are shown below:\n\n```\ndatanon generate_mongo_dsl -h db.host.com -d production_db -u root -w password\n\ndatanon generate_mongo_dsl -h 123.456.7.8 -d production_db\n\n```\n\nThe **mongo** gem must be installed to provide the MongoDB drivers. 
The script generates a file named **mongodb_whitelist_generated.rb** in the same location as the project.\n\n\n\n## Running in Parallel\nThe tool can currently run anonymization in parallel at the table level, provided there are no FK constraints between tables.\nIt uses the [Parallel gem](https://github.com/grosser/parallel) by Michael Grosser.\nBy default it starts multiple parallel Ruby processes, each processing one table at a time.\n\n```ruby\ndatabase 'DellStore' do\n  strategy DataAnon::Strategy::Whitelist\n  execution_strategy DataAnon::Parallel::Table  # default is sequential table processing\n  ...\nend\n```\n\n\n## DataAnon::Core::Field\nThe object that is passed to field strategies.\n\nIt has the following attribute accessors:\n\n- `name` current field/column name\n- `value` current field/column value\n- `row_number` current row number\n- `ar_record` ActiveRecord record of the row currently being processed\n\n## Field Strategies\n\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/LoremIpsum\"\u003eLoremIpsum\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random Lorem Ipsum string\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomString\"\u003eRandomString\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random string of the same length as the original\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca 
href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/StringTemplate\"\u003eStringTemplate\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a string based on the provided template\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca\u003eSelectFromList\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomly selects a string from a provided list\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/SelectFromFile\"\u003eSelectFromFile\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomly selects a string from a provided file\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/FormattedStringNumber\"\u003eFormattedStringNumber\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomizes digits in a string while maintaining the format\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/SelectFromDatabase\"\u003eSelectFromDatabase\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomly selects a value from the result of a query on a database\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eText\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomUrl\"\u003eRandomUrl\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eAnonymizes a URL while maintaining the 
structure\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eNumber\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomInteger\"\u003eRandomInteger\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random integer between the provided limits (default 0 to 100)\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eNumber\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomIntegerDelta\"\u003eRandomIntegerDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random integer within -delta to +delta of the original integer\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eNumber\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomFloat\"\u003eRandomFloat\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random float between the provided limits (default 0.0 to 100.0)\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eNumber\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca\u003eRandomFloatDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random float within -delta to +delta of the original float\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eNumber\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca 
href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomBigDecimalDelta\"\u003eRandomBigDecimalDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eSimilar to the previous strategy, but returns a BigDecimal object\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eAddress\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomAddress\"\u003eRandomAddress\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomly selects an address from a geojson flat file [Default US address]\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eCity\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomCity\"\u003eRandomCity\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eSimilar to address, picks a random city from a geojson flat file [Default US cities]\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eProvince\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomProvince\"\u003eRandomProvince\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eSimilar to address, picks a random province from a geojson flat file [Default US provinces]\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eZip code\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomZipcode\"\u003eRandomZipcode\u003c/a\u003e\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003eSimilar to address, picks a random zipcode from a geojson flat file [Default US zipcodes]\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003ePhone number\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomPhoneNumber\"\u003eRandomPhoneNumber\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomizes a phone number while preserving locale-specific formatting\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eDateTime\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/AnonymizeDateTime\"\u003eAnonymizeDateTime\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eAnonymizes each field (except year and seconds) within the natural range of the field, depending on the true/false flags provided\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eTime\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/AnonymizeTime\"\u003eAnonymizeTime\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eSame as above, except the returned object is of type 'Time'\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eDate\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/AnonymizeDate\"\u003eAnonymizeDate\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eAnonymizes day and month within natural ranges based on true/false 
flag\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eDateTime\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/DateTimeDelta\"\u003eDateTimeDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eShifts the date and time randomly within the given range. By default, shifts the date within +/- 10 days and the time within 30 minutes.\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eTime\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/TimeDelta\"\u003eTimeDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eSame as above, except the returned object is of type 'Time'\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eDate\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/DateDelta\"\u003eDateDelta\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eShifts the date randomly within the given delta range. 
By default, shifts the date within +/- 10 days.\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eEmail\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomEmail\"\u003eRandomEmail\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random email address using the given HOSTNAME and TLD.\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eEmail\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/GmailTemplate\"\u003eGmailTemplate\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a valid, unique Gmail address by taking advantage of Gmail's plus (+) addressing\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eEmail\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomMailinatorEmail\"\u003eRandomMailinatorEmail\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random email address using the Mailinator hostname.\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\u003ctable\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003eContent\u003c/th\u003e\n\u003cth align=\"left\"\u003eName\u003c/th\u003e\n\u003cth align=\"left\"\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eFirst name\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomFirstName\"\u003eRandomFirstName\u003c/a\u003e\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003eRandomly picks a first name from the predefined list in the file. The default \u003ca href=\"https://raw.github.com/sunitparekh/data-anonymization/master/resources/first_names.txt\"\u003efile\u003c/a\u003e is part of the gem.\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eLast name\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomLastName\"\u003eRandomLastName\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eRandomly picks a last name from the predefined list in the file. The default \u003ca href=\"https://raw.github.com/sunitparekh/data-anonymization/master/resources/last_names.txt\"\u003efile\u003c/a\u003e is part of the gem.\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eFull Name\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomFullName\"\u003eRandomFullName\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a full name using the RandomFirstName and RandomLastName strategies.\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003eUser name\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ca href=\"http://rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/RandomUserName\"\u003eRandomUserName\u003c/a\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003eGenerates a random user name of the same length as the original user name.\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Write your own field strategies\nThe `field` parameter in the following code is a [DataAnon::Core::Field](#dataanoncorefield).\n\n```ruby\nclass MyFieldStrategy\n\n  # anonymize(field) is the required method: it receives a DataAnon::Core::Field\n  # and returns the anonymized value for that field\n  def anonymize(field)\n    # write your code here\n  end\n\nend\n```\n\nYou can also write your own field strategies as blocks within the DSL:\n\n```ruby\n  table 'User' 
do\n    # every password is replaced with the literal string \"password\"\n    anonymize('Password') { |field| \"password\" }\n    # each email gets a unique address built from the row number\n    anonymize('email') do |field|\n        \"test+#{field.row_number}@gmail.com\"\n    end\n  end\n```\n\n## Default field strategies\n\nDefault strategies are keyed by the field's data type:\n\n```ruby\nDEFAULT_STRATEGIES = {:string =\u003e FieldStrategy::RandomString.new,\n                      :fixnum =\u003e FieldStrategy::RandomIntegerDelta.new(5),\n                      :bignum =\u003e FieldStrategy::RandomIntegerDelta.new(5000),\n                      :float =\u003e FieldStrategy::RandomFloatDelta.new(5.0),\n                      :bigdecimal =\u003e FieldStrategy::RandomBigDecimalDelta.new(500.0),\n                      :datetime =\u003e FieldStrategy::DateTimeDelta.new,\n                      :time =\u003e FieldStrategy::TimeDelta.new,\n                      :date =\u003e FieldStrategy::DateDelta.new,\n                      :trueclass =\u003e FieldStrategy::RandomBoolean.new,\n                      :falseclass =\u003e FieldStrategy::RandomBoolean.new\n}\n```\n\nYou can override the default field strategies; this is also how you provide a default strategy for a data type missing from the list above.\n\n```ruby\ndatabase 'Chinook' do\n  ...\n  default_field_strategies :string =\u003e FieldStrategy::RandomString.new\n  ...\nend\n```\n\n## Logging\n\nHow do I switch off the progress bar?\n\n```ruby\n# add the following line to your Ruby file\nENV['show_progress'] = 'false'\n```\n\n`Logger` emits debug-level messages, including ActiveRecord database queries. To hide them, raise the log level to `INFO`:\n\n```ruby\nDataAnon::Utils::Logging.logger.level = Logger::INFO\n```\n\n## Skip and Continue records\n\n*Skip* excludes records from anonymization when the given condition returns true.
These records are ignored: in blacklist mode they remain unchanged in the database, and in whitelist mode they are not copied to the destination database.\n\n```ruby\ntable 'customers' do\n  # skip customers younger than 18\n  skip { |index, record| record['age'] \u003c 18 }\n\n  primary_key 'cust_id'\n  anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')\n  anonymize 'terms_n_condition', 'age'\nend\n```\n\n*Continue* is the opposite of *Skip*: anonymization continues only for records for which the given condition returns true. In blacklist mode, matching records are anonymized; in whitelist mode, matching records are anonymized and copied to the new database.\n\n```ruby\ntable 'customers' do\n  # process only customers older than 18\n  continue { |index, record| record['age'] \u003e 18 }\n\n  primary_key 'cust_id'\n  anonymize('email').using FieldStrategy::StringTemplate.new('test+#{row_number}@gmail.com')\n  anonymize 'terms_n_condition', 'age'\nend\n```\n\n## Want to contribute?\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n## License\n\n[MIT License](https://github.com/sunitparekh/data-anonymization/blob/master/LICENSE.txt)\n\n## Credits\n\n- [ThoughtWorks Inc](http://www.thoughtworks.com) for allowing us to build this tool and make it open source.\n- [Panda](https://twitter.com/sarbashrestha) for reviewing the documentation.\n- [Dan Abel](http://www.linkedin.com/pub/dan-abel/0/61b/9b0) for introducing me to the Blacklist and Whitelist approach to data anonymization.\n- [Chirag Doshi](https://twitter.com/chiragsdoshi) for encouraging me to get this done.\n- [Aditya Karle](https://twitter.com/adityakarle) for the logo.
(Coming Soon...)\n","funding_links":[],"categories":["Testing","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e新添加的","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e工具","Ruby"],"sub_categories":["Random Data Generation","\u003ca id=\"31185b925d5152c7469b963809ceb22d\"\u003e\u003c/a\u003e新添加的"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunitparekh%2Fdata-anonymization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunitparekh%2Fdata-anonymization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunitparekh%2Fdata-anonymization/lists"}