{"id":27556785,"url":"https://github.com/dataanon/data-anon","last_synced_at":"2025-08-02T22:04:55.126Z","repository":{"id":37270991,"uuid":"109380307","full_name":"dataanon/data-anon","owner":"dataanon","description":"Data Anonymization implementation in Kotiln","archived":false,"fork":false,"pushed_at":"2023-07-07T22:00:57.000Z","size":207,"stargazers_count":36,"open_issues_count":6,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-27T00:40:26.232Z","etag":null,"topics":["anonymization","blacklist","dsl","java","kotlin","kotlin-anonymization","whitelist"],"latest_commit_sha":null,"homepage":"https://dataanon.github.io/data-anon/","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dataanon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-11-03T09:59:41.000Z","updated_at":"2023-08-30T22:44:37.000Z","dependencies_parsed_at":"2025-04-19T20:09:09.421Z","dependency_job_id":"3b3a7540-3249-4177-a676-8bba45a77d6f","html_url":"https://github.com/dataanon/data-anon","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dataanon/data-anon","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataanon%2Fdata-anon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataanon%2Fdata-anon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataanon%2Fdata-anon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
dataanon%2Fdata-anon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dataanon","download_url":"https://codeload.github.com/dataanon/data-anon/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataanon%2Fdata-anon/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268464789,"owners_count":24254196,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization","blacklist","dsl","java","kotlin","kotlin-anonymization","whitelist"],"created_at":"2025-04-19T19:57:10.469Z","updated_at":"2025-08-02T22:04:55.039Z","avatar_url":"https://github.com/dataanon.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data::Anonymization\n\nData Anonymization tool helps build anonymized production data dumps, \nwhich you can use for performance testing, security testing, debugging and development.\nTool is implemented in Kotlin, and works with Java 8 \u0026 Kotlin.\n\n[![Build Status](https://travis-ci.org/dataanon/data-anon.svg?branch=master)](https://travis-ci.org/dataanon/data-anon)\n\n## Getting started\n\n### Kotlin\n\n```kotlin\nfun main(args: Array\u003cString\u003e) {\n    // define your database connection settings \n    val source = 
DbConfig(\"jdbc:h2:tcp://localhost/~/movies_source\", \"sa\", \"\")\n    val dest = DbConfig(\"jdbc:h2:tcp://localhost/~/movies_dest\", \"sa\", \"\")\n\n    // choose Whitelist or Blacklist strategy for anonymization\n    Whitelist(source, dest)\n            .table(\"MOVIES\") {  // start with table                                \n                where(\"GENRE = 'Drama'\")    // allows to select only desired rows (optional)\n                limit(1_00_000)             // useful for testing (optional)\n\n                // pass through fields, leave it as is (do not anonymize)\n                whitelist(\"MOVIE_ID\")\n\n                // field by field decide the anonymization strategy\n                anonymize(\"GENRE\").using(PickFromDatabase\u003cString\u003e(source,\"SELECT DISTINCT GENRE FROM MOVIES\"))\n                anonymize(\"RELEASE_DATE\").using(DateRandomDelta(10))\n\n                // write your own in-line strategy\n                anonymize(\"TITLE\").using(object: AnonymizationStrategy\u003cString\u003e{\n                    override fun anonymize(field: Field\u003cString\u003e, record: Record): String = \"MY MOVIE ${record.rowNum}\"\n                })\n            }\n\n            // continue with multiple tables\n            .table(\"RATINGS\") {\n                whitelist(\"MOVIE_ID\",\"USER_ID\",\"CREATED_AT\")\n                anonymize(\"RATING\").using(FixedDouble(4.3))\n            }\n            .execute()\n}\n```\n\n### Java\n\n```java\npublic class Anonymizer {\n\n    public static void main(String[] args) {\n\n        // define your database connection settings\n        DbConfig source = new DbConfig(\"jdbc:h2:tcp://localhost/~/movies_source\", \"sa\", \"\");\n        DbConfig dest = new DbConfig(\"jdbc:h2:tcp://localhost/~/movies_dest\", \"sa\", \"\");\n\n        // choose Whitelist or Blacklist strategy for anonymization\n        new Whitelist(source, dest)\n\n            // start with table\n            .table(\"MOVIES\", table 
-\u003e {\n                table.where(\"GENRE = 'Drama'\");     // allows to select only desired rows (optional)\n                table.limit(1_00);                  // useful for testing (optional)\n\n                // pass through fields, leave it as is (do not anonymize)\n                table.whitelist(\"MOVIE_ID\");\n\n                // field by field decide the anonymization strategy\n                table.anonymize(\"GENRE\").using(new PickFromDatabase\u003cString\u003e(source,\"SELECT DISTINCT GENRE FROM MOVIES\"));\n                table.anonymize(\"RELEASE_DATE\").using(new DateRandomDelta(10));\n\n                // write your own in-line strategy\n                table.anonymize(\"TITLE\").using((AnonymizationStrategy\u003cString\u003e) (field, record) -\u003e \"MY MOVIE \" + record.getRowNum());\n\n                // return just to cover up for Kotlin, copy as is\n                return Unit.INSTANCE;\n            })\n\n            // continue with multiple tables\n            .table(\"RATINGS\", table -\u003e {\n                table.whitelist(\"MOVIE_ID\", \"USER_ID\", \"CREATED_AT\");\n                table.anonymize(\"RATING\").using(new FixedDouble(4.3));\n                return Unit.INSTANCE;\n            })\n            .execute();\n    }\n}\n```\n\n\n## Running\n\n```shell\n$ mvn compile exec:java\n```\n\nOr\n\n```shell\n$ mvn package\n$ java -jar target/data-anon.jar\n```\n\n## Start with samples...\n\n* [Kotlin](https://github.com/dataanon/dataanon-kotlin-sample)\n* [Java](https://github.com/dataanon/dataanon-java-sample) \n\n----------------------\n## Notes/tips on usage of tool\n\n1. In Whitelist approach provide source database connection user with READONLY access.\n1. All messages (incl. errors) are logged in `dataaonn.log` file. Using `logging.properties` control log messages.\n1. Use `where` and `limit` to limit the number of rows during anonymization. Very useful for testing purpose.\n1. 
Extend `DbConfig` and implement the `connection` method for special handling while creating a database connection.\n1. Write your [own anonymization strategy](#write-your-own-anonymization-strategy) for specific cases.\n1. `null` values are kept `null` after anonymization.\n1. Use the `PickFromDatabase` strategy for picking values from existing values and for `enum` type values.\n\n----------------------\n\n## Roadmap\n\n1. Support default strategy based on data type. As of now you need to specify anonymization strategy for each field.\n1. MongoDB database.\n\n## Share feedback\n\nPlease use GitHub [issues](https://github.com/dataanon/data-anon/issues) to share feedback, feature suggestions and report issues.\n\n## Changelog\n\n#### 0.9.3 (Mar 214, 2018)\n\n1. Error handling using logs. All messages are logged in the `dataaonn.log` file. Use `logging.properties` to control log messages.\n1. Added date and timestamp related anonymization strategies - DateRandomDelta \u0026 DateTimeRandomDelta\n1. PickFromDatabase strategy added to support enum type fields picked up from a database column. Useful for name type fields as well.\n\n#### 0.9.1 (Feb 26, 2018)\n\n1. Initial release with RDBMS support\n1. First class support for table parallelization providing better performance\n1. Easy to use DSL built using Kotlin\n\n----------------------\n\n## What is data anonymization?\n\nFor almost all projects there is a need for a production data dump in order to run performance tests, rehearse production releases and debug production issues.\nHowever, getting production data and using it is not feasible for multiple reasons, the primary one being privacy concerns for user data. 
And thus the need for data anonymization.\nThis tool helps you to get an anonymized production data dump using either the Blacklist or the Whitelist strategy.\n\nRead more about [data anonymization here](http://sunitspace.blogspot.in/2012/09/data-anonymization.html)\n\n## Anonymization Strategies\n\n### Blacklist\nThe Blacklist approach leaves all fields unchanged except those specified by the user, which are scrambled/anonymized.\nFor `Blacklist`, create a copy of the production database and choose the fields to be anonymized (e.g. username, password, email, name, geo location) based on user specification. Most fields have different rules, e.g. the password should be set to the same value for all users, and the email needs to be valid.\n\nThe problem with this approach is that when new fields are added they will not be anonymized by default. Human error in omitting users' personal data could be damaging.\n\n```kotlin\nBlacklist(database)\n    .table(\"RATINGS\") {\n        anonymize(\"RATING\").using(FixedDouble(4.3))\n    }\n.execute()\n```\n\n### Whitelist\nThe Whitelist approach by default scrambles/anonymizes all fields except a list of fields which are allowed to be copied as is.\nBy default all data needs to be anonymized. So data from the production database is sanitized record by record and inserted as anonymized data into the destination database. The source database needs to be read-only.\nAll fields would be anonymized using default anonymization strategy which is based on the datatype, unless a special anonymization strategy is specified. For instance, special strategies could be used for emails, passwords, usernames etc.\nA whitelisted field implies that it's okay to copy the data as is and anonymization isn't required.\nThis way any new field will be anonymized by default, and if a field is needed as is, add it to the whitelist explicitly. 
This prevents any human error and protects sensitive information.\n\n```kotlin\nWhitelist(source, dest)\n    .table(\"RATINGS\") {\n        whitelist(\"MOVIE_ID\",\"USER_ID\",\"CREATED_AT\")\n        anonymize(\"RATING\").using(FixedDouble(4.3))\n    }\n.execute()\n```\n\nRead more about [blacklist and whitelist here](http://sunitspace.blogspot.in/2012/09/data-anonymization-blacklist-whitelist.html)\n\n\n----------------------\n## Anonymization Strategies\n\n| DataType              | Strategy                      | Description                                               |\n| :---                  | :---                          | :---                                                      |\n| Boolean               | RandomBooleanTrueFalse        | random selection of boolean true and false value          |\n| Boolean               | RandomBooleanOneZero          | random selection of 1 and 0 value representing boolean    |\n| Boolean               | RandomBooleanYN               | random selection of Y and N value representing boolean    |\n| String (Email)        | RandomEmail                   | generates emailId using one of random picked values defined in specified file (default file is first_names.dat) appended with row number with given host and tld |\n| String (First Name)   | RandomFirstName               | generates first name using one of random picked values from specified file. default file is (first_names.dat) |\n| String (Last Name)    | RandomLastName                | generates last name using one of random picked values from specified file. 
default file is (last_names.dat) |\n| Integer               | FixedInt                      | replace all records with the same specified fixed integer value (default to 100) |\n| Integer               | RandomInt                     | generate random integer value between specified range (default range from 0 to 100) |\n| Integer               | RandomIntDelta                | generate new integer value within random delta value (default is 10) on existing value |\n| Float                 | FixedFloat                    | replace all records with the same specified fixed float value (default to 100.0f) |\n| Float                 | RandomFloat                   | generate random float value between specified range (default range from 0.0f to 100.0f) |\n| Float                 | RandomFloatDelta              | generate new float value within random delta value (default is 10.0f) on existing value |\n| Double                | FixedDouble                   | replace all records with the same specified fixed double value (default to 100.0) |\n| Double                | RandomDouble                  | generate random double value between specified range (default range from 0.0 to 100.0) |\n| Double                | RandomDoubleDelta             | generate new double value within random delta value (default is 10.0) on existing value |\n| String                | FixedString                   | replace all records with the same specified fixed string value |\n| String                | LoremIpsum                    | replace with same length (size) Lorem Ipsum string |\n| String                | PickStringFromFile            | replace with randomly picked string (line) from file |\n| String                | RandomAlphabetic              | replace with random alphabets char set only creating string |\n| String                | RandomAlphaNumeric            | replace with random alphabets + numbers (alphanumeric char set) creating string |\n| String                | 
RandomFormattedString         | replace with string built with exactly the same format (number replacing number, lowercase replacing lowercase alphabets \u0026 uppercase replacing uppercase alphabets) |\n| String                | RandomString                  | replace with random generation of string of any char set |\n| String                | StringTemplate                | replace with string generated using template specified |\n| Date                  | DateRandomDelta               | date field is changed randomly within given range of days |\n| Timestamp             | DateTimeRandomDelta           | timestamp (datetime) field is changed randomly within given duration |\n| Any                   | PickFromList                  | replaces value with specified type and randomly picked from list of specified values |\n| Any                   | PickFromDatabase              | replaces value with specified type and randomly picked from list of specified values fetched from database |\n\n## Write your own Anonymization strategy\n\nImplement the `AnonymizationStrategy` interface and override its `anonymize` method to write your own strategy.\n\n```kotlin\nclass RandomString: AnonymizationStrategy\u003cString\u003e {\n    override fun anonymize(field: Field\u003cString\u003e, record: Record): String = \"Record Number ${record.rowNum}\"\n}\n```\n\nThe `Field` class represents data related to the field being processed for anonymization.\n\n```kotlin\nclass Field\u003cT: Any\u003e(val name: String, val oldValue: T, var newValue: T = oldValue)\n```\n\nThe `Record` class represents the current record being processed, with its row number and all the fields of the record.\nData from the other fields is useful when a dependent field value needs to be derived or calculated.\n\n```kotlin\nclass Record(private val fields: List\u003cField\u003cAny\u003e\u003e, val rowNum: Int) {\n    fun find(name: String): Field\u003cAny\u003e = fields.first {name.equals(it.name, true)}\n}\n```\n\nIt is very easy to 
write an inline strategy as well. See the examples:\n\n* [Kotlin](https://github.com/dataanon/dataanon-kotlin-sample/blob/master/src/main/kotlin/com/github/dataanon/Anonymizer.kt#L20)\n* [Java](https://github.com/dataanon/dataanon-java-sample/blob/master/src/main/java/com/github/dataanon/Anonymizer.java#L20)\n\n\n## Want to contribute?\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n## Credits\n\n- [ThoughtWorks Inc](http://www.thoughtworks.com), for allowing us to build this tool and make it open source.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataanon%2Fdata-anon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdataanon%2Fdata-anon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataanon%2Fdata-anon/lists"}