{"id":21400334,"url":"https://github.com/oefenweb/python-untraceables","last_synced_at":"2026-02-03T20:36:58.164Z","repository":{"id":41293752,"uuid":"85679179","full_name":"Oefenweb/python-untraceables","owner":"Oefenweb","description":"Randomizes IDs for a given set of tables making them untraceable across environments","archived":false,"fork":false,"pushed_at":"2024-12-17T09:45:34.000Z","size":49,"stargazers_count":1,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-29T03:41:39.055Z","etag":null,"topics":["anonymize","data","database","mysql","privacy","python","python2","python3","randomization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Oefenweb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-21T08:51:41.000Z","updated_at":"2024-12-17T09:45:36.000Z","dependencies_parsed_at":"2022-09-01T14:02:54.816Z","dependency_job_id":null,"html_url":"https://github.com/Oefenweb/python-untraceables","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oefenweb%2Fpython-untraceables","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oefenweb%2Fpython-untraceables/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oefenweb%2Fpython-untraceables/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oefenweb%2Fpython-untraceables/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/Oefenweb","download_url":"https://codeload.github.com/Oefenweb/python-untraceables/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249183058,"owners_count":21226139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymize","data","database","mysql","privacy","python","python2","python3","randomization"],"created_at":"2024-11-22T15:21:09.385Z","updated_at":"2026-02-03T20:36:53.142Z","avatar_url":"https://github.com/Oefenweb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# untraceables\n\n[![Build Status](https://travis-ci.org/Oefenweb/python-untraceables.svg)](https://travis-ci.org/Oefenweb/python-untraceables)\n\n`python-untraceables` provides some tools to randomize IDs for a given set of tables making them untraceable across environments.\n\n## Requirements\n\n* Python 2.7\n* Python 3.5\n* Python 3.6\n\n## Usage\n\n### Setup\n\n#### Untraceables user, database and mapping table\n\nCreate a `untraceables` user with sufficient permissions.\n\n```sql\nCREATE USER 'untraceables'@'localhost' IDENTIFIED BY 'mmRXHqnc3zSshYjxSv8n';\nCREATE DATABASE untraceables;\nGRANT SELECT ON untraceables . * TO 'untraceables'@'localhost';\n```\n\n```sql\nGRANT ALL PRIVILEGES ON example_com_www . 
* TO 'untraceables'@'localhost';\nFLUSH PRIVILEGES;\n```\n\nLet's say we want to randomize IDs for our `users` table.\n\n```sql\nUSE `untraceables`;\nDROP TABLE IF EXISTS `users`;\nCREATE TABLE `users` (\n  `id` int(10) unsigned NOT NULL,\n  `mapped_id` int(10) unsigned NOT NULL,\n  PRIMARY KEY (`id`),\n  UNIQUE KEY `mapped_id` (`mapped_id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8;\n```\n\n#### Configuration\n\nCreate the necessary configuration file.\n\n```sh\n# cat /etc/untraceables.cfg\n[main]\n# Database host\nhost = localhost\n\n# Database user, read-only on untraceables database, write on databases that need randomized IDs\nuser = untraceables\npassword = mmRXHqnc3zSshYjxSv8n\n```\n\n#### Generate mapping table data\n\nGenerate a file containing the full unsigned integer range.\n\n```sh\n# 2^32 - 1 = 4294967295\nseq -f '%.0f' 0 4294967295 \u003e unsinged-int;\n```\n\nShuffle the file. **Note** that this may need a lot of RAM. `unsinged-int` is about `40G`; RAM usage for the shuffle was `107G`.\n\n```sh\nshuf \u003c unsinged-int \u003e unsinged-int.shuf;\n```\n\nCombine the ordered unsigned integer range with the shuffled one. 
Also split the output into pieces of `10^6` IDs.\n\n```sh\npaste unsinged-int unsinged-int.shuf | split --numeric-suffixes -l 1000000 - unsinged-int.csv-;\n```\n\nCompress all pieces.\n\n```sh\nls unsinged-int.csv-* | parallel -j 12 'gzip {}';\n```\n\n#### Load mapping table data\n\nLoad the first piece of compressed data into MySQL.\n\nIn one terminal:\n\n```sh\nmkfifo --mode=0600 users.csv;\ngunzip \u003c unsinged-int.csv-00.gz \u003e users.csv;\n```\n\nIn another:\n\n```sh\nmysqlimport --fields-terminated-by='\\t' --ignore-lines=0 --local untraceables users.csv;\n```\n\nThis loads `unsinged-int.csv-00.gz` into `untraceables`.`users`.\n\n### Commands\n\n#### get-include-from-mydumper-backup\n\nGenerates a list of include regexes from a given mydumper backup, to be used as input for `get-table-list`.\n\n```sh\nbin/randomize-ids get-include-from-mydumper-backup \\\n  -d example_com_www \\\n  -p ~/backups/latest/ \\\n  -i '^users\\.id$' \\\n  -i '^.*\\.user_id$' \\\n  -i '^.*\\..*user_id$' \\\n  \u003e /tmp/include-from \\\n;\n```\n\n#### get-table-list\n\nGets a list of tables and columns filtered by one or more include / exclude regexes for a given database.\n\n```sh\nbin/randomize-ids get-table-list \\\n  -d example_com_www \\\n  -i '^users\\.id$' \\\n  -i '^.*\\.user_id$' \\\n  -i '^.*\\..*user_id$' \\\n  -e '^user_application_x_properties\\.x_user_id$' \\\n;\n```\n\nor\n\n```sh\nbin/randomize-ids get-table-list -d example_com_www \\\n  --include-from /tmp/include-from \\\n  -e '^user_application_x_properties\\.x_user_id$' \\\n;\n```\n\nExample output.\n\n```sh\nexample_com_www\taudit_trails\tuser_id\nexample_com_www\ttickets\tassigned_user_id\nexample_com_www\tusers\tid\n```\n\n#### get-sql\n\nGets `SQL` statements to randomize the IDs of a given database and table.\n\n```sh\nbin/randomize-ids get-sql \\\n  -d example_com_www \\\n  -t users \\\n  -c id \\\n;\n```\n\nor\n\n```sh\nbin/randomize-ids get-sql \\\n  -d example_com_www \\\n  -t users \\\n  -c id \\\n  
--mapping-database untraceables \\\n  --mapping-table users \\\n;\n```\n\nExample output.\n\n```sql\nDROP TABLE IF EXISTS `example_com_www`.`_users`;\nCREATE TABLE `example_com_www`.`_users` LIKE `example_com_www`.`users`;\nINSERT INTO `example_com_www`.`_users` SELECT `t2`.`mapped_id`, `t1`.`username`, `t1`.`password`, `t1`.`active`, `t1`.`first_name`, `t1`.`last_name`, `t1`.`created`, `t1`.`modified` FROM `example_com_www`.`users` `t1` LEFT JOIN `untraceables`.`users` `t2` ON `t2`.`id` = `t1`.`id`;\nDROP TABLE `example_com_www`.`users`;\nRENAME TABLE `example_com_www`.`_users` TO `example_com_www`.`users`;\n```\n\n#### run-sql\n\nRuns `SQL` statements from `STDIN`.\n\n```\necho \"INSERT INTO example_com_www (a, b, c) VALUES (1, 2, 3)\" | \\\n  bin/randomize-ids run-sql \\\n    -d example_com_www \\\n;\n```\n\nor\n\n```\necho \"INSERT INTO example_com_www (a, b, c) VALUES (1, 2, 3)\" | \\\n  bin/randomize-ids run-sql \\\n    -d example_com_www \\\n    --no-foreign-key-checks \\\n;\n```\n\n#### All chained together\n \n```sh\nbin/randomize-ids get-table-list \\\n  -d example_com_www \\\n  -i '^users\\.id$' -i '^.*\\.user_id$' \\\n  -i '^.*\\..*user_id$' \\\n  -e '^user_application_x_properties\\.x_user_id$' | \\\n  awk '{print \"bin/randomize-ids get-sql\" \" -d \" $1 \" -t \" $2 \" -c \" $3 \" --mapping-table users;\" }' | \\\n  bash -e -o pipefail | \\\n  bin/randomize-ids run-sql \\\n    -d example_com_www \\\n    --no-foreign-key-checks \\\n;\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foefenweb%2Fpython-untraceables","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foefenweb%2Fpython-untraceables","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foefenweb%2Fpython-untraceables/lists"}