{"id":13595267,"url":"https://github.com/datanymizer/datanymizer","last_synced_at":"2025-04-09T10:33:21.073Z","repository":{"id":37041307,"uuid":"315741354","full_name":"datanymizer/datanymizer","owner":"datanymizer","description":"Powerful database anonymizer with flexible rules. Written in Rust.","archived":false,"fork":false,"pushed_at":"2024-08-28T11:02:54.000Z","size":1210,"stargazers_count":515,"open_issues_count":29,"forks_count":29,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-11-06T17:46:55.347Z","etag":null,"topics":["database","database-anonymizer","database-dump","dump-data","dumper","fake-data","fake-generator","hacktoberfest","postgresql-database"],"latest_commit_sha":null,"homepage":"https://datanymizer.github.io/docs/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datanymizer.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-24T20:10:26.000Z","updated_at":"2024-11-03T18:49:51.000Z","dependencies_parsed_at":"2024-06-19T02:54:45.851Z","dependency_job_id":"0d60b83c-c5e7-4d3e-94ba-09a8a10bc5d0","html_url":"https://github.com/datanymizer/datanymizer","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datanymizer%2Fdatanymizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datanymizer%2Fdatanymizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datanymizer%2Fdatanymizer/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datanymizer%2Fdatanymizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datanymizer","download_url":"https://codeload.github.com/datanymizer/datanymizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","database-anonymizer","database-dump","dump-data","dumper","fake-data","fake-generator","hacktoberfest","postgresql-database"],"created_at":"2024-08-01T16:01:46.675Z","updated_at":"2025-04-09T10:33:16.064Z","avatar_url":"https://github.com/datanymizer.png","language":"Rust","funding_links":[],"categories":["Rust","Development tools","开发工具 Development tools"],"sub_categories":["Web Servers","Web服务器 Web Servers","Workflow Automation"],"readme":"# [Data]nymizer\n\n\u003cimg align=\"right\" \n     alt=\"datanymizer\"\n     src=\"https://raw.githubusercontent.com/datanymizer/datanymizer/master/logo.png\"\u003e\n\n[![Build Status](https://github.com/datanymizer/datanymizer/actions/workflows/ci.yml/badge.svg)](https://github.com/datanymizer/datanymizer/actions/workflows/ci.yml)\n![License](https://img.shields.io/github/license/datanymizer/datanymizer)\n![Release 
Version](https://img.shields.io/github/v/release/datanymizer/datanymizer)\n[![CodeCov](https://codecov.io/gh/datanymizer/datanymizer/branch/main/graph/badge.svg)](https://codecov.io/gh/datanymizer/datanymizer)\n[![Audit](https://github.com/datanymizer/datanymizer/actions/workflows/audit.yml/badge.svg)](https://github.com/datanymizer/datanymizer/actions/workflows/audit.yml)\n\nPowerful database anonymizer with flexible rules. Written in Rust.\n\nDatanymizer is created \u0026 [supported by Evrone](https://evrone.com/?utm_campaign=datanymizer). See what else we [develop with Rust](https://evrone.com/rust?utm_source=github\u0026utm_campaign=datanymizer).\n\nYou can find more information in articles in [English](https://evrone.com/datanymizer?utm_source=github\u0026utm_campaign=datanymizer) and [Russian](https://evrone.ru/datanymizer?utm_source=github\u0026utm_campaign=datanymizer).\n\n## How it works\n\nDatabase -\u003e Dumper (+Faker) -\u003e Dump.sql\n\nYou can import or process your dump with a supported database without third-party importers.\n\nDatanymizer generates a database-native dump.\n\n## Installation\n\nThere are several ways to install `pg_datanymizer`; choose the option most convenient for you.\n\n### Pre-compiled binary\n\n```bash\n# Linux / macOS / Windows (MINGW, etc.). 
Installs it into ./bin/ by default\n$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s\n\n# Or a shorter way\n$ curl -sSfL https://git.io/pg_datanymizer | sh -s\n\n# Specify installation directory and version\n$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0\n\n# Alpine Linux (wget)\n$ wget -q -O - https://git.io/pg_datanymizer | sh -s\n```\n\n#### Homebrew / Linuxbrew\n\n```bash\n# Installs the latest stable release\n$ brew install datanymizer/tap/pg_datanymizer\n\n# Builds the latest version from the repository\n$ brew install --HEAD datanymizer/tap/pg_datanymizer\n```\n\n#### Docker\n\n```bash\n$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer\n```\n\n## Getting started with CLI dumper\n\nFirst, inspect your database schema, choose fields with sensitive data, and create a config file based on it.\n\n```yaml\n# config.yml\ntables:\n  - name: markets\n    rules:\n      name_translations:\n        template:\n          format: '{\"en\": \"{{_1}}\", \"ru\": \"{{_2}}\"}'\n          rules:\n            - words:\n                min: 1\n                max: 2\n            - words:\n                min: 1\n                max: 2\n  - name: franchisees\n    rules:\n      operator_mail:\n        template:\n          format: user-{{_1}}-{{_2}}\n          rules:\n            - random_num: {}\n            - email:\n                kind: Safe\n      operator_name:\n        first_name: {}\n      operator_phone:\n        phone:\n          format: +###########\n      name_translations:\n        template:\n          format: '{\"en\": \"{{_1}}\", \"ru\": \"{{_2}}\"}'\n          rules:\n            - words:\n                min: 2\n                max: 3\n            - words:\n                min: 2\n                max: 3\n  - name: users\n    rules:\n      first_name:\n        first_name: {}\n      last_name:\n        last_name: {}\n  - name: customers\n    
rules:\n      email:\n        template:\n          format: user-{{_1}}-{{_2}}\n          rules:\n            - random_num: {}\n            - email:\n                kind: Safe\n                uniq:  \n                  required: true\n                  try_count: 5\n      phone:\n        phone:\n          format: +7##########\n          uniq: true\n      city:\n        city: {}\n      age:\n        random_num:\n          min: 10\n          max: 99\n      first_name:\n        first_name: {}\n      last_name:\n        last_name: {}\n      birth_date:\n        datetime:\n          from: 1990-01-01T00:00:00+00:00\n          to: 2010-12-31T00:00:00+00:00\n```\n\nThen create a dump from your database instance:\n\n```bash\npg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database\n```\n\nThis creates a new dump file, `/tmp/dump.sql`, containing a native SQL dump for a PostgreSQL database.\nYou can import the fake data from this dump into a new PostgreSQL database with the command:\n\n```bash\npsql -U postgres -d new_database \u003c /tmp/dump.sql\n```\n\nThe dumper can also stream the dump to `STDOUT`, like `pg_dump`, so you can use it in pipelines:\n\n```bash\npg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database \u003e /tmp/dump.sql\n```\n\n\n## Additional options\n\n### Tables filter\n\nYou can specify which tables to include in or exclude from the dump.\n\nTo dump data only from `public.markets` and `public.users`:\n\n```yaml\n# config.yml\n#...\nfilter:\n  only:\n    - public.markets\n    - public.users\n```\n\nTo ignore those tables and dump data from all others:\n\n```yaml\n# config.yml\n#...\nfilter:\n  except:\n    - public.markets\n    - public.users\n```\n\nYou can also specify data and schema filters separately.\n\nThis is equivalent to the previous example.\n\n```yaml\n# config.yml\n#...\nfilter:\n  data:\n    except:\n      - public.markets\n      - public.users\n```\n\nFor skipping schema and data from other 
tables.\n\n```yaml\n# config.yml\n#...\nfilter:\n  schema:\n    only:\n      - public.markets\n      - public.users\n```\n\nTo skip the schema for the `markets` table and dump data only from the `users` table:\n\n```yaml\n# config.yml\n#...\nfilter:\n  data:\n    only:\n      - public.users\n  schema:\n    except:\n      - public.markets\n```\n\nYou can use wildcards in the `filter` section:\n\n* `?` matches exactly one occurrence of any character;\n* `*` matches arbitrarily many (including zero) occurrences of any character.\n\n### Dump conditions and limit\n\nYou can specify conditions (an SQL `WHERE` clause) and a row limit for the dumped data per table:\n\n```yaml\n# config.yml\ntables:\n  - name: people\n    query:\n      # don't dump some rows\n      dump_condition: \"last_name \u003c\u003e 'Sensitive'\"\n      # select maximum 100 rows\n      limit: 100 \n```\n\n### Transform conditions and limit\n\nAs an additional option, you can specify SQL conditions that define which rows will be transformed (anonymized):\n\n```yaml\n# config.yml\ntables:\n  - name: people\n    query:\n      # don't dump some rows\n      dump_condition: \"last_name \u003c\u003e 'Sensitive'\"\n      # preserve original values for some rows\n      transform_condition: \"NOT (first_name = 'John' AND last_name = 'Doe')\"      \n      # select maximum 100 rows\n      limit: 100\n```\n\nYou can use the `dump_condition`, `transform_condition` and `limit` options in any combination (only\n`transform_condition`; `transform_condition` and `limit`; etc).\n\n### Global variables\n\nYou can specify global variables available from any `template` rule.\n\n```yaml\n# config.yml\ntables:\n  - name: users\n    rules:\n      bio:\n        template:\n          format: \"User bio is {{var_a}}\"\n      age:\n        template:\n          format: \"{{_0 | float * global_multiplicator}}\"\n#...\nglobals:\n  var_a: Global variable 1\n  global_multiplicator: 6\n```\n\n## Available rules\n\n| Rule                           | Description                            
                                      |\n|--------------------------------|------------------------------------------------------------------------------|\n| `email`                        | Emails with different options                                                |\n| `ip`                           | IP addresses. Supports IPv4 and IPv6                                         |\n| `words`                        | Lorem words with different lengths                                           |\n| `first_name`                   | First name generator                                                         |\n| `last_name`                    | Last name generator                                                          |\n| `city`                         | City names generator                                                         |\n| `phone`                        | Generate random phone with different `format`                                |\n| `pipeline`                     | Use pipeline to generate more complicated values                             |\n| `capitalize`                   | Like filter, it capitalizes input value                                      |\n| `template`                     | Template engine for generating random text with included rules               |\n| `digit`                        | Random digit (in range `0..9`)                                               |\n| `random_num`                   | Random number with `min` and `max` options                                   |\n| `password`                     | Password with different \u003cbr\u003elength options (support `max` and `min` options) |\n| `datetime`                     | Make DateTime strings with options (`from` and `to`)                         |\n| more than 70 rules in total... 
|                                                                              |\n\nFor the complete list of rules, please refer to [this document](docs/transformers.md).\n\n### Uniqueness\n\nYou can specify that result values must be unique (they are not unique by default).\nYou can use short or full syntax.\n\nShort:\n```yaml\nuniq: true\n```\n\nFull:\n```yaml\nuniq:\n  required: true\n  try_count: 5\n```\n\nUniqueness is ensured by re-generating a value when it duplicates one already produced.\nYou can customize the number of attempts with `try_count` (this is an optional field; the default number of tries\ndepends on the rule).\n\nCurrently, uniqueness is supported by: `email`, `ip`, `phone`, `random_num`.\n\n### Locales\n\nYou can specify the locale for individual rules:\n\n```yaml\nfirst_name:\n  locale: RU\n```\n\nThe default locale is `EN`, but you can specify a different default locale:\n\n```yaml\ntables:\n  # ........  \ndefault:\n  locale: RU\n```\n\nWe also support `ZH_TW` (Traditional Chinese) and `RU` (translation in progress).\n\n## Referencing row values from templates\n\nYou can reference values of other row fields in templates.\nUse `prev` for original values and `final` for anonymized ones:\n\n```yaml\ntables:\n  - name: some_table\n    # You must specify the order of rule execution when using `final`\n    rule_order:\n      - greeting\n      - options\n    rules:\n      last_name:\n        last_name: {}\n      greeting:\n        template:\n          # Keeping the first name, but anonymizing the last name   \n          format: \"Hello, {{ prev.first_name }} {{ final.last_name }}!\"\n      options:\n        template:\n          # Using the anonymized value again   \n          format: \"{greeting: \\\"{{ final.greeting }}\\\"}\"\n```\n\nWhen using `final`, you must specify the order of rule execution with `rule_order`.\nAll rules not listed will be placed at the beginning (i.e. 
you must list only rules with `final`).\n\n## Sharing information between rows\n\nWe implemented a built-in key-value store that allows information to be exchanged between anonymized rows.\n\nIt is available via the special functions in templates.\n\nTake a look at an example:\n\n```yaml\ntables:\n  - name: users  \n    rules:\n      name:\n        template:    \n          # Save a name to the store as a side effect, the key is `user_names.\u003cUSER_ID\u003e` \n          format: \"{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}\"\n          rules:\n            - person_name: {}\n  - name: user_operations\n    rules:\n      user_name:          \n        template:\n          # Using the saved value again  \n          format: \"{{ store_read(key='user_names.' ~ prev.user_id) }}\"\n```\n\n## Supported databases\n\n- [x] Postgresql\n- [ ] MySQL or MariaDB (TODO)\n\n## Documentation\n\n* [pg_datanymizer](docs/pg_datanymizer.md) CLI application manual.\n* [config.yml](docs/config.md) file specification.\n* [Full list](docs/transformers.md) of transformation rules.\n* [Integration testing](docs/integration_tests.md) manual.\n\n## Sponsors\n\n\u003cp\u003e\n  \u003ca href=\"https://evrone.com/?utm_source=github\u0026utm_campaign=datanymizer\"\u003e\n    \u003cimg src=\"https://camo.githubusercontent.com/433f193098927e4e7229c229c8920f77898282063d4fc3cbafb04ea3d24d73df/68747470733a2f2f6576726f6e652e636f6d2f6c6f676f2f6576726f6e652d73706f6e736f7265642d6c6f676f2e706e67\"\n      alt=\"Sponsored by Evrone\" width=\"210\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## License\n\n[MIT](https://github.com/datanymizer/datanymizer/blob/main/LICENSE)\n\n## Development\n\n### Cross compilation\n\nMac to Linux\n\n```\nrustup target add x86_64-unknown-linux-gnu\nbrew tap messense/macos-cross-toolchains\nbrew install x86_64-unknown-linux-gnu\nCARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features 
openssl/vendored\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatanymizer%2Fdatanymizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatanymizer%2Fdatanymizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatanymizer%2Fdatanymizer/lists"}