https://github.com/datanymizer/datanymizer
Powerful database anonymizer with flexible rules. Written in Rust.
database database-anonymizer database-dump dump-data dumper fake-data fake-generator hacktoberfest postgresql-database
- Host: GitHub
- URL: https://github.com/datanymizer/datanymizer
- Owner: datanymizer
- License: mit
- Created: 2020-11-24T20:10:26.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-08-28T11:02:54.000Z (5 months ago)
- Last Synced: 2024-08-28T16:54:04.742Z (5 months ago)
- Topics: database, database-anonymizer, database-dump, dump-data, dumper, fake-data, fake-generator, hacktoberfest, postgresql-database
- Language: Rust
- Homepage: https://datanymizer.github.io/docs/
- Size: 1.15 MB
- Stars: 505
- Watchers: 6
- Forks: 29
- Open Issues: 28
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-rust - datanymizer/datanymizer - Powerful database anonymizer with flexible rules [![build badge](https://github.com/datanymizer/datanymizer/workflows/CI/badge.svg?branch=main)](https://github.com/datanymizer/datanymizer/actions?query=workflow%3ACI+branch%3Amain) (Development tools / Web Servers)
- awesome-rust-cn - datanymizer/datanymizer - (Development tools / Web Servers)
- fucking-awesome-rust - datanymizer/datanymizer - Powerful database anonymizer with flexible rules [![build badge](https://github.com/datanymizer/datanymizer/workflows/CI/badge.svg?branch=main)](https://github.com/datanymizer/datanymizer/actions?query=workflow%3ACI+branch%3Amain) (Development tools / Web Servers)
README
# [Data]nymizer
[![Build Status](https://github.com/datanymizer/datanymizer/actions/workflows/ci.yml/badge.svg)](https://github.com/datanymizer/datanymizer/actions/workflows/ci.yml)
![License](https://img.shields.io/github/license/datanymizer/datanymizer)
![Release Version](https://img.shields.io/github/v/release/datanymizer/datanymizer)
[![CodeCov](https://codecov.io/gh/datanymizer/datanymizer/branch/main/graph/badge.svg)](https://codecov.io/gh/datanymizer/datanymizer)
[![Audit](https://github.com/datanymizer/datanymizer/actions/workflows/audit.yml/badge.svg)](https://github.com/datanymizer/datanymizer/actions/workflows/audit.yml)

Powerful database anonymizer with flexible rules. Written in Rust.
Datanymizer is created & [supported by Evrone](https://evrone.com/?utm_campaign=datanymizer). See what else we [develop with Rust](https://evrone.com/rust?utm_source=github&utm_campaign=datanymizer).
You can find more information in the articles in [English](https://evrone.com/datanymizer?utm_source=github&utm_campaign=datanymizer) and [Russian](https://evrone.ru/datanymizer?utm_source=github&utm_campaign=datanymizer).
## How it works
Database -> Dumper (+Faker) -> Dump.sql
Datanymizer generates a database-native dump, so you can import or process it with the supported database without third-party importers.
## Installation
There are several ways to install `pg_datanymizer`; choose whichever is most convenient for you.
### Pre-compiled binary
```bash
# Linux / macOS / Windows (MINGW, etc.). Installs into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s

# Or a shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s

# Specify the installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0

# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s
```
#### Homebrew / Linuxbrew
```bash
# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer

# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer
```
#### Docker
```bash
$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer
```
## Getting started with CLI dumper
First, inspect your database schema, choose fields with sensitive data, and create a config file based on it.
```yaml
# config.yml
tables:
  - name: markets
    rules:
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 1
                max: 2
            - words:
                min: 1
                max: 2
  - name: franchisees
    rules:
      operator_mail:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
      operator_name:
        first_name: {}
      operator_phone:
        phone:
          format: +###########
      name_translations:
        template:
          format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
          rules:
            - words:
                min: 2
                max: 3
            - words:
                min: 2
                max: 3
  - name: users
    rules:
      first_name:
        first_name: {}
      last_name:
        last_name: {}
  - name: customers
    rules:
      email:
        template:
          format: user-{{_1}}-{{_2}}
          rules:
            - random_num: {}
            - email:
                kind: Safe
                uniq:
                  required: true
                  try_count: 5
      phone:
        phone:
          format: +7##########
          uniq: true
      city:
        city: {}
      age:
        random_num:
          min: 10
          max: 99
      first_name:
        first_name: {}
      last_name:
        last_name: {}
      birth_date:
        datetime:
          from: 1990-01-01T00:00:00+00:00
          to: 2010-12-31T00:00:00+00:00
```
Then create a dump of your database instance:
```bash
pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database
```
This creates a new dump file, `/tmp/dump.sql`, containing a native SQL dump for a PostgreSQL database.
You can import the fake data from this dump into a new PostgreSQL database with:
```bash
psql -U postgres -d new_database < /tmp/dump.sql
```
The dumper can also stream the dump to `STDOUT`, like `pg_dump`, so you can use it in other pipelines:
```bash
pg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database > /tmp/dump.sql
```
## Additional options
### Tables filter
You can specify which tables to include in or exclude from the dump.
To dump data only from `public.markets` and `public.users`:
```yaml
# config.yml
#...
filter:
  only:
    - public.markets
    - public.users
```
To ignore those tables and dump data from all the others:
```yaml
# config.yml
#...
filter:
  except:
    - public.markets
    - public.users
```
You can also specify data and schema filters separately.
The following is equivalent to the previous example:
```yaml
# config.yml
#...
filter:
  data:
    except:
      - public.markets
      - public.users
```
To skip both the schema and the data of all other tables:
```yaml
# config.yml
#...
filter:
  schema:
    only:
      - public.markets
      - public.users
```
To skip the schema of the `markets` table and dump data only from the `users` table:
```yaml
# config.yml
#...
filter:
  data:
    only:
      - public.users
  schema:
    except:
      - public.markets
```
You can use wildcards in the `filter` section:
* `?` matches exactly one occurrence of any character;
* `*` matches arbitrarily many (including zero) occurrences of any character.
### Dump conditions and limit
You can specify a condition (an SQL `WHERE` clause) and a row limit for the data dumped from each table:
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # select maximum 100 rows
      limit: 100
```
### Transform conditions and limit
As an additional option, you can specify SQL conditions that define which rows will be transformed (anonymized):
```yaml
# config.yml
tables:
  - name: people
    query:
      # don't dump some rows
      dump_condition: "last_name <> 'Sensitive'"
      # preserve original values for some rows
      transform_condition: "NOT (first_name = 'John' AND last_name = 'Doe')"
      # select maximum 100 rows
      limit: 100
```
You can use the `dump_condition`, `transform_condition` and `limit` options in any combination (only
`transform_condition`; `transform_condition` and `limit`; etc.).
### Global variables
You can specify global variables available from any `template` rule.
```yaml
# config.yml
tables:
  - name: users
    rules:
      bio:
        template:
          format: "User bio is {{var_a}}"
      age:
        template:
          format: "{{_0 | float * global_multiplicator}}"
#...
globals:
  var_a: Global variable 1
  global_multiplicator: 6
```
## Available rules
| Rule                           | Description                                                        |
|--------------------------------|--------------------------------------------------------------------|
| `email`                        | Emails with different options                                      |
| `ip`                           | IP addresses. Supports IPv4 and IPv6                               |
| `words`                        | Lorem words with different length                                  |
| `first_name`                   | First name generator                                               |
| `last_name`                    | Last name generator                                                |
| `city`                         | City names generator                                               |
| `phone`                        | Generate random phone with different `format`                      |
| `pipeline`                     | Use pipeline to generate more complicated values                   |
| `capitalize`                   | Like a filter, it capitalizes the input value                      |
| `template`                     | Template engine for generating random text with included rules     |
| `digit`                        | Random digit (in range `0..9`)                                     |
| `random_num`                   | Random number with `min` and `max` options                         |
| `password`                     | Password with different length options (supports `max` and `min`)  |
| `datetime`                     | Make DateTime strings with options (`from` and `to`)               |
| more than 70 rules in total... |                                                                    |

For the complete list of rules, please refer to [this document](docs/transformers.md).
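To make the `format` option of the `phone` rule more concrete, here is a minimal Python sketch of the idea. It assumes that each `#` in the format is a placeholder for one random digit and that every other character is copied unchanged. The function name `fill_format` is hypothetical, and this is an illustration only, not Datanymizer's actual implementation (which is written in Rust).

```python
import random


def fill_format(fmt: str, rng: random.Random) -> str:
    """Replace every '#' in fmt with a random decimal digit.

    Assumption: '#' is the only placeholder; all other characters are literals.
    """
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in fmt)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    # Same format string as in the `customers.phone` rule of the sample config
    print(fill_format("+7##########", rng))
```

The literal prefix (`+7` here) is preserved, so the generated value keeps the shape of a real phone number while the digits are random.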
### Uniqueness
You can specify that result values must be unique (they are not unique by default).
You can use the short or the full syntax.
Short:
```yaml
uniq: true
```
Full:
```yaml
uniq:
  required: true
  try_count: 5
```
Uniqueness is ensured by re-generating a value when it duplicates one that was already produced.
You can customize the number of attempts with `try_count` (this is an optional field; the default number of tries
depends on the rule).
Currently, uniqueness is supported by: `email`, `ip`, `phone`, `random_num`.
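The retry approach described above can be sketched in a few lines of Python. This is a hedged illustration of "re-generate until unique, at most `try_count` times", not Datanymizer's code. The names `generate_unique` and `UniquenessError` are hypothetical, and raising an error once the attempts are exhausted is an assumption about the failure mode.

```python
import random
from typing import Callable, Set


class UniquenessError(Exception):
    """Raised when no unused value was found within try_count attempts."""


def generate_unique(generate: Callable[[], str], seen: Set[str], try_count: int = 5) -> str:
    """Call generate() up to try_count times until it yields a value not in seen."""
    for _ in range(try_count):
        value = generate()
        if value not in seen:
            seen.add(value)  # remember the value so later rows cannot reuse it
            return value
    # Assumption: exhausting the attempts is treated as an error
    raise UniquenessError(f"no unique value after {try_count} attempts")


if __name__ == "__main__":
    rng = random.Random(0)
    seen: Set[str] = set()
    # Draw five distinct single digits; duplicates are silently re-generated
    digits = [
        generate_unique(lambda: str(rng.randint(0, 9)), seen, try_count=100)
        for _ in range(5)
    ]
    print(digits)
```

The sketch also shows why `try_count` matters: the smaller the space of possible values relative to the number of rows, the more attempts are needed before a fresh value turns up.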
### Locales
You can specify the locale for individual rules:
```yaml
first_name:
  locale: RU
```
The default locale is `EN`, but you can specify a different default locale:
```yaml
tables:
  # ........
default:
  locale: RU
```
We also support `ZH_TW` (Traditional Chinese) and `RU` (translation in progress).
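As a rough illustration of how the two locale settings above might interact, the following Python sketch resolves the locale for a single rule. It assumes that a rule-level `locale` takes precedence over `default.locale`, which in turn overrides the built-in `EN`. That precedence and the helper name `resolve_locale` are assumptions, not something the configuration specification is quoted as saying here.

```python
from typing import Any, Dict

BUILT_IN_DEFAULT_LOCALE = "EN"


def resolve_locale(rule_options: Dict[str, Any], config: Dict[str, Any]) -> str:
    """Pick the locale for one rule: rule-level, then config default, then EN."""
    if "locale" in rule_options:
        return rule_options["locale"]
    return config.get("default", {}).get("locale", BUILT_IN_DEFAULT_LOCALE)


if __name__ == "__main__":
    config = {"default": {"locale": "RU"}}
    print(resolve_locale({}, config))                   # falls back to the config default
    print(resolve_locale({"locale": "ZH_TW"}, config))  # rule-level setting wins
    print(resolve_locale({}, {}))                       # built-in default
```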
## Referencing row values from templates
You can reference values of other row fields in templates.
Use `prev` for original values and `final` for anonymized ones:
```yaml
tables:
  - name: some_table
    # You must specify the order of rule execution when using `final`
    rule_order:
      - greeting
      - options
    rules:
      first_name:
        first_name: {}
      greeting:
        template:
          # Keeping the first name, but anonymizing the last name
          format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
      options:
        template:
          # Using the anonymized value again
          format: "{greeting: \"{{ final.greeting }}\"}"
```
When using `final`, you must specify the order of rule execution with `rule_order`.
All rules not listed will be placed at the beginning (i.e. you must list only rules with `final`).
## Sharing information between rows
We implemented a built-in key-value store that allows information to be exchanged between anonymized rows.
It is available via the special functions in templates.
Take a look at an example:
```yaml
tables:
  - name: users
    rules:
      name:
        template:
          # Save a name to the store as a side effect; the key is `user_names.` followed by the row's original `id`
          format: "{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}"
          rules:
            - person_name: {}
  - name: user_operations
    rules:
      user_name:
        template:
          # Using the saved value again
          format: "{{ store_read(key='user_names.' ~ prev.user_id) }}"
```
## Supported databases
- [x] PostgreSQL
- [ ] MySQL or MariaDB (TODO)
## Documentation
* [pg_datanymizer](docs/pg_datanymizer.md) CLI application manual.
* [config.yml](docs/config.md) file specification.
* [Full list](docs/transformers.md) of transformation rules.
* [Integration testing](docs/integration_tests.md) manual.
## License
[MIT](https://github.com/datanymizer/datanymizer/blob/main/LICENSE)
## Development
### Cross compilation
Mac to Linux
```
rustup target add x86_64-unknown-linux-gnu
brew tap messense/macos-cross-toolchains
brew install x86_64-unknown-linux-gnu
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features openssl/vendored
```