https://github.com/math280h/redactdump
Database dumps with support for redacting/replacing data
- Host: GitHub
- URL: https://github.com/math280h/redactdump
- Owner: math280h
- License: apache-2.0
- Created: 2021-12-07T13:38:39.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-21T19:43:52.000Z (about 1 year ago)
- Last Synced: 2025-04-03T17:50:30.774Z (about 2 months ago)
- Topics: database-dump, database-tool, dump, mysql, postgres, postgresql, redaction
- Language: Python
- Homepage:
- Size: 749 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Easily create database dumps with support for redacting data (and replacing that data with valid random values).
**Supported databases**
* MySQL
* PostgreSQL

_More coming soon..._
## Installation
To install redactdump, run the following command:
````shell
pip install redactdump
````

## Usage

```shell
usage: redactdump [-h] -c CONFIG

redactdump

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to dump configuration.
  -u USER, --user USER  Connection username.
  -p PASSWORD, --password PASSWORD
                        Connection password.
  -d DEBUG, --debug DEBUG
                        Enable debug mode.
```

## Configuration

To create a dump, you must currently use a configuration file. In the future, it may be possible to do everything via the CLI.
### Supported replacement values

redactdump uses Faker to generate random data.
`replacement` can therefore be any function from the following providers:
https://faker.readthedocs.io/en/stable/providers.html

**NOTE: redactdump is currently NOT tested with all providers; some might trigger bugs.**
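
As a rough illustration of how a `replacement` name could be turned into a generated value, the sketch below looks the name up as a method on a provider object and calls it. This is an assumption about the general approach, not a description of redactdump's internals. `StandInProvider` and `generate` are hypothetical names; the stand-in only mimics three Faker method names (`name`, `ipv4`, `random_int`) so the example runs without Faker installed.

```python
import random


class StandInProvider:
    """Hypothetical stand-in exposing a few method names that Faker also has."""

    NAMES = ["Alex Smith", "Sam Jones", "Kim Lee"]

    def name(self) -> str:
        return random.choice(self.NAMES)

    def ipv4(self) -> str:
        return ".".join(str(random.randint(0, 255)) for _ in range(4))

    def random_int(self) -> int:
        return random.randint(0, 9999)


def generate(provider: object, replacement: str):
    """Resolve a `replacement` name to a method on the provider and call it."""
    func = getattr(provider, replacement, None)
    if not callable(func):
        raise ValueError(f"Unknown replacement: {replacement!r}")
    return func()


# With Faker installed, a Faker() instance could be passed in instead
# of the stand-in (assumption: only zero-argument provider calls are used).
provider = StandInProvider()
print(generate(provider, "name"))
print(generate(provider, "ipv4"))
```

Looking the name up dynamically means any provider function is accepted by name, which would also explain why untested providers can surface bugs.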
### Example configuration:
````yaml
connection:
  type: pgsql
  host: 127.0.0.1
  port: 5432
  database: postgres

redact:
  patterns:
    column:
      - pattern: '^[a-zA-Z]+_name'
        replacement: name
    data:
      - pattern: '192.168.0.1'
        replacement: ipv4
      - pattern: 'John Doe'
        replacement: name

output:
  type: multi_file
  naming: 'dump-[table_name]-[timestamp]' # Default: [table_name]-[timestamp]
  location: './output/'
````

### Configuration Schema

The configuration schema is defined in [redactdump/core/config.py](redactdump/core/config.py).
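
To make the two pattern kinds in the example configuration more concrete, here is a minimal sketch of how `column` patterns (matched against column names) and `data` patterns (matched against cell values) might select a replacement. It assumes regular-expression search semantics and that column rules are checked before data rules; neither is confirmed by the README. `RULES` and `pick_replacement` are hypothetical names used only for illustration.

```python
import re
from typing import Optional

# Rules mirroring the `redact.patterns` section of the example configuration.
RULES = {
    "column": [{"pattern": r"^[a-zA-Z]+_name", "replacement": "name"}],
    "data": [
        {"pattern": r"192.168.0.1", "replacement": "ipv4"},
        {"pattern": r"John Doe", "replacement": "name"},
    ],
}


def pick_replacement(column: str, value: object, rules: dict) -> Optional[str]:
    """Return the replacement name for a cell, or None to keep it unchanged."""
    # Assumption: column-name rules take precedence over data rules.
    for rule in rules.get("column", []):
        if re.search(rule["pattern"], column):
            return rule["replacement"]
    # Data rules are matched against the value's string form.
    for rule in rules.get("data", []):
        if re.search(rule["pattern"], str(value)):
            return rule["replacement"]
    return None


print(pick_replacement("first_name", "Jane", RULES))            # name
print(pick_replacement("last_login_ip", "192.168.0.1", RULES))  # ipv4
print(pick_replacement("id", 42, RULES))                        # None
```

Note that the dots in `'192.168.0.1'` are unescaped, so under regular-expression semantics they match any character; escaping them (`'192\.168\.0\.1'`) would make the pattern stricter.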
## Example
Configuration
```yaml
connection:
  type: pgsql
  host: 127.0.0.1
  port: 5432
  database: postgres

redact:
  patterns:
    column:
      - pattern: '^new_'
        replacement: name
    data:
      - pattern: '6'
        replacement: random_int

output:
  type: multi_file
  naming: 'dump-[table_name]-[timestamp]' # Default: [table_name]-[timestamp]
  location: './output/'
```

Original data

_(column_1, new_column)_
```text
6,"""John Doe"""
6,"John Doe"
6,"John Doe"
6,John Doe
1,\John Doe
1,--John Doe
12312, John Doe
99,!John Doe
99,(John Doe)
```

Output

```sql
INSERT INTO table_name VALUES (890, 'Yolanda Mcdonald');
INSERT INTO table_name VALUES (1982, 'Stephen Lewis');
INSERT INTO table_name VALUES (2952, 'Janet Woodward');
INSERT INTO table_name VALUES (9307, 'Joshua Price');
INSERT INTO table_name VALUES (1, 'Tina Morrison');
INSERT INTO table_name VALUES (1, 'Juan Mejia');
INSERT INTO table_name VALUES (12312, 'Michael Thornton');
INSERT INTO table_name VALUES (99, 'Adrian White');
INSERT INTO table_name VALUES (99, 'Robin Jefferson');
```

## Known limitations

### Data types not supported
* box
* bytea
* inet
* interval
* circle
* cidr
* line
* lseg
* macaddr
* macaddr8
* pg_lsn
* pg_snapshot
* point
* polygon
* tsquery
* tsvector
* txid_snapshot
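
Because these types are not supported, it can help to check a table's columns before attempting a dump. The sketch below is one possible pre-flight check, not part of redactdump: `UNSUPPORTED_TYPES` copies the list above, while `find_unsupported` and the sample column mapping are hypothetical. It assumes the column names and type names have already been obtained from the database by some other means.

```python
from typing import Dict

# Type names copied from the "Data types not supported" list.
UNSUPPORTED_TYPES = frozenset({
    "box", "bytea", "inet", "interval", "circle", "cidr", "line", "lseg",
    "macaddr", "macaddr8", "pg_lsn", "pg_snapshot", "point", "polygon",
    "tsquery", "tsvector", "txid_snapshot",
})


def find_unsupported(columns: Dict[str, str]) -> Dict[str, str]:
    """Return the subset of {column: type} whose type is in the unsupported list."""
    return {
        name: type_name
        for name, type_name in columns.items()
        if type_name.lower() in UNSUPPORTED_TYPES
    }


# Hypothetical table layout, for illustration only.
columns = {"id": "integer", "avatar": "bytea", "last_ip": "inet", "email": "text"}
print(find_unsupported(columns))
```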