Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rheinwerk-verlag/pganonymize
A commandline tool for anonymizing PostgreSQL databases
https://github.com/rheinwerk-verlag/pganonymize
anonymization anonymizer cli command-line command-line-tool database developer-tools dsgvo gdpr postgresql privacy python python2 python27 python3
Last synced: 30 days ago
JSON representation
A commandline tool for anonymizing PostgreSQL databases
- Host: GitHub
- URL: https://github.com/rheinwerk-verlag/pganonymize
- Owner: rheinwerk-verlag
- License: other
- Created: 2019-12-18T06:32:40.000Z (almost 5 years ago)
- Default Branch: development
- Last Pushed: 2024-07-18T06:36:29.000Z (4 months ago)
- Last Synced: 2024-10-07T22:27:30.811Z (about 1 month ago)
- Topics: anonymization, anonymizer, cli, command-line, command-line-tool, database, developer-tools, dsgvo, gdpr, postgresql, privacy, python, python2, python27, python3
- Language: Python
- Homepage: http://pganonymize.readthedocs.io/
- Size: 730 KB
- Stars: 41
- Watchers: 10
- Forks: 26
- Open Issues: 3
-
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.rst
Awesome Lists containing this project
README
pganonymize
===========A commandline tool to anonymize PostgreSQL databases for DSGVO/GDPR purposes.
It uses a YAML file to define which tables and fields should be anonymized and provides various methods of
anonymization. The tool requires a direct PostgreSQL connection to perform the anonymization... class:: no-web no-pdf
|python| |license| |pypi| |downloads| |build| |codecov| |health|
.. figure:: https://raw.githubusercontent.com/rheinwerk-verlag/pganonymize/main/docs/_static/demo.gif
:width: 100%.. contents::
Features
--------* Intentionally compatible with Python 2.7 (for old, productive platforms)
* Anonymize PostgreSQL tables on data level entry with various providers (some examples in the table below)
* Exclude data for anonymization depending on regular expressions or SQL ``WHERE`` clauses
* Truncate entire tables for unwanted data+----------------+----------------------+-------------------------+----------------------------------+
| Field | Value | Provider | Output |
+================+======================+=========================+==================================+
| ``first_name`` | John | ``choice`` | (Bob|Larry|Lisa) |
+----------------+----------------------+-------------------------+----------------------------------+
| ``title`` | Dr. | ``clear`` | |
+----------------+----------------------+-------------------------+----------------------------------+
| ``street`` | Irving St | ``faker.street_name`` | Miller Station |
+----------------+----------------------+-------------------------+----------------------------------+
| ``password`` | dsf82hFxcM | ``mask`` | XXXXXXXXXX |
+----------------+----------------------+-------------------------+----------------------------------+
| ``credit_card``| 1234-567-890 | ``partial_mask`` | 1??????????0 |
+----------------+----------------------+-------------------------+----------------------------------+
| ``email`` | [email protected] | ``md5`` | 0cba00ca3da1b283a57287bcceb17e35 |
+----------------+----------------------+-------------------------+----------------------------------+
| ``email`` | [email protected] | ``faker.unique.email`` | [email protected] |
+----------------+----------------------+-------------------------+----------------------------------+
| ``phone_num`` | 65923473 | ``md5`` as_number: True | 3948293448 |
+----------------+----------------------+-------------------------+----------------------------------+
| ``ip`` | 157.50.1.20 | ``set`` | 127.0.0.1 |
+----------------+----------------------+-------------------------+----------------------------------+
| ``uuid_col`` | 00010203-0405-...... | ``uuid4`` | f7c1bd87-4d.... |
+----------------+----------------------+-------------------------+----------------------------------+* Note: ``faker.unique.[provider]`` only supported on Python 3.6+ (Faker library min. supported python version)
* Note: ``uuid4`` - only for (native `uuid4`_) columnsSee the `documentation`_ for a more detailed description of the provided anonymization methods.
Installation
------------The default installation method is to use ``pip``:
.. code-block::
$ pip install pganonymize
Usage
-----.. code-block::
usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
[--user USER] [--password PASSWORD] [--host HOST]
[--port PORT] [--dry-run] [--dump-file DUMP_FILE]Anonymize data of a PostgreSQL database
optional arguments:
-h, --help show this help message and exit
-v, --verbose Increase verbosity
-l, --list-providers Show a list of all available providers
--schema SCHEMA A YAML schema file that contains the anonymization
rules
--dbname DBNAME Name of the database
--user USER Name of the database user
--password PASSWORD Password for the database user
--host HOST Database hostname
--port PORT Port of the database
--dry-run Don't commit changes made on the database
--dump-file DUMP_FILE
Create a database dump file with the given name
--dump-options DUMP_OPTIONS
Options to pass to the pg_dump command
--init-sql INIT_SQL SQL to run before starting anonymization
--parallel Data anonymization is done in parallelDespite the database connection values, you will have to define a YAML schema file, that includes
all anonymization rules for that database. Take a look at the `schema documentation`_ or the
`YAML sample schema`_.Example calls:
.. code-block::
$ pganonymize --schema=myschema.yml \
--dbname=test_database \
--user=username \
--password=mysecret \
--host=db.host.example.com \
-v$ pganonymize --schema=myschema.yml \
--dbname=test_database \
--user=username \
--password=mysecret \
--host=db.host.example.com \
--init-sql "set search_path to non_public_search_path; set work_mem to '1GB';" \
-vDatabase dump
~~~~~~~~~~~~~With the ``--dump-file`` argument it is possible to create a dump file after anonymizing the database. Please note,
that the ``pg_dump`` command from the ``postgresql-client-common`` library is necessary to create the dump file for the
database, e.g. under Linux:.. code-block::
$ sudo apt-get install postgresql-client-common
Example call:
.. code-block::
$ pganonymize --schema=myschema.yml \
--dbname=test_database \
--user=username \
--password=mysecret \
--host=db.host.example.com \
--dump-file=/tmp/dump.gz \
-vSo that the password for dumping does not have to be entered manually, it can also be entered as an environment var
``PGPASSWORD``:.. code-block::
$ PGPASSWORD=password pganonymize --schema=myschema.yml \
--dbname=test_database \
--user=username \
--password=mysecret \
--host=db.host.example.com \
--dump-file=/tmp/dump.gz \
-v.. warning::
Currently only the ``dump-file`` operation supports environment variables.
Docker
~~~~~~If you want to run the anonymizer within a Docker container you first have to build the image:
.. code-block::
$ docker build -t pganonymize .
After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:
.. code-block::
$ docker run \
-v :/schema.yml \
-it pganonymize \
/usr/local/bin/pganonymize \
--schema=/schema.yml \
--dbname= \
--user= \
--password= \
--host= \
-v.. _uuid4: https://www.postgresql.org/docs/current/datatype-uuid.html
.. _documentation: https://pganonymize.readthedocs.io/en/latest/
.. _schema documentation: https://pganonymize.readthedocs.io/en/latest/schema.html
.. _YAML sample schema: https://github.com/rheinwerk-verlag/pganonymize/blob/master/sample_schema.yml.. |python| image:: https://img.shields.io/pypi/pyversions/pganonymize
:alt: PyPI - Python Version.. |license| image:: https://img.shields.io/badge/license-MIT-green.svg
:target: https://github.com/rheinwerk-verlag/pganonymize/blob/master/LICENSE.rst.. |pypi| image:: https://badge.fury.io/py/pganonymize.svg
:target: https://badge.fury.io/py/pganonymize.. |downloads| image:: https://static.pepy.tech/personalized-badge/pganonymize?period=total&units=international_system&left_color=blue&right_color=black&left_text=Downloads
:target: https://pepy.tech/project/pganonymize
:alt: Download count.. |build| image:: https://github.com/rheinwerk-verlag/pganonymize/actions/workflows/test.yml/badge.svg
:target: https://github.com/rheinwerk-verlag/pganonymize/actions.. |codecov| image:: https://codecov.io/gh/rheinwerk-verlag/pganonymize/branch/main/graph/badge.svg
:target: https://codecov.io/gh/rheinwerk-verlag/pganonymize.. |health| image:: https://snyk.io/advisor/python/pganonymize/badge.svg
:target: https://snyk.io/advisor/python/pganonymize
:alt: pganonymize