{"id":22903934,"url":"https://github.com/tracktor/padmy","last_synced_at":"2025-10-26T21:44:01.584Z","repository":{"id":129325325,"uuid":"527562591","full_name":"Tracktor/padmy","owner":"Tracktor","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-25T11:15:25.000Z","size":334,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T11:31:20.752Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tracktor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-22T12:50:14.000Z","updated_at":"2025-03-25T11:15:28.000Z","dependencies_parsed_at":"2024-03-05T18:28:11.130Z","dependency_job_id":"48bca7eb-d3ef-4498-b416-d362b826a270","html_url":"https://github.com/Tracktor/padmy","commit_stats":null,"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tracktor%2Fpadmy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tracktor%2Fpadmy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tracktor%2Fpadmy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tracktor%2Fpadmy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tracktor","download_url":"https://codeload.github.com/Tracktor/padmy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.
com","kind":"github","repositories_count":253112364,"owners_count":21856124,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-14T02:39:33.177Z","updated_at":"2025-10-26T21:43:56.529Z","avatar_url":"https://github.com/Tracktor.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Padmy\n\nCLI utility functions for PostgreSQL such as **sampling** and **anonymization**.\n\n## Installation\n\nRun `poetry install` to install the Python packages.\n\n## 1. Database Exploration\n\nYou can get information about a database by running:\n\n```bash\npoetry run cli analyze --db test --schemas test\n```\n\nor by using the Docker image:\n\n```bash\ndocker run -it \\\n  --network host \\\n  tracktor/padmy:latest analyze --db test --schemas test\n```\n\nFor instance, the following table definitions produce the output shown below:\n\n```sql\nCREATE TABLE table1\n(\n    id SERIAL PRIMARY KEY\n);\n\nCREATE TABLE table2\n(\n    id        SERIAL PRIMARY KEY,\n    table1_id INT REFERENCES table1\n);\n\nCREATE TABLE table3\n(\n    id        SERIAL PRIMARY KEY,\n    table1_id INT REFERENCES table1,\n    table2_id INT REFERENCES table2\n);\n\nCREATE TABLE table4\n(\n    id        SERIAL PRIMARY KEY,\n    table1_id INT REFERENCES table1\n);\n\nINSERT INTO table1(id)\nSELECT generate_series(0, 10);\n```\n\n**Default**\n\n![Default output](./docs/explore-default.png)\n\n**Network Schema** (if `--show-graphs` is specified)\n\n![Network schema](./docs/explore-schema.png)\n\n## 2. 
Sampling\n\nYou can quickly sample a database (i.e. take a subset of it) by running:\n\n```bash\npoetry run cli sample \\\n  --db test --to-db test-sampled \\\n  --sample 20 \\\n  --schemas public\n```\n\nThis will sample the `test` database into a new `test-sampled` database, a copy of the\noriginal one that keeps, where possible (see [Known limitations](#known-limitations)), **20%** of the original database.\n\nYou can choose how to sample with more granularity by passing a configuration file.\nHere is an example:\n\n```yaml\n# We want a default sampling size of 20% of each table count\nsample: 20\n# We want to sample `schema_1` and `schema_2`\nschemas:\n  - schema_1\n  # We want a default size of 30% for the tables of this schema\n  - name: schema_2\n    sample: 30\n\ntables:\n  # We want a sample size of 10% for this table\n  - schema: public\n    table: table_3\n    sample: 10\n```\n\n## 3. Migration utils\n\n**Setting up**\n\nThis library includes a migration utility to help you evolve your data model.\nIn order to use it, start by setting up the migration table:\n\n```bash\npoetry run cli -vv migrate setup --db postgres\n```\n\nThis will create the `public.migration` table that stores all the migrations / rollbacks that\nwill be applied.\n\n**Setting up the Schemas**\n\nNow that we are all set up, let's create our first SQL file that will create the schema:\n\n```bash\npoetry run cli -vv migrate new-sql 1 --sql-dir /tmp/sql\n```  \n\nAdd `CREATE SCHEMA general;` to the file.\n\nThen apply the modifications to the database:\n\n```bash\npoetry run cli -vv migrate apply-sql --sql-dir /tmp/sql --db postgres \n```\n\nNotes:\nThis will go through all the files in the `/tmp/sql` folder and run them in order.\nSQL files here **need to be IDEMPOTENT**.\n\n**Creating a first migration**\n\nNow, let's create our first migration:\n\n```bash\nmkdir -p /tmp/migrations # You can choose a different folder to store your migrations\npoetry run cli -vv migrate new --sql-dir /tmp/migrations\n```\n\nThis 
will create 2 new files:\n\n- **up**: `{timestamp}-{migration_id}-up.sql`, which contains the\n  migration to apply to the database.\n- **down**: `{timestamp}-{migration_id}-down.sql`, which contains the code to revert your changes.\n\nLet's now modify the `up.sql` file with:\n\n```sql\nCREATE TABLE IF NOT EXISTS general.test\n(\n    id  int primary key,\n    foo int\n);\n\nCREATE TABLE IF NOT EXISTS general.test2\n(\n    id  serial primary key,\n    foo text\n);\n```\n\nand check that the migration is valid:\n\n```bash\npoetry run cli -vv migrate verify --sql-dir /tmp/migrations\n``` \n\nBecause we did not add anything to the `down.sql` file, the command returns an error.\nLet's modify it to make the command pass:\n\n```sql\nDROP TABLE general.test;\nDROP TABLE general.test2;\n``` \n\nThen run the verification again:\n\n```bash\npoetry run cli -vv migrate verify --sql-dir /tmp/migrations\n``` \n\nWe are all good!\n\n**Optional**: You can also verify that the order of the migrations is correct by running:\n\n```bash\npoetry run cli -vv migrate verify-files --sql-dir /tmp/migrations --no-raise\n```\n\n## 4. 
Comparing database schemas\n\nYou can compare two databases by running:\n\n```bash\npoetry run cli -vv schema-diff --db tracktor --schemas schema_1,schema_2\n```\n\nIf differences are found, the command will output the differences between the two databases.\n\n\n### Known limitations\n\n**Exact sample size**\n\nSometimes, we cannot guarantee that the sampled table will have the exact\nexpected size.\n\nFor instance, let's say we want **10%** of *table1* and **10%** of *table2*, given the following\ntable definitions:\n\n```sql\nCREATE TABLE table1\n(\n    id SERIAL PRIMARY KEY\n);\n\nCREATE TABLE table2\n(\n    id        SERIAL PRIMARY KEY,\n    table1_id INT NOT NULL REFERENCES table1\n);\n\nINSERT INTO table1(id)\nVALUES (1);\n\nINSERT INTO table2(table1_id)\nSELECT 1\nFROM generate_series(1, 10);\n```\n\nIn this case, it's not possible to have less than **100%** of *table1*, since it has only one key, on\nwhich all the `table1_id` values of *table2* depend.\n\n**Cyclic foreign keys**\n\nCyclic foreign keys (a table with a FK on another table that references the first one) are not supported.\nHere is an example:\n\n```sql\nCREATE TABLE table1\n(\n    id        SERIAL PRIMARY KEY,\n    table2_id INT NOT NULL\n);\n\nCREATE TABLE table2\n(\n    id        SERIAL PRIMARY KEY,\n    table1_id INT NOT NULL\n);\n\nALTER TABLE table1\n    ADD CONSTRAINT table1_table2_id_fk\n        FOREIGN KEY (table2_id) REFERENCES table2;\n\nALTER TABLE table2\n    ADD CONSTRAINT table2_table1_id_fk\n        FOREIGN KEY (table1_id) REFERENCES table1;\n```\n\n![Cyclic dependencies](./docs/cyclic-deps.png)\n\nYou can display cyclic dependencies in a database by running:\n\n```bash\npoetry run cli -vv analyze --db test --schemas test --show-graph\n```\n\n(**Note:** you'll need to have installed the `network` extra)\n\n**Self-referencing foreign keys**\n\nForeign keys referencing another column in the same table are ignored.\n\n```sql\nCREATE TABLE table1\n(\n    id        SERIAL PRIMARY KEY,\n    
parent_id INT REFERENCES table1\n);\n```\n\n# Annexes\n\n## Showing Network in Jupyter\n\nYou can display the network visualization in Jupyter using `jupyter_dash`:\n\n```python\nfrom jupyter_dash import JupyterDash\nfrom padmy.sampling import network, viz, sampling\nimport asyncpg\n\nPG_URL = 'postgresql://postgres:postgres@localhost:5432/test'\n\napp = JupyterDash(__name__)\n\ndb = sampling.Database(name='test')\n\n# Top-level `async with` / `await` works in a Jupyter cell (IPython autoawait)\nasync with asyncpg.create_pool(PG_URL) as pool:\n    await db.explore(pool, ['public'])\n\n# Convert the explored database into a graph for the visualization\ng = network.convert_db(db)\n\napp.layout = viz.get_layout(g,\n                            style={'width': '100%', 'height': '800px'},\n                            layout='klay')\n\napp.run_server(mode='jupyterlab')  # or mode='inline'\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftracktor%2Fpadmy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftracktor%2Fpadmy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftracktor%2Fpadmy/lists"}