{"id":20872377,"url":"https://github.com/romnn/postgresimporter","last_synced_at":"2025-08-20T09:07:54.892Z","repository":{"id":57454274,"uuid":"225543585","full_name":"romnn/postgresimporter","owner":"romnn","description":"A simple python wrapper script based on pgfutter to load multiple dumped csv files into a postgres database.","archived":false,"fork":false,"pushed_at":"2023-07-20T13:13:13.000Z","size":111,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-19T16:50:43.091Z","etag":null,"topics":["csv","database","import","ingestion","pgfutter","postgres"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/romnn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-03T06:05:16.000Z","updated_at":"2021-02-04T17:31:12.000Z","dependencies_parsed_at":"2022-08-29T15:00:23.068Z","dependency_job_id":null,"html_url":"https://github.com/romnn/postgresimporter","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fpostgresimporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fpostgresimporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fpostgresimporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fpostgresimporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/romnn","download_url":"https://codeload.github.com/r
omnn/postgresimporter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243247810,"owners_count":20260747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","database","import","ingestion","pgfutter","postgres"],"created_at":"2024-11-18T06:18:54.484Z","updated_at":"2025-03-12T15:40:36.586Z","avatar_url":"https://github.com/romnn.png","language":"Python","readme":"## postgresimporter\n\n[![Build Status](https://github.com/romnn/postgresimporter/workflows/test/badge.svg)](https://github.com/romnn/postgresimporter/actions)\n[![PyPI License](https://img.shields.io/pypi/l/postgresimporter)](https://pypi.org/project/postgresimporter/)\n[![PyPI Version](https://img.shields.io/pypi/v/postgresimporter)](https://pypi.org/project/postgresimporter/)\n[![PyPI Python versions](https://img.shields.io/pypi/pyversions/postgresimporter)](https://pypi.org/project/postgresimporter/)\n\nThis repository provides a python wrapper script based on [pgfutter](https://github.com/lukasmartinelli/pgfutter)\nto load dumped csv data into a `postgres` database. 
It exposes customization hooks\nand comes as a container or standalone script.\n\n#### Installation\n__Note__: If you want to use `docker`, skip installation and see below.\n```bash\npip install postgresimporter # using pip\npipx install postgresimporter # using pipx\n```\n\n#### Usage\n##### PIP\nIf you installed the Python package and already have a local postgres database running, run\n```bash\npostgresimporter \\\n    path/to/my/csv/files \\\n    --db-host=localhost \\\n    --db-port=5432 \\\n    --db-user=postgres \\\n    --db-password=example \\\n    --combine-tables \\\n    --exclude-regex=\"^.*sample.*$\" \\\n    --post-load path/to/my/hooks/post-load.sql\n```\n\n##### Docker\nThe same command when using the `docker` container looks like this:\n```bash\ndocker run \\\n    --network host \\\n    -v path/to/my/csv/files:/import \\\n    -v path/to/my/hooks/post-load.sql:/post-load.sql \\\n    -e DB_HOST=localhost \\\n    -e DB_PORT=5432 \\\n    -e DB_USER=postgres \\\n    -e DB_PASSWORD=example \\\n    romnn/postgresimporter \\\n    --post-load=/post-load.sql --combine-tables --exclude-regex=\"^.*sample.*$\" /import\n```\n_Note_: When using `docker`, environment variables (`-e`) must be used instead of command \nline arguments for specifying database connection parameters.\n\nThe tool will scan the `sources` directory you specify for any `.zip` files and unzip them.\nIt will then scan for any `.csv` files and load each into a table named after the \nfile. Finally, it will try to combine any tables that share a common prefix.
\n\nSee `--help` or the __Configuration options__ table below for all available options.\n\n#### Docker Compose\nIf you want to spawn a complete setup including the loader, a `postgres` database and\n`pgadmin` as a postgres admin UI, you can use the provided `docker-compose` config:\n```bash\ndocker-compose -f deployment/postgresimporter.compose.yml -f deployment/postgres.compose.yml up\ndocker-compose -f deployment/postgresimporter.compose.yml -f deployment/postgres.compose.yml down\n```\nTo specify arguments for `postgresimporter`, modify `deployment/postgresimporter.compose.yml`.\n\n**Notice**: Before using the provided database container, make sure to stop any already running instances of postgres.\nOn Linux, run:\n```\nsudo /etc/init.d/postgresql stop\n```\n\n#### Hooks\nThe tool comes with some example hooks and the ability to add your own hook scripts.\nYou might have files `importdir/animals_1.csv` and `importdir/animals_2.csv` that look like this:\n```\nname,origin,height\nGrizzly,\"North America\",220\nGiraffe,\"Africa\",600\nWallaby,\"Australia\",180\n```\nAfter importing `importdir/`, you will have three tables:\n\n| Table                 | Content                                                   |\n|:--------------------- |:----------------------------------------------------------|\n| `import.animals`      | `importdir/animals_1` and `importdir/animals_2` combined  |\n| `import.animals_1`    | All from `importdir/animals_1.csv`                        |\n| `import.animals_2`    | All from `importdir/animals_2.csv`                        |\n\nAll of these tables will have the schema defined by the csv file.\nHowever, all values will naturally be of type `text`.\nWith the `--post-load` option you can execute a post-load SQL script that defines\na typed table and inserts the data like so:\n```postgresql\nCREATE TABLE public.animals (\n    name VARCHAR(200) PRIMARY KEY,\n    origin VARCHAR(200),\n    height INTEGER\n);\n\nINSERT INTO public.animals\nSELECT name, origin, 
height::int\nFROM import.animals;\n```\n\n#### Configuration options\n| Option              | Description                   | Default | Required  |\n| --------------------|:------------------------------|---------|----------:|\n| `sources`           | List of csv files to load. Entries can either be directories or files. | None | yes |\n| `--disable-unzip`   | Disables unzipping of any `*.zip` archives in the source directory | False | no |\n| `--disable-import`  | Disables import of any `*.csv` files into the database | False | no |\n| `--disable-check`   | Disables checking that csv row counts match database row counts after import | False | no |\n| `--combine-tables`  | Enables combining of imported csv file tables into one table named by prefix (e.g. weather_1 \u0026 weather_2 -\u003e weather) | False | no |\n| `--exclude-regex`   | Files matching this regex will not be processed | None | no |\n| `--pre-load`        | List of `*.sql` scripts to be executed before importing into the database (e.g. to clean the database). Entries can either be directories or files. | None | no |\n| `--post-load`       | List of `*.sql` scripts to be executed after import (e.g. normalization). Entries can either be directories or files. 
| None | no |\n| `--all`             | Unzip and import all archives and zip files again | False | no |\n| `--db-name`         | PostgreSQL database name | postgres | no |\n| `--db-host`         | PostgreSQL database host | localhost | no |\n| `--db-port`         | PostgreSQL database port | 5432 | no |\n| `--db-user`         | PostgreSQL database user | postgres | no |\n| `--db-password`     | PostgreSQL database password | None | no |\n| `--log-level`       | Log level (DEBUG, INFO, WARNING, ERROR or FATAL) | INFO | no |\n\nNote: You can also specify database connection settings via the `DB_NAME`, `DB_HOST`, `DB_PORT`, `DB_USER` and `DB_PASSWORD` environment variables.\n\n#### Local installation\nClone this repository and run (assuming you have `python` 3.5+ and \n[pgfutter](https://github.com/lukasmartinelli/pgfutter) installed):\n```bash\npip install -r requirements.txt  # using pip\npipenv install --dev  # or using pipenv\n```\n\n#### Development\nIf you do not have `pipx` and `pipenv`, install them with\n```bash\npython3 -m pip install --user pipx\npython3 -m pipx ensurepath\npipx install pipenv\n```\n\nInstall all dependencies with\n```bash\npipenv install --dev\n```\n\nTo format, sort imports and check PEP8 conformity, run\n```bash\npipenv run black .\npipenv run isort\npipenv run flake8\n```\n\nThe above checks are also configured as a git pre-commit hook together with the test suite.\nBefore you commit, make sure to run `pre-commit run --all-files` to resolve any\nerrors in advance.\n\nAfter merging new changes, a new version is deployed to [pypi.org](https://pypi.org) once it has been tagged\nwith `bump2version (patch|minor|major)`.\n\n#### Testing\nThis project is not under active maintenance and not tested for production use.\nHowever, a small test suite is provided and can be run with:\n```bash\npython -m 
postgresimporter.tests.run_tests\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromnn%2Fpostgresimporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fromnn%2Fpostgresimporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromnn%2Fpostgresimporter/lists"}