{"id":39280465,"url":"https://github.com/borys25ol/pg2pyrquet","last_synced_at":"2026-01-18T01:01:15.985Z","repository":{"id":254422329,"uuid":"846483425","full_name":"borys25ol/pg2pyrquet","owner":"borys25ol","description":"Python CLI tool designed to export PostgreSQL tables into Parquet files","archived":false,"fork":false,"pushed_at":"2024-08-27T20:21:31.000Z","size":71,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-08-28T21:25:37.739Z","etag":null,"topics":["cli","dumps","exports","parquet","postgresql","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/borys25ol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-23T09:59:17.000Z","updated_at":"2024-08-27T20:21:35.000Z","dependencies_parsed_at":"2024-08-25T15:22:17.927Z","dependency_job_id":null,"html_url":"https://github.com/borys25ol/pg2pyrquet","commit_stats":null,"previous_names":["borys25ol/pg2pyrquet"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/borys25ol/pg2pyrquet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borys25ol%2Fpg2pyrquet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borys25ol%2Fpg2pyrquet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borys25ol%2Fpg2pyrquet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borys25ol%2Fpg2pyrquet/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/borys25ol","download_url":"https://codeload.github.com/borys25ol/pg2pyrquet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borys25ol%2Fpg2pyrquet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28525424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"ssl_error","status_checked_at":"2026-01-18T00:39:39.467Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","dumps","exports","parquet","postgresql","python"],"created_at":"2026-01-18T01:00:45.413Z","updated_at":"2026-01-18T01:01:15.973Z","avatar_url":"https://github.com/borys25ol.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Postgres Exports to Apache Parquet with Python (pg2pyrquet)\n====================\n\n[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)\n\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)\n[![Imports: 
isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![Pre-commit: enabled](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white\u0026style=flat)](https://github.com/pre-commit/pre-commit)\n\n## Description\n\n`pg2pyrquet` is a Python CLI tool designed to export PostgreSQL tables into Parquet files.\nIt is particularly useful for data engineers and analysts who want to convert PostgreSQL data efficiently into a columnar storage format well suited to analytical workloads.\n\n\n## Features\n\n- **Efficient Data Export**: Export PostgreSQL tables, whole databases, or the results of custom SQL queries directly to Parquet files.\n- **Batch Processing**: Specify a batch size to handle large datasets efficiently.\n- **Customizable Output**: Define the output folder and file name for the Parquet file.\n\n\n## Installation\n\nSet up and activate a Python 3 virtualenv using your preferred method and install the production requirements. For example:\n\n```shell\nmake ve\n```\n\n\nTo use `pg2pyrquet`, you need to have Python installed. 
You can also install the dependencies directly with `pip`:\n\n```shell\npip install -r requirements.txt\n```\n\nConfiguration\n--------------\n\nIf your database requires a password, you can set the `POSTGRES_USER` and `POSTGRES_PASSWORD` environment variables so you do not have to enter them every time you run the tool.\nFor security reasons, environment variables are the recommended way to supply sensitive information such as credentials.\n\n```shell\nexport POSTGRES_USER=\u003cuser\u003e\nexport POSTGRES_PASSWORD=\u003cpassword\u003e\n```\n\nA more secure option is to read the values interactively, so they are neither echoed to the terminal nor saved in your shell history:\n\n```shell\nread -s -p \"Enter POSTGRES User: \" POSTGRES_USER \u0026\u0026 export POSTGRES_USER\n\nread -s -p \"Enter POSTGRES Password: \" POSTGRES_PASSWORD \u0026\u0026 export POSTGRES_PASSWORD\n```\n\nUsage\n-----\n\nThe tool provides three commands: `export-table`, `export-database`, and `export-query`.\nThey export a single PostgreSQL table, every table in a database, or the result of a custom SQL query, respectively, to Parquet files.\n\n### Export a Single Table\n\nTo export a single table from a PostgreSQL database to a Parquet file, use the `export-table` command.\nThis command lets you specify the database, table, output folder, output file name, and batch size.\n\n#### Example Command\n\n```shell\npython -m pg2pyrquet export-table \\\n    --host \u003chost\u003e \\\n    --port \u003cport\u003e \\\n    --database \u003cdatabase_name\u003e \\\n    --table \u003ctable_name\u003e \\\n    --folder \u003coutput_folder\u003e \\\n    --output-file \u003coutput_filename\u003e \\\n    --batch-size \u003cbatch_size\u003e\n```\n\n#### Command Options\n\n- `--host`: The hostname of the PostgreSQL server.\n- `--port`: The port number of the PostgreSQL server.\n- `--database`: The name of the PostgreSQL database you want to export data from.\n- `--table`: The specific table within the database to export.\n- `--folder`: The directory where the Parquet file will be saved.\n- `--output-file`: The name of the output Parquet file.\n- `--batch-size`: The 
number of rows to process in each batch. This helps keep memory usage manageable for large tables.\n\n### Export All Database Tables\n\nTo export all tables from a PostgreSQL database to Parquet files, use the `export-database` command.\nThis command exports each table into a separate Parquet file in the specified output folder.\n\n#### Example Command\n\n```shell\npython -m pg2pyrquet export-database \\\n    --host \u003chost\u003e \\\n    --port \u003cport\u003e \\\n    --database \u003cdatabase_name\u003e \\\n    --folder \u003coutput_folder\u003e \\\n    --batch-size \u003cbatch_size\u003e\n```\n\n#### Command Options\n\n- `--host`: The hostname of the PostgreSQL server.\n- `--port`: The port number of the PostgreSQL server.\n- `--database`: The name of the PostgreSQL database you want to export data from.\n- `--folder`: The directory where the Parquet files will be saved.\n- `--batch-size`: The number of rows to process in each batch. This helps keep memory usage manageable for large tables.\n\n\n#### Note on File Naming\n\nWhen using the `export-database` command, each Parquet file is named after its table, following the format `{table_name}.parquet`.\n\n\n### Export a Custom Query\n\nTo export the result of a custom SQL query from a PostgreSQL database to a Parquet file, use the `export-query` command.\nThis command lets you specify the database, the file containing the SQL query, the output folder, the output file name, and the batch size.\n\n#### Example Command\n\n```shell\npython -m pg2pyrquet export-query \\\n    --host \u003chost\u003e \\\n    --port \u003cport\u003e \\\n    --database \u003cdatabase_name\u003e \\\n    --query-file \u003cquery_file_path\u003e \\\n    --folder \u003coutput_folder\u003e \\\n    --output-file \u003coutput_filename\u003e \\\n    --batch-size \u003cbatch_size\u003e\n```\n\n#### Command Options\n\n- `--host`: The hostname of the PostgreSQL server.\n- `--port`: The port number of the PostgreSQL 
server.\n- `--database`: The name of the PostgreSQL database you want to export data from.\n- `--query-file`: The path to the file containing the SQL query (for example, `custom-query.sql`).\n- `--folder`: The directory where the Parquet file will be saved.\n- `--output-file`: The name of the output Parquet file.\n- `--batch-size`: The number of rows to process in each batch. This helps keep memory usage manageable for large tables.\n\nExample SQL query file (`custom-query.sql`):\n\n```sql\nSELECT *\nFROM my_table\nWHERE column_name = 'value'\nLIMIT 1000;\n```\n\n### Running from Python\n\nAll commands are also available as Python functions.\n\nCreate a new file `export.py` with the following content:\n\n```python\nimport os\n\nfrom pg2pyrquet import export_database, export_query, export_table\n\n\ndef set_postgres_auth_env(username: str, password: str) -\u003e None:\n    \"\"\"\n    Set the PostgreSQL username and password as environment variables.\n\n    Args:\n        username (str): The username for the PostgreSQL database.\n        password (str): The password for the PostgreSQL database.\n    \"\"\"\n    os.environ[\"POSTGRES_USER\"] = username\n    os.environ[\"POSTGRES_PASSWORD\"] = password\n\n\ndef run_export_database() -\u003e None:\n    export_database(\n        host=\"localhost\",\n        port=\"5432\",\n        database=\"test_database\",\n        output_path=\"./data\",\n        batch_size=5000,\n    )\n\n\ndef run_export_table() -\u003e None:\n    export_table(\n        host=\"localhost\",\n        port=\"5432\",\n        database=\"test_database\",\n        table=\"test_table\",\n        output_path=\"./data\",\n        output_file=\"test_table.parquet\",\n        batch_size=5000,\n    )\n\n\ndef run_export_query() -\u003e None:\n    export_query(\n        host=\"localhost\",\n        port=\"5432\",\n        database=\"test_database\",\n        query_file=\"./custom-query.sql\",\n        output_path=\"./data\",\n 
       output_file=\"query.parquet\",\n        batch_size=5000,\n    )\n\n\ndef main() -\u003e None:\n    # If your database has a password,\n    # you can set the POSTGRES_USER and POSTGRES_PASSWORD environment variables.\n    set_postgres_auth_env(username=\"username\", password=\"password\")\n\n    run_export_database()\n    run_export_table()\n    run_export_query()\n\n\nif __name__ == \"__main__\":\n    main()\n\n```\n\nAnd run it:\n\n```shell\npython export.py\n```\n\nContributing\n------------\nContributions are welcome!\nIf you find a bug or have a feature request, please open an issue or submit a pull request on GitHub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborys25ol%2Fpg2pyrquet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fborys25ol%2Fpg2pyrquet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborys25ol%2Fpg2pyrquet/lists"}