{"id":26471100,"url":"https://github.com/medzin/beam-postgres","last_synced_at":"2025-06-24T10:10:05.504Z","repository":{"id":63204135,"uuid":"561721007","full_name":"medzin/beam-postgres","owner":"medzin","description":"Light IO transforms for Postgres read/write in Apache Beam pipelines.","archived":false,"fork":false,"pushed_at":"2025-02-09T17:57:20.000Z","size":48,"stargazers_count":13,"open_issues_count":2,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-17T22:41:23.983Z","etag":null,"topics":["apache-beam","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/medzin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-04T10:40:42.000Z","updated_at":"2025-05-19T03:07:52.000Z","dependencies_parsed_at":"2024-11-15T10:57:49.508Z","dependency_job_id":"e382eb9b-a5a8-4d4b-a8b9-f55b4f8b080c","html_url":"https://github.com/medzin/beam-postgres","commit_stats":{"total_commits":30,"total_committers":1,"mean_commits":30.0,"dds":0.0,"last_synced_commit":"17bce07adbedd13ec1d2e016529c16af20993daf"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/medzin/beam-postgres","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medzin%2Fbeam-postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medzin%2Fbeam-postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medzin%2Fbeam-postgres/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medzin%2Fbeam-postgres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/medzin","download_url":"https://codeload.github.com/medzin/beam-postgres/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medzin%2Fbeam-postgres/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261649858,"owners_count":23189755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","python"],"created_at":"2025-03-19T20:12:19.109Z","updated_at":"2025-06-24T10:10:05.479Z","avatar_url":"https://github.com/medzin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# beam-postgres\n\n[![PyPI](https://img.shields.io/pypi/v/beam-postgres.svg)][pypi-project]\n[![Supported Versions](https://img.shields.io/pypi/pyversions/beam-postgres.svg)][pypi-project]\n\nLight IO transforms for Postgres read/write in Apache Beam pipelines.\n\n## Goal\n\nThe project aims to provide highly performant and customizable transforms and is\nnot intended to support many different SQL database engines.\n\n## Features\n\n- `ReadAllFromPostgres`, `ReadFromPostgres` and `WriteToPostgres` transforms\n- Records can be mapped to tuples, dictionaries or dataclasses\n- Reads and writes are in configurable batches\n\n## Usage\n\nPrinting data from the database table:\n\n```python\nimport apache_beam as beam\nfrom psycopg.rows import dict_row\n\nfrom beam_postgres.io import ReadAllFromPostgres\n\nwith 
beam.Pipeline() as p:\n    data = p | \"Reading example records from database\" \u003e\u003e ReadAllFromPostgres(\n        \"host=localhost dbname=examples user=postgres password=postgres\",\n        \"select id, data from source\",\n        dict_row,\n    )\n    data | \"Writing to stdout\" \u003e\u003e beam.Map(print)\n\n```\n\nWriting data to the database table:\n\n```python\nfrom dataclasses import dataclass\n\nimport apache_beam as beam\nfrom apache_beam.options.pipeline_options import PipelineOptions\n\nfrom beam_postgres.io import WriteToPostgres\n\n\n@dataclass\nclass Example:\n    data: str\n\n\nwith beam.Pipeline(options=PipelineOptions()) as p:\n    data = p | \"Reading example records\" \u003e\u003e beam.Create(\n        [\n            Example(\"example1\"),\n            Example(\"example2\"),\n        ]\n    )\n    data | \"Writing example records to database\" \u003e\u003e WriteToPostgres(\n        \"host=localhost dbname=examples user=postgres password=postgres\",\n        \"insert into sink (data) values (%(data)s)\",\n    )\n\n```\n\nSee [here][examples] for more examples.\n\n### Reading in batches\n\nThere may be situations when you have so much data that it will not fit into\nmemory. In that case, read your table data in batches. See the example code\n[here](examples/read.py#L11) (it reads records in batches of 1).\n\n[pypi-project]: https://pypi.org/project/beam-postgres\n[examples]: https://github.com/medzin/beam-postgres/tree/main/examples\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedzin%2Fbeam-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmedzin%2Fbeam-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedzin%2Fbeam-postgres/lists"}