https://github.com/medzin/beam-postgres
Light IO transforms for Postgres read/write in Apache Beam pipelines.
apache-beam python
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/medzin/beam-postgres
- Owner: medzin
- License: apache-2.0
- Created: 2022-11-04T10:40:42.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-09T17:57:20.000Z (over 1 year ago)
- Last Synced: 2025-06-17T22:41:23.983Z (about 1 year ago)
- Topics: apache-beam, python
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 13
- Watchers: 2
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# beam-postgres
Light IO transforms for Postgres read/write in Apache Beam pipelines.
## Goal
The project aims to provide highly performant and customizable transforms and is
not intended to support many different SQL database engines.
## Features
- `ReadAllFromPostgres`, `ReadFromPostgres` and `WriteToPostgres` transforms
- Records can be mapped to tuples, dictionaries or dataclasses
- Reads and writes are in configurable batches
## Usage
Printing data from the database table:
```python
import apache_beam as beam
from psycopg.rows import dict_row

from beam_postgres.io import ReadAllFromPostgres

with beam.Pipeline() as p:
    data = p | "Reading example records from database" >> ReadAllFromPostgres(
        "host=localhost dbname=examples user=postgres password=postgres",
        "select id, data from source",
        dict_row,
    )
    data | "Writing to stdout" >> beam.Map(print)
```
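The read example passes `dict_row`, so each record arrives as a dictionary. To illustrate the three record shapes listed under Features (tuples, dictionaries, dataclasses), here is a minimal standard-library sketch. It does not use `beam-postgres` or `psycopg`; `COLUMNS`, `RAW_ROW`, `Source` and the three helper functions are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for what a database driver hands back:
# the column names of the query and one raw row of values.
COLUMNS = ("id", "data")
RAW_ROW = (1, "example1")


@dataclass
class Source:
    id: int
    data: str


def as_tuple(columns, row):
    """Keep the row positional, ignoring the column names."""
    return tuple(row)


def as_dict(columns, row):
    """Pair each column name with its value."""
    return dict(zip(columns, row))


def as_dataclass(cls):
    """Build a mapper that passes the columns to cls as keyword arguments."""
    def mapper(columns, row):
        return cls(**dict(zip(columns, row)))
    return mapper


print(as_tuple(COLUMNS, RAW_ROW))
print(as_dict(COLUMNS, RAW_ROW))
print(as_dataclass(Source)(COLUMNS, RAW_ROW))
```

The dataclass variant only works when the selected column names match the dataclass field names, which is worth keeping in mind when writing the query.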
Writing data to the database table:
```python
from dataclasses import dataclass

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

from beam_postgres.io import WriteToPostgres


@dataclass
class Example:
    data: str


with beam.Pipeline(options=PipelineOptions()) as p:
    data = p | "Reading example records" >> beam.Create(
        [
            Example("example1"),
            Example("example2"),
        ]
    )
    data | "Writing example records to database" >> WriteToPostgres(
        "host=localhost dbname=examples user=postgres password=postgres",
        "insert into sink (data) values (%(data)s)",
    )
```
See [here][examples] for more examples.
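In the write example, the statement uses the named placeholder `%(data)s`, and the `Example` dataclass has a field called `data`. The sketch below shows one way to check that every placeholder has a matching field before running a pipeline. It uses only the standard library; `placeholder_names` and `missing_fields` are hypothetical helpers, and using `dataclasses.asdict` here is an assumption for illustration, not a description of how `WriteToPostgres` works internally.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class Example:
    data: str


STATEMENT = "insert into sink (data) values (%(data)s)"


def placeholder_names(statement):
    """Collect the names used in %(name)s placeholders."""
    return set(re.findall(r"%\((\w+)\)s", statement))


def missing_fields(statement, record):
    """Return placeholder names that the dataclass record cannot supply."""
    return placeholder_names(statement) - set(asdict(record))


record = Example("example1")
print(asdict(record))
print(placeholder_names(STATEMENT))
print(missing_fields(STATEMENT, record))
```

An empty result from `missing_fields` means every placeholder in the statement has a same-named field on the record.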
### Reading in batches
When a table holds more data than fits into memory, read it in batches instead
of all at once. See [examples/read.py](examples/read.py#L11) for example code
(it reads records in batches of 1).
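To show the general idea behind batched reads, here is a minimal sketch that pulls rows from a cursor a fixed number at a time. It uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs without a database server. The `read_in_batches` helper is hypothetical and is not part of `beam-postgres`; see the linked example for the library's own batch configuration.

```python
import sqlite3


def read_in_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


# In-memory SQLite database standing in for the "source" table.
conn = sqlite3.connect(":memory:")
conn.execute("create table source (id integer primary key, data text)")
conn.executemany(
    "insert into source (data) values (?)",
    [("example%d" % i,) for i in range(1, 6)],
)

cursor = conn.execute("select id, data from source order by id")
batches = list(read_in_batches(cursor, 2))
for batch in batches:
    print(batch)
conn.close()
```

With five rows and a batch size of 2, the helper yields three batches, so only one batch at a time has to be held by the consumer.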
[pypi-project]: https://pypi.org/project/beam-postgres
[examples]: https://github.com/medzin/beam-postgres/tree/main/examples