{"id":15968895,"url":"https://github.com/beam-pyio/dynamodb_pyio","last_synced_at":"2026-01-04T17:06:47.530Z","repository":{"id":257680429,"uuid":"852622979","full_name":"beam-pyio/dynamodb_pyio","owner":"beam-pyio","description":"Apache Beam Python I/O connector for Amazon DynamoDB","archived":false,"fork":false,"pushed_at":"2024-09-21T09:22:00.000Z","size":2941,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-22T04:31:44.524Z","etag":null,"topics":["apache-beam","aws","data-engineering","data-streaming","dynamodb","python"],"latest_commit_sha":null,"homepage":"https://beam-pyio.github.io/dynamodb_pyio/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beam-pyio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-05T06:08:24.000Z","updated_at":"2024-09-21T09:48:08.000Z","dependencies_parsed_at":"2024-10-30T04:02:31.016Z","dependency_job_id":null,"html_url":"https://github.com/beam-pyio/dynamodb_pyio","commit_stats":null,"previous_names":["beam-pyio/dynamodb_pyio"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Fdynamodb_pyio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Fdynamodb_pyio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Fdynamodb_pyio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/beam-pyio%2Fdynamodb_pyio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beam-pyio","download_url":"https://codeload.github.com/beam-pyio/dynamodb_pyio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245702592,"owners_count":20658642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","aws","data-engineering","data-streaming","dynamodb","python"],"created_at":"2024-10-07T19:04:27.542Z","updated_at":"2026-01-04T17:06:47.500Z","avatar_url":"https://github.com/beam-pyio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dynamodb_pyio\n\n![doc](https://github.com/beam-pyio/dynamodb_pyio/workflows/doc/badge.svg)\n![test](https://github.com/beam-pyio/dynamodb_pyio/workflows/test/badge.svg)\n[![release](https://img.shields.io/github/release/beam-pyio/dynamodb_pyio.svg)](https://github.com/beam-pyio/dynamodb_pyio/releases)\n![pypi](https://img.shields.io/pypi/v/dynamodb_pyio)\n![python](https://img.shields.io/pypi/pyversions/dynamodb_pyio)\n\n[Amazon DynamoDB](https://aws.amazon.com/dynamodb/) is a serverless, NoSQL database service that allows you to develop modern applications at any scale. The Apache Beam Python I/O connector for Amazon DynamoDB (`dynamodb_pyio`) aims to integrate with the database service by supporting source and sink connectors. 
Currently, only the sink connector is available.\n\n## Installation\n\nThe connector can be installed from PyPI.\n\n```bash\n$ pip install dynamodb_pyio\n```\n\n## Usage\n\n### Sink Connector\n\nThe sink connector provides the main composite transform ([`WriteToDynamoDB`](https://beam-pyio.github.io/dynamodb_pyio/autoapi/dynamodb_pyio/io/index.html#dynamodb_pyio.io.WriteToDynamoDB)), which expects each _PCollection_ element to be a list or tuple. If the element is a tuple, the tuple's first element is taken. If the element is not of an accepted type, you can apply the [`GroupIntoBatches`](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/) or [`BatchElements`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements) transform beforehand. The records of the element are then written to a DynamoDB table with the help of the [`batch_writer`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/table/batch_writer.html) of the boto3 package. Note that the batch writer automatically handles buffering and sends items in batches. 
In addition, it automatically handles any unprocessed items and resends them as needed.\n\nThe transform also has an option that handles duplicate records.\n\n- _dedup_pkeys_ - List of keys to be used for deduplicating items in the buffer.\n\n#### Sink Connector Example\n\nThe transform can process many records, thanks to the _batch writer_.\n\n```python\nimport apache_beam as beam\nfrom dynamodb_pyio.io import WriteToDynamoDB\n\nrecords = [{\"pk\": str(i), \"sk\": i} for i in range(500)]\n\nwith beam.Pipeline() as p:\n    (\n        p\n        | beam.Create([records])\n        | WriteToDynamoDB(table_name=\"my-table\")  # placeholder; use your table name\n    )\n```\n\nDuplicate records can be handled using the _dedup_pkeys_ option.\n\n```python\nimport apache_beam as beam\nfrom dynamodb_pyio.io import WriteToDynamoDB\n\nrecords = [{\"pk\": str(1), \"sk\": 1} for _ in range(20)]\n\nwith beam.Pipeline() as p:\n    (\n        p\n        | beam.Create([records])\n        | WriteToDynamoDB(table_name=\"my-table\", dedup_pkeys=[\"pk\", \"sk\"])  # placeholder table name\n    )\n```\n\nBatches of elements can be controlled further with the `BatchElements` or `GroupIntoBatches` transform.\n\n```python\nimport apache_beam as beam\nfrom apache_beam.transforms.util import BatchElements\nfrom dynamodb_pyio.io import WriteToDynamoDB\n\nrecords = [{\"pk\": str(i), \"sk\": i} for i in range(100)]\n\nwith beam.Pipeline() as p:\n    (\n        p\n        | beam.Create(records)\n        | BatchElements(min_batch_size=50, max_batch_size=50)\n        | WriteToDynamoDB(table_name=\"my-table\")  # placeholder; use your table name\n    )\n```\n\nSee [Introduction to DynamoDB PyIO Sink Connector](https://beam-pyio.github.io/blog/2024/dynamodb-pyio-intro/) for more examples.\n\n## Contributing\n\nInterested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. 
By contributing to this project, you agree to abide by its terms.\n\n## License\n\n`dynamodb_pyio` was created as part of the [Apache Beam Python I/O Connectors](https://github.com/beam-pyio) project. It is licensed under the terms of the Apache License 2.0.\n\n## Credits\n\n`dynamodb_pyio` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `pyio-cookiecutter` [template](https://github.com/beam-pyio/pyio-cookiecutter).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeam-pyio%2Fdynamodb_pyio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeam-pyio%2Fdynamodb_pyio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeam-pyio%2Fdynamodb_pyio/lists"}