{"id":20059123,"url":"https://github.com/beam-pyio/firehose_pyio","last_synced_at":"2025-05-05T15:31:06.637Z","repository":{"id":243752685,"uuid":"813351888","full_name":"beam-pyio/firehose_pyio","owner":"beam-pyio","description":"Apache Beam Python I/O connector for Amazon Data Firehose","archived":false,"fork":false,"pushed_at":"2024-09-20T00:06:23.000Z","size":2972,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-02T16:26:06.504Z","etag":null,"topics":["apache-beam","aws","data-engineering","data-streaming","firehose","python"],"latest_commit_sha":null,"homepage":"https://beam-pyio.github.io/firehose_pyio/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beam-pyio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T23:20:50.000Z","updated_at":"2024-09-21T09:48:41.000Z","dependencies_parsed_at":"2024-06-27T07:34:30.322Z","dependency_job_id":"30e95256-6ccf-4e7d-b1ea-0886c8292a94","html_url":"https://github.com/beam-pyio/firehose_pyio","commit_stats":null,"previous_names":["beam-pyio/firehose_pyio"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Ffirehose_pyio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Ffirehose_pyio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Ffirehose_pyio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beam-pyio%2Ffirehose_pyio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beam-pyio","download_url":"https://codeload.github.com/beam-pyio/firehose_pyio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224452831,"owners_count":17313668,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","aws","data-engineering","data-streaming","firehose","python"],"created_at":"2024-11-13T13:06:07.802Z","updated_at":"2024-11-13T13:06:08.364Z","avatar_url":"https://github.com/beam-pyio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# firehose_pyio\n\n![doc](https://github.com/beam-pyio/firehose_pyio/workflows/doc/badge.svg)\n![test](https://github.com/beam-pyio/firehose_pyio/workflows/test/badge.svg)\n[![release](https://img.shields.io/github/release/beam-pyio/firehose_pyio.svg)](https://github.com/beam-pyio/firehose_pyio/releases)\n![pypi](https://img.shields.io/pypi/v/firehose_pyio)\n![python](https://img.shields.io/pypi/pyversions/firehose_pyio)\n\n[Amazon Data Firehose](https://aws.amazon.com/firehose/) is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service and Amazon OpenSearch Serverless. 
The Apache Beam Python I/O connector for Amazon Data Firehose (`firehose_pyio`) provides a data sink that integrates Beam pipelines with those services.\n\n## Installation\n\n```bash\n$ pip install firehose_pyio\n```\n\n## Usage\n\nThe connector's main composite transform is [`WriteToFirehose`](https://beam-pyio.github.io/firehose_pyio/autoapi/firehose_pyio/io/index.html#firehose_pyio.io.WriteToFirehose), which expects each _PCollection_ element to be a list or a tuple. If the element is a tuple, its second element is taken (for example, the batched values of a `(key, values)` pair produced by `GroupIntoBatches`). If the elements are not of an accepted type, you can apply the [`GroupIntoBatches`](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/) or [`BatchElements`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements) transform beforehand. The records of each element are then sent to a Firehose delivery stream using the [`put_record_batch`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/firehose/client/put_record_batch.html) method of the boto3 package. These batch transforms are also useful for keeping each request within the API limits listed below.\n\n- Each `PutRecordBatch` request supports up to 500 records. Each record in the request can be as large as 1,000 KB (before base64 encoding), up to a limit of 4 MB for the entire request. These limits cannot be changed.\n\nThe transform also has options that control how individual records are written and how failed records are handled.\n\n- _jsonify_ - A flag that indicates whether to convert a record into JSON. A record must be _bytes_, _bytearray_ or a file-like object; if it is of another type (e.g. an integer), set this flag to _True_ to convert it into a JSON string.\n- _multiline_ - A flag that indicates whether to add a new line character (`\\n`) to each record. 
This is useful when saving records as _CSV_ or _JSON Lines_ files.\n- _max_trials_ - The maximum number of attempts when one or more records fail. Defaults to 3. Records that still fail after all attempts are returned, so users can decide how to handle them downstream.\n- _append_error_ - Whether to append error details to failed records. Defaults to _True_.\n\nFailed records are returned through a tagged output, which is named `write-to-firehose-failed-output` by default. Use the `failed_output` argument to give it a different name.\n\n### Example\n\nIf _PCollection_ elements are key-value pairs (i.e. a keyed stream), they can be batched per key with the `GroupIntoBatches` transform before being passed to the main transform.\n\n```python\nimport apache_beam as beam\nfrom apache_beam import GroupIntoBatches\nfrom apache_beam.options.pipeline_options import PipelineOptions\nfrom firehose_pyio.io import WriteToFirehose\n\n# Placeholder values: replace them with your own pipeline options and stream name.\npipeline_options = PipelineOptions()\ndelivery_stream_name = \"my-delivery-stream\"  # hypothetical delivery stream\n\nwith beam.Pipeline(options=pipeline_options) as p:\n    (\n        p\n        | beam.Create([(1, \"one\"), (2, \"three\"), (1, \"two\"), (2, \"four\")])\n        # Group values per key into batches of up to two records\n        | GroupIntoBatches(batch_size=2)\n        | WriteToFirehose(\n            delivery_stream_name=delivery_stream_name,\n            jsonify=True,\n            multiline=True,\n            max_trials=3\n        )\n    )\n```\n\nFor a list element (i.e. 
unkeyed stream), we can apply the `BatchElements` transform instead.\n\n```python\nimport apache_beam as beam\nfrom apache_beam.options.pipeline_options import PipelineOptions\nfrom apache_beam.transforms.util import BatchElements\nfrom firehose_pyio.io import WriteToFirehose\n\n# Placeholder values: replace them with your own pipeline options and stream name.\npipeline_options = PipelineOptions()\ndelivery_stream_name = \"my-delivery-stream\"  # hypothetical delivery stream\n\nwith beam.Pipeline(options=pipeline_options) as p:\n    (\n        p\n        | beam.Create([\"one\", \"two\", \"three\", \"four\"])\n        # Collect elements into lists of up to two records\n        | BatchElements(min_batch_size=2, max_batch_size=2)\n        | WriteToFirehose(\n            delivery_stream_name=delivery_stream_name,\n            jsonify=True,\n            multiline=True,\n            max_trials=3\n        )\n    )\n```\n\nSee [Introduction to Firehose PyIO Sink Connector](/blog/2024/firehose-pyio-intro/) for more examples.\n\n## Contributing\n\nInterested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.\n\n## License\n\n`firehose_pyio` was created as part of the [Apache Beam Python I/O Connectors](https://github.com/beam-pyio) project. It is licensed under the terms of the Apache License 2.0.\n\n## Credits\n\n`firehose_pyio` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `pyio-cookiecutter` [template](https://github.com/beam-pyio/pyio-cookiecutter).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeam-pyio%2Ffirehose_pyio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeam-pyio%2Ffirehose_pyio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeam-pyio%2Ffirehose_pyio/lists"}