{"id":13908122,"url":"https://github.com/robhowley/s3-streaming","last_synced_at":"2025-12-30T03:33:18.491Z","repository":{"id":34312805,"uuid":"175721653","full_name":"robhowley/s3-streaming","owner":"robhowley","description":"stream and (de)serialize s3 streams","archived":false,"fork":false,"pushed_at":"2022-03-15T17:40:02.000Z","size":12,"stargazers_count":15,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-16T17:31:55.692Z","etag":null,"topics":["aws","file-io","s3","stream-processing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robhowley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-15T00:41:10.000Z","updated_at":"2024-02-26T10:47:43.000Z","dependencies_parsed_at":"2022-08-08T00:16:03.684Z","dependency_job_id":null,"html_url":"https://github.com/robhowley/s3-streaming","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/robhowley/s3-streaming","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robhowley%2Fs3-streaming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robhowley%2Fs3-streaming/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robhowley%2Fs3-streaming/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robhowley%2Fs3-streaming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robhowley","download_url":"https://codeload.github.com/robhowley/s3-streaming/tar.gz/refs/h
eads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robhowley%2Fs3-streaming/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265710671,"owners_count":23815397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","file-io","s3","stream-processing"],"created_at":"2024-08-06T23:02:28.926Z","updated_at":"2025-12-30T03:33:18.453Z","avatar_url":"https://github.com/robhowley.png","language":"Python","funding_links":[],"categories":["HarmonyOS"],"sub_categories":["Windows Manager"],"readme":"[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n# s3-streaming: handling (big) S3 files like regular files\nStoring, retrieving and using files in S3 is a routine activity, so it should be easy. It should also:\n* stream the data\n* have an API that feels like Python file I/O\n* handle some of the deserialization and compression stuff, because why not\n\n## Install\n\n```bash\npip install s3-streaming\n```\n\n## Streaming S3 objects like regular files\n\n### The basics\nOpening and reading S3 objects is similar to regular Python I/O. The only difference is that you need to provide a \n`boto3.session.Session` instance to handle bucket access. 
\n\n```python\nimport boto3\nfrom s3streaming import s3_open\n\n\nwith s3_open('s3://bucket/key', boto_session=boto3.session.Session()) as f:\n    for next_line in f:\n        print(next_line)\n```\n\n### Injecting deserialization and compression handling into the stream\nConsider a file that is `gzip`-compressed and contains lines of JSON. Dealing with that normally takes some boilerplate,\nbut why bother? Just handle it in the stream.\n\n```python\nimport boto3\nfrom s3streaming import s3_open, deserialize, compression\n\n\nreader_settings = dict(\n    boto_session=boto3.session.Session(),\n    deserializer=deserialize.json_lines,  # parse each line as JSON\n    compression=compression.gzip,         # decompress the gzip data while streaming\n)\n\nwith s3_open('s3://bucket/key.gzip', **reader_settings) as f:\n    for next_line in f:\n        print(next_line.keys())    # because the file was decompressed ...\n        print(next_line.values())  #   ... and the JSON is now a loaded dict!\n```\n\nOther `deserialize` options include:\n* `csv`\n* `csv_as_dict`\n* `tsv`\n* `tsv_as_dict`\n* `string`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobhowley%2Fs3-streaming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobhowley%2Fs3-streaming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobhowley%2Fs3-streaming/lists"}