Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
stream and (de)serialize s3 streams
https://github.com/robhowley/s3-streaming
- Host: GitHub
- URL: https://github.com/robhowley/s3-streaming
- Owner: robhowley
- License: mit
- Created: 2019-03-15T00:41:10.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-03-15T17:40:02.000Z (almost 3 years ago)
- Last Synced: 2024-04-24T16:38:23.860Z (10 months ago)
- Topics: aws, file-io, s3, stream-processing
- Language: Python
- Size: 11.7 KB
- Stars: 15
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# s3-streaming: handling (big) S3 files like regular files
Storing, retrieving and using files in S3 is a regular activity, so it should be easy. It should also ...
* stream the data
* have an API that feels like Python file I/O
* handle some of the deserialization and compression stuff, because why not
## Install

```bash
pip install s3-streaming
```

## Streaming S3 objects like regular files
### The basics
Opening and reading S3 objects is similar to regular Python file I/O. The only difference is that you need to provide a
`boto3.session.Session` instance to handle the bucket access.

```python
import boto3
from s3streaming import s3_open

with s3_open('s3://bucket/key', boto_session=boto3.session.Session()) as f:
    for next_line in f:
        print(next_line)
```

### Injecting deserialization and compression handling in stream
Consider a file that is `gzip` compressed and contains lines of `json`. There's some boilerplate in dealing with that,
but why bother? Just handle it in stream.

```python
import boto3
from s3streaming import s3_open, deserialize, compression

reader_settings = dict(
    boto_session=boto3.session.Session(),
    deserializer=deserialize.json_lines,
    compression=compression.gzip,
)

with s3_open('s3://bucket/key.gzip', **reader_settings) as f:
    for next_line in f:
        print(next_line.keys())    # because the file was decompressed ...
        print(next_line.values())  # ... and the json is now a loaded dict!
```
Other `deserialize` options include
* `csv`
* `csv_as_dict`
* `tsv`
* `tsv_as_dict`
* `string`
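
For example, a plain (uncompressed) CSV file with a header row can presumably be read with the `csv_as_dict` deserializer. The sketch below assumes `csv_as_dict` yields one dict per data row keyed by the header line, and the bucket/key path is made up for illustration; check the package source for the exact semantics.

```python
import boto3
from s3streaming import s3_open, deserialize

# Assumption: deserialize.csv_as_dict yields one dict per data row,
# with keys taken from the CSV header row.
with s3_open('s3://bucket/data.csv',
             boto_session=boto3.session.Session(),
             deserializer=deserialize.csv_as_dict) as f:
    for row in f:
        print(row)  # e.g. {'id': '1', 'name': 'foo'}
```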