Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/e2fyi/py-utils
Useful ease-of-use python helpers to interact with AWS resources.
- Host: GitHub
- URL: https://github.com/e2fyi/py-utils
- Owner: e2fyi
- License: apache-2.0
- Created: 2019-10-23T10:11:43.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:14:07.000Z (almost 2 years ago)
- Last Synced: 2024-11-12T19:26:35.488Z (3 days ago)
- Topics: aws, boto3, s3, s3-bucket, s3-buckets, upload
- Language: Python
- Size: 320 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 8
- Metadata Files:
  - Readme: README.md
  - Changelog: CHANGELOG.md
  - License: LICENSE
README
# e2fyi-utils
[![PyPI version](https://badge.fury.io/py/e2fyi-utils.svg)](https://badge.fury.io/py/e2fyi-utils)
[![Build Status](https://travis-ci.com/e2fyi/py-utils.svg?branch=master)](https://travis-ci.com/github/e2fyi/py-utils)
[![Coverage Status](https://coveralls.io/repos/github/e2fyi/py-utils/badge.svg?branch=master)](https://coveralls.io/github/e2fyi/py-utils?branch=master)
[![Documentation Status](https://readthedocs.org/projects/e2fyi-utils/badge/?version=latest)](https://e2fyi-utils.readthedocs.io/en/latest/?badge=latest)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Downloads](https://pepy.tech/badge/e2fyi-utils/month)](https://pepy.tech/project/e2fyi-utils/month)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fe2fyi%2Fpy-utils.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fe2fyi%2Fpy-utils?ref=badge_shield)

`e2fyi-utils` is an `e2fyi`-namespaced python package with a `utils` subpackage
(i.e. `e2fyi.utils`) which holds a collection of useful helper classes and
functions to interact with various cloud resources.

API documentation can be found at [https://e2fyi-utils.readthedocs.io/en/latest/](https://e2fyi-utils.readthedocs.io/en/latest/).
Change logs are available in [CHANGELOG.md](./CHANGELOG.md).
> - Python 3.6 and above
> - Licensed under [Apache-2.0](./LICENSE).

## Quickstart
```bash
# install the default packages only (e.g. no pandas)
pip install e2fyi-utils

# install all optional packages
pip install e2fyi-utils[all]

# install specific optional packages
pip install e2fyi-utils[pandas]
```

Available optional packages:
- `pandas`
### S3Stream
`S3Stream` represents the data stream of an S3 resource, and provides static
methods to convert any python object into a stream. It is generally used with
`S3Resource` to upload or download resources from S3 buckets.

> NOTE:
> - `str`, `float`, `int`, and `bool` will be saved as txt files with mime type "text/plain".
> - `pydantic` models, `dict`, or `list` will be saved as json files with mime type "application/json" (falling back to pickle if they cannot be serialized into a json string).
> - `pandas` dataframes or series can be saved in either csv ("application/csv") or json ("application/json") format.
> - paths to files will be read with `open`, and the mime type will be inferred (falling back to "application/octet-stream").
> - all other python objects will be pickled with `joblib`.

```py
import io

import pandas as pd
from pydantic import BaseModel

from e2fyi.utils.aws import S3Stream

# create a s3 stream
stream = S3Stream(io.StringIO("random text"), "text/plain")
print(stream.read())        # prints "random text"
print(stream.content_type)  # prints "text/plain"

# string
stream = S3Stream.from_any("hello world")
print(stream.read())        # prints "hello world"
print(stream.content_type)  # prints "text/plain"

# dict
stream = S3Stream.from_any({"foo": "bar"})
print(stream.read())        # prints '{"foo": "bar"}'
print(stream.content_type)  # prints "application/json"

# pandas dataframe as csv
df = pd.DataFrame([{"key": "a", "value": 1}, {"key": "b", "value": 2}])
stream = S3Stream.from_any(df, index=False)  # do not include index column
print(stream.read())        # prints the dataframe as a csv string
print(stream.content_type)  # prints "application/csv"

# pandas dataframe as json
stream = S3Stream.from_any(df, orient="records")  # orient dataframe as records
print(stream.read())        # prints the dataframe as a json list string
print(stream.content_type)  # prints "application/json"

# pydantic model
class Person(BaseModel):
    name: str
    age: int

stream = S3Stream.from_any(Person(name="william", age=21))
print(stream.read())        # prints '{"name": "william", "age": 21}'
print(stream.content_type)  # prints "application/json"

# any other python object (pickled with joblib)
class Pet:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

stream = S3Stream.from_any(Pet(name="kopi", age=1))
print(stream.read())        # prints binary bytes (a joblib pickle)
print(stream.content_type)  # prints "application/octet-stream"
```
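Per the note above, a stream can also be created straight from a file path (see also `S3Stream.from_file` in the `S3Resource` example below). A minimal sketch; the mime type shown in the comment is an assumption based on the file extension:

```py
from e2fyi.utils.aws import S3Stream

# create a stream from a local file path; the mime type is inferred
# (assumed "application/json" here, falling back to "application/octet-stream")
stream = S3Stream.from_file("./some_path/some_file.json")
print(stream.content_type)
```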
### S3Resource
`S3Resource` represents either a resource that already exists in S3 or a local
resource that will be uploaded to S3. The `S3Resource` constructor automatically
attempts to convert any input into an `S3Stream`; for more granular control,
`S3Stream.from_any` should be used instead to create the `S3Stream`.

`S3Resource` is a readable stream - i.e. it has `read`, `seek`, and `close`.

> NOTE:
>
> See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html
> for additional keyword arguments that can be passed to `S3Resource`.

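Since it is a readable stream, an `S3Resource` can be treated like a file object. A minimal sketch, reusing the constructor arguments from the examples below:

```py
from e2fyi.utils.aws import S3Resource

obj = S3Resource(
    filename="some_file.json",
    prefix="prefix/",
    bucketname="some_bucket",
)
data = obj.read()  # read the resource like a normal file object
obj.seek(0)        # rewind the stream
obj.close()        # close the underlying stream
```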
#### Example: Creating S3Resource from a local python object or file

```py
import boto3

from e2fyi.utils.aws import S3Resource, S3Stream

# create a custom s3 client
s3client = boto3.client(
    "s3",
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)

# creates a local copy of the s3 resource with a S3Stream from a local file
obj = S3Resource(
    # full path should be "prefix/some_file.json"
    filename="some_file.json",
    prefix="prefix/",
    # bucket to download from or upload to
    bucketname="some_bucket",
    # or "s3n://" or "s3://"
    protocol="s3a://",
    # uses the default client if not provided
    s3client=s3client,
    # attempts to convert to S3Stream if the input is not a S3Stream
    stream=S3Stream.from_file("./some_path/some_file.json"),
    # additional kwargs to pass to the `s3.upload_fileobj` or `s3.download_fileobj` methods
    Metadata={"label": "foo"},
)
print(obj.key)  # prints "prefix/some_file.json"
print(obj.uri)  # prints "s3a://some_bucket/prefix/some_file.json"

# will attempt to fix the prefix and filename if an incorrect filename is provided
obj = S3Resource(
    filename="subfolder/some_file.json",
    prefix="prefix/"
)
print(obj.filename)  # prints "some_file.json"
print(obj.prefix)    # prints "prefix/subfolder/"
```
#### Example: Upload S3Resource to S3

```py
# creates a local copy of the s3 resource with some python object
obj = S3Resource(
    filename="some_file.txt",
    prefix="prefix/",
    bucketname="some_bucket",
    stream={"some": "dict"},
)

# upload obj to the s3 bucket "some_bucket" with the key "prefix/some_file.txt"
# and the json string as its content.
obj.save()

# upload to the s3 bucket "another_bucket" instead, with a metadata tag.
obj.save("another_bucket", Metadata={"label": "foo"})
```
#### Example: Read S3Resource from S3

```py
from pydantic import BaseModel

# do not provide a stream input to the S3Resource constructor
obj = S3Resource(
    filename="some_file.json",
    prefix="prefix/",
    bucketname="some_bucket",
    content_type="application/json"
)

# read the resource like a normal file object from S3
data = obj.read()
print(type(data))  # prints the type of the raw content

# read and load the json string into a dict or list
# (for content_type == "application/json" only)
data_obj = obj.load()
print(type(data_obj))  # prints <class 'dict'> or <class 'list'>

# read and convert into a pydantic model
class Person(BaseModel):
    name: str
    age: int

# automatically unpack the dict
data_obj = obj.load(lambda name, age: Person(name=name, age=age))
# alternatively, do not unpack
data_obj = obj.load(lambda data: Person(**data), unpack=False)
print(type(data_obj))  # prints <class 'Person'>
```
### S3Bucket

`S3Bucket` is an abstraction of an actual S3 bucket, with methods to interact
with it (e.g. list the objects inside the bucket) and some utility methods.

Prefix rules can also be set during the creation of the `S3Bucket` object -
i.e. enforce a particular prefix rule for a particular bucket.

#### Quickstart

```py
from e2fyi.utils.aws import S3Bucket

# prints the key for all resources with the prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/"):
    print(resource.key)

# prints the key for the first 2,000 resources with the prefix "some_folder/"
for resource in S3Bucket("some_bucket").list("some_folder/", max_objects=2000):
    print(resource.key)

# creates a s3 bucket with a prefix rule
prj_bucket = S3Bucket("some_bucket", get_prefix=lambda prefix: "prj-a/%s" % prefix)
for resource in prj_bucket.list("some_folder/"):
    print(resource.key)    # prints keys prefixed with "prj-a/some_folder/"
    print(resource.stats)  # prints metadata for the resource (e.g. size, etag)

# get the obj key in the bucket
print(prj_bucket.create_resource_key("foo.json"))  # prints "prj-a/foo.json"

# get the obj uri in the bucket
# prints "s3a://some_bucket/prj-a/foo.json"
print(prj_bucket.create_resource_uri("foo.json", "s3a://"))

# create a S3Resource in the bucket to read in
foo = prj_bucket.create_resource("foo.json", "application/json")
# read "s3a://some_bucket/prj-a/foo.json" and load it as a dict (or list)
foo_dict = foo.load()

# create a S3Resource in the bucket and save it to "s3a://some_bucket/prj-a/foo.json"
prj_bucket.create_resource("foo.json", obj={"foo": "bar"}).save()
```
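Because `S3Stream.from_any` also handles pandas dataframes (see the `S3Stream` section above), a bucket resource can be created from a dataframe too. A sketch continuing from the quickstart above, assuming `create_resource` accepts a content type and an `obj` together, mirroring the two calls shown:

```py
import pandas as pd

df = pd.DataFrame([{"key": "a", "value": 1}, {"key": "b", "value": 2}])

# create a S3Resource holding the dataframe as csv and upload it to
# "s3a://some_bucket/prj-a/data.csv" (hypothetical combined call)
prj_bucket.create_resource("data.csv", "application/csv", obj=df).save()
```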
### e2fyi.utils.core.Maybe

`Maybe` represents an uncertain result - an exception might have been raised, in
which case no value is returned. It is generally used inside a function where
all exceptions are caught.

> NOTE:
> - `Maybe.value` is the actual returned value.
> - `Maybe.exception` is the exception caught (if any).
> - `Maybe.with_default` can be used to provide a default value if no value is
>   returned.
> - `Maybe.is_ok` can be used to check if any value is returned.

```py
import logging

from e2fyi.utils.core import Maybe

def load_from_file(filepath: str) -> Maybe[str]:
    """loads the content of a file."""
    try:
        with open(filepath, "r") as fp:
            return Maybe(fp.read())
    except IOError as err:
        return Maybe(exception=err)

data = load_from_file("some_file.json")

# print with a default value fallback
print(data.with_default("default value"))

# print data if ok, else log the exception
if data.is_ok:
    print(data)
else:
    logging.exception(data.exception)
```
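The caught value and exception can also be inspected directly through the attributes listed in the note above, e.g. continuing from the example:

```py
result = load_from_file("missing_file.json")
if result.is_ok:
    print(result.value)      # the file content wrapped in the Maybe
else:
    print(result.exception)  # the IOError caught inside load_from_file
```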
## License
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fe2fyi%2Fpy-utils.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fe2fyi%2Fpy-utils?ref=badge_large)