https://github.com/drgfreeman/dynamo-pandas

Make working with pandas data and AWS DynamoDB easy

Host: GitHub
URL: https://github.com/drgfreeman/dynamo-pandas
Owner: DrGFreeman
License: mit
Created: 2021-03-07T03:44:03.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2025-01-26T20:27:10.000Z (5 months ago)
Last Synced: 2025-03-31T06:09:17.750Z (3 months ago)
Topics: aws, aws-dynamodb, boto3, database, dataframe, deserialization, dynamo-pandas, dynamodb, interface, pandas, serialization
Language: Python
Homepage: https://dynamo-pandas.readthedocs.io/en/stable/
Size: 168 KB
Stars: 21
Watchers: 3
Forks: 6
Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE


[![unit-tests-linux](https://github.com/drgfreeman/dynamo-pandas/actions/workflows/checks.yml/badge.svg)](https://github.com/DrGFreeman/dynamo-pandas/actions/workflows/checks.yml)

[![Documentation Status](https://readthedocs.org/projects/dynamo-pandas/badge/?version=latest)](https://dynamo-pandas.readthedocs.io/en/latest/?badge=latest)

# dynamo-pandas

Make working with pandas data and AWS DynamoDB easy.

## Motivation

This package aims at making the transfer of data between pandas dataframes and DynamoDB as simple as possible. To meet this goal, the package offers two key features:

1. Automatic conversion of pandas data types to DynamoDB supported data types.

1. A simple, high level interface to *put* data from a dataframe into a DynamoDB table and *get* all or selected items from a table into a dataframe.

## Documentation

The project's documentation is available at https://dynamo-pandas.readthedocs.io/.

## Requirements

* `python>=3.9`

* `pandas>=1.2`

* `boto3`

## Installation

```
python -m pip install dynamo-pandas
```

This will install the package and its dependencies except for `boto3` which is not installed by default to avoid unnecessary installation when building Lambda layers.

To include `boto3` as part of the installation, add the `boto3` "extra" this way:

```
python -m pip install dynamo-pandas[boto3]
```

## Example Usage

Consider the pandas DataFrame below.

```python
>>> print(players_df)
      player_id           last_play       play_time  rating  bonus_points
0    player_one 2021-01-18 22:47:23 2 days 17:41:55     4.3             3
1    player_two 2021-01-19 19:07:54 0 days 22:07:34     3.8             1
2  player_three 2021-01-21 10:22:43 1 days 14:01:19     2.5             4
3   player_four 2021-01-22 13:51:12 0 days 03:45:49     4.8          <NA>
```

The columns of the dataframe use different data types, some of which are not natively supported by DynamoDB, like numpy.datetime64, timedelta64 and pandas' nullable integers.

```python
>>> players_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   player_id     4 non-null      object
 1   last_play     4 non-null      datetime64[ns]
 2   play_time     4 non-null      timedelta64[ns]
 3   rating        4 non-null      float64
 4   bonus_points  3 non-null      Int8
dtypes: Int8(1), datetime64[ns](1), float64(1), object(1), timedelta64[ns](1)
memory usage: 264.0+ bytes
```

Storing the rows of this dataframe to DynamoDB requires multiple data type conversions.
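To make these conversions concrete, here is a minimal sketch that uses only the standard library. It is not the package's implementation: the helper name `to_dynamodb_value` and the chosen string formats are assumptions made for illustration. It relies on the fact that boto3 refuses Python `float` values and expects `decimal.Decimal` for numbers.

```python
from datetime import datetime, timedelta
from decimal import Decimal


def to_dynamodb_value(value):
    """Convert one Python value to a type boto3 can serialize (hypothetical helper)."""
    if value is None:
        return None  # missing values map to DynamoDB's NULL type
    if isinstance(value, datetime):
        return value.isoformat(sep=" ")  # store timestamps as strings
    if isinstance(value, timedelta):
        return str(value)  # store durations as strings
    if isinstance(value, float):
        return Decimal(str(value))  # boto3 rejects float, so go through Decimal
    return value


# One row of the example data, expressed with plain Python types.
row = {
    "player_id": "player_one",
    "last_play": datetime(2021, 1, 18, 22, 47, 23),
    "play_time": timedelta(days=2, hours=17, minutes=41, seconds=55),
    "rating": 4.3,
    "bonus_points": 3,
}
item = {name: to_dynamodb_value(value) for name, value in row.items()}
```

Going through `str` before `Decimal` keeps `4.3` as `Decimal("4.3")` rather than the long binary expansion that `Decimal(4.3)` would produce.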

```python
>>> from dynamo_pandas import put_df, get_df, keys
```

The `put_df` function adds or updates the rows of a dataframe in the specified table, taking care of the required type conversions. The table must already exist and the primary key column(s) must be present in the dataframe.

```python
>>> put_df(players_df, table="players")
```
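Because the primary key column(s) must be present, it can help to check for them before writing. The sketch below is one possible pre-flight check, not part of the package: `missing_key_columns` is a hypothetical helper, and the key name `player_id` is taken from the example table.

```python
def missing_key_columns(columns, key_names):
    """Return the key attribute names that are absent from the given columns."""
    present = set(columns)
    return [name for name in key_names if name not in present]


# Column names of the example dataframe (for a real dataframe, use list(df.columns)).
columns = ["player_id", "last_play", "play_time", "rating", "bonus_points"]

missing = missing_key_columns(columns, key_names=["player_id"])
if missing:
    raise KeyError(f"Missing primary key column(s): {missing}")
```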

The `get_df` function retrieves the items matching the specified key(s) from the table into a dataframe.

```python
>>> df = get_df(table="players", keys=[{"player_id": "player_three"}, {"player_id": "player_one"}])
>>> print(df)
   bonus_points     player_id            last_play  rating        play_time
0             4  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19
1             3    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55
```

In the case where only a partition key is used, the `keys` function simplifies the generation of the keys list.

```python
>>> df = get_df(table="players", keys=keys(player_id=["player_two", "player_four"]))
>>> print(df)
   bonus_points    player_id            last_play  rating        play_time
0           1.0   player_two  2021-01-19 19:07:54     3.8  0 days 22:07:34
1           NaN  player_four  2021-01-22 13:51:12     4.8  0 days 03:45:49
```
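For a table with only a partition key, the keys list is a list of single-entry dictionaries, one per key value. The sketch below shows how such a list can be built by hand. `make_keys` is a hypothetical stand-in written for illustration, not the package's `keys` function, whose exact behaviour is described in the project documentation.

```python
def make_keys(**key_values):
    """Build a list of single-attribute key dicts from one keyword argument."""
    if len(key_values) != 1:
        raise ValueError("Exactly one partition key name is expected.")
    name, values = next(iter(key_values.items()))
    return [{name: value} for value in values]


key_list = make_keys(player_id=["player_two", "player_four"])
# key_list == [{"player_id": "player_two"}, {"player_id": "player_four"}]
```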

The data types returned by the `get_df` function are basic types and no automatic type conversion is attempted.

```python
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      float64
 1   player_id     2 non-null      object
 2   last_play     2 non-null      object
 3   rating        2 non-null      float64
 4   play_time     2 non-null      object
dtypes: float64(2), object(3)
memory usage: 208.0+ bytes
```

The `dtype` parameter of the `get_df` function allows specifying the desired data types.

```python
>>> df = get_df(
...     table="players",
...     keys=keys(player_id=["player_two", "player_four"]),
...     dtype={
...         "bonus_points": "Int8",
...         "last_play": "datetime64[ns, UTC]",
...         "play_time": "timedelta64[ns]"  # See note below.
...     }
... )
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      Int8
 1   player_id     2 non-null      object
 2   last_play     2 non-null      datetime64[ns, UTC]
 3   rating        2 non-null      float64
 4   play_time     2 non-null      timedelta64[ns]
dtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(1), timedelta64[ns](1)
memory usage: 196.0+ bytes
```

**Note**: Due to a known bug in pandas versions < 1.5, timedelta strings cannot be converted back to the `Timedelta` type via this parameter (ref. https://github.com/pandas-dev/pandas/issues/38509). If using pandas < 1.5, use the `pandas.to_timedelta` function instead:

```python
>>> import pandas as pd
>>> df.play_time = pd.to_timedelta(df.play_time)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   bonus_points  1 non-null      Int8
 1   player_id     2 non-null      object
 2   last_play     2 non-null      datetime64[ns, UTC]
 3   rating        2 non-null      float64
 4   play_time     2 non-null      timedelta64[ns]
dtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(1), timedelta64[ns](1)
memory usage: 196.0+ bytes
```

Omitting the `keys` parameter performs a scan of the table and returns all the items.

```python
>>> df = get_df(table="players")
>>> print(df)
   bonus_points     player_id            last_play  rating        play_time
0           4.0  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19
1           NaN   player_four  2021-01-22 13:51:12     4.8  0 days 03:45:49
2           3.0    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55
3           1.0    player_two  2021-01-19 19:07:54     3.8  0 days 22:07:34
```
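A DynamoDB scan returns results in pages: each response may carry a `LastEvaluatedKey` that has to be passed back as `ExclusiveStartKey` to fetch the next page. The sketch below illustrates that loop. How `get_df` handles pagination internally is not described here, so treat this as an assumption about the general pattern rather than the package's code. `scan_all` and `FakeTable` are hypothetical names, and `FakeTable` is an in-memory stand-in so the example runs without AWS access.

```python
def scan_all(table):
    """Collect every item from a table by following scan pagination."""
    items = []
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """In-memory stand-in for a boto3 Table resource, returning two items per page."""

    def __init__(self, items, page_size=2):
        self._items = items
        self._page_size = page_size

    def scan(self, ExclusiveStartKey=None):
        # A real LastEvaluatedKey holds primary key attributes; an index is used here.
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"]
        end = start + self._page_size
        page = {"Items": self._items[start:end]}
        if end < len(self._items):
            page["LastEvaluatedKey"] = {"index": end}
        return page


players = [
    {"player_id": name}
    for name in ("player_one", "player_two", "player_three", "player_four")
]
all_items = scan_all(FakeTable(players))
```

With a real table, `scan_all` could be given a boto3 `Table` resource instead of `FakeTable`, since both expose a `scan` method that accepts `ExclusiveStartKey`.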

## License

Released under the terms of the [MIT License](LICENSE).
