{"id":15659524,"url":"https://github.com/drgfreeman/dynamo-pandas","last_synced_at":"2025-04-07T08:27:18.949Z","repository":{"id":39884697,"uuid":"345251999","full_name":"DrGFreeman/dynamo-pandas","owner":"DrGFreeman","description":"Make working with pandas data and AWS DynamoDB easy","archived":false,"fork":false,"pushed_at":"2025-01-26T20:27:10.000Z","size":172,"stargazers_count":21,"open_issues_count":4,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-31T06:09:17.750Z","etag":null,"topics":["aws","aws-dynamodb","boto3","database","dataframe","deserialization","dynamo-pandas","dynamodb","interface","pandas","serialization"],"latest_commit_sha":null,"homepage":"https://dynamo-pandas.readthedocs.io/en/stable/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DrGFreeman.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-07T03:44:03.000Z","updated_at":"2025-01-26T20:25:38.000Z","dependencies_parsed_at":"2024-04-01T03:38:14.822Z","dependency_job_id":"e9c4e899-191e-49c3-8c69-780d5235e9ef","html_url":"https://github.com/DrGFreeman/dynamo-pandas","commit_stats":{"total_commits":89,"total_committers":2,"mean_commits":44.5,"dds":"0.011235955056179803","last_synced_commit":"eed07827cb1b348352315ce2d807f8ed5368e54f"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrGFreeman%2Fdynamo-pandas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrG
Freeman%2Fdynamo-pandas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrGFreeman%2Fdynamo-pandas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrGFreeman%2Fdynamo-pandas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DrGFreeman","download_url":"https://codeload.github.com/DrGFreeman/dynamo-pandas/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247617807,"owners_count":20967674,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-dynamodb","boto3","database","dataframe","deserialization","dynamo-pandas","dynamodb","interface","pandas","serialization"],"created_at":"2024-10-03T13:17:16.216Z","updated_at":"2025-04-07T08:27:18.923Z","avatar_url":"https://github.com/DrGFreeman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![unit-tests-linux](https://github.com/drgfreeman/dynamo-pandas/actions/workflows/checks.yml/badge.svg)](https://github.com/DrGFreeman/dynamo-pandas/actions/workflows/checks.yml)\n[![Documentation Status](https://readthedocs.org/projects/dynamo-pandas/badge/?version=latest)](https://dynamo-pandas.readthedocs.io/en/latest/?badge=latest)\n\n# dynamo-pandas\nMake working with pandas data and AWS DynamoDB easy.\n\n## Motivation\nThis package aims at making the transfer of data between pandas dataframes and DynamoDB as simple as possible. To meet this goal, the package offers two key features:\n1. 
Automatic conversion of pandas data types to DynamoDB-supported data types.\n1. A simple, high-level interface to *put* data from a dataframe into a DynamoDB table and *get* all or selected items from a table into a dataframe.\n\n\n## Documentation\n\nThe project's documentation is available at https://dynamo-pandas.readthedocs.io/.\n\n\n## Requirements\n* `python\u003e=3.9`\n* `pandas\u003e=1.2`\n* `boto3`\n\n## Installation\n\n```\npython -m pip install dynamo-pandas\n```\n\nThis will install the package and its dependencies except for `boto3`, which is not installed by default to avoid unnecessary installation when building Lambda layers.\n\nTo include `boto3` as part of the installation, add the `boto3` \"extra\" this way:\n\n```\npython -m pip install dynamo-pandas[boto3]\n```\n\n## Example Usage\n\nConsider the pandas DataFrame below.\n\n\n```python\n\u003e\u003e\u003e print(players_df)\n\n      player_id           last_play       play_time  rating  bonus_points\n0    player_one 2021-01-18 22:47:23 2 days 17:41:55     4.3             3\n1    player_two 2021-01-19 19:07:54 0 days 22:07:34     3.8             1\n2  player_three 2021-01-21 10:22:43 1 days 14:01:19     2.5             4\n3   player_four 2021-01-22 13:51:12 0 days 03:45:49     4.8          \u003cNA\u003e\n```\n\nThe columns of the dataframe use different data types, some of which are not natively supported by DynamoDB, like numpy.datetime64, timedelta64 and pandas' nullable integers.\n\n\n```python\n\u003e\u003e\u003e players_df.info()\n\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 4 entries, 0 to 3\nData columns (total 5 columns):\n    #   Column        Non-Null Count  Dtype          \n   ---  ------        --------------  -----          \n    0   player_id     4 non-null      object         \n    1   last_play     4 non-null      datetime64[ns] \n    2   play_time     4 non-null      timedelta64[ns]\n    3   rating        4 non-null      float64        \n    4   bonus_points  3 
non-null      Int8           \ndtypes: Int8(1), datetime64[ns](1), float64(1), object(1), timedelta64[ns](1)\nmemory usage: 264.0+ bytes\n```\n\nStoring the rows of this dataframe to DynamoDB requires multiple data type conversions.\n\n```python\n\u003e\u003e\u003e from dynamo_pandas import put_df, get_df, keys\n```\n\nThe `put_df` function adds or updates the rows of a dataframe into the specified table, taking care of the required type conversions (the table must already exist and the primary key column(s) must be present in the dataframe).\n\n```python\n\u003e\u003e\u003e put_df(players_df, table=\"players\")\n```\n\nThe `get_df` function retrieves the items matching the specified key(s) from the table into a dataframe.\n\n\n```python\n\u003e\u003e\u003e df = get_df(table=\"players\", keys=[{\"player_id\": \"player_three\"}, {\"player_id\": \"player_one\"}])\n\u003e\u003e\u003e print(df)\n\n   bonus_points     player_id            last_play  rating        play_time\n0             4  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19\n1             3    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55\n```\n\nIn the case where only a partition key is used, the `keys` function simplifies the generation of the keys list.\n\n\n```python\n\u003e\u003e\u003e df = get_df(table=\"players\", keys=keys(player_id=[\"player_two\", \"player_four\"]))\n\u003e\u003e\u003e print(df)\n\n   bonus_points    player_id            last_play  rating        play_time\n0           1.0   player_two  2021-01-19 19:07:54     3.8  0 days 22:07:34\n1           NaN  player_four  2021-01-22 13:51:12     4.8  0 days 03:45:49\n```\n\nThe data types returned by the `get_df` function are basic types and no automatic type conversion is attempted.\n\n\n```python\n\u003e\u003e\u003e df.info()\n\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 2 entries, 0 to 1\nData columns (total 5 columns):\n    #   Column        Non-Null Count  Dtype  \n   ---  ------        
--------------  -----  \n    0   bonus_points  1 non-null      float64\n    1   player_id     2 non-null      object \n    2   last_play     2 non-null      object \n    3   rating        2 non-null      float64\n    4   play_time     2 non-null      object \ndtypes: float64(2), object(3)\nmemory usage: 208.0+ bytes\n```\n\nThe `dtype` parameter of the `get_df` function allows specifying the desired data types.\n\n```python\n\u003e\u003e\u003e df = get_df(\n...     table=\"players\",\n...     keys=keys(player_id=[\"player_two\", \"player_four\"]),\n...     dtype={\n...         \"bonus_points\": \"Int8\",\n...         \"last_play\": \"datetime64[ns, UTC]\",\n...         \"play_time\": \"timedelta64[ns]\"  # See note below.\n...     }\n... )\n\u003e\u003e\u003e df.info()\n\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 2 entries, 0 to 1\nData columns (total 5 columns):\n    #   Column        Non-Null Count  Dtype              \n   ---  ------        --------------  -----              \n    0   bonus_points  1 non-null      Int8               \n    1   player_id     2 non-null      object             \n    2   last_play     2 non-null      datetime64[ns, UTC]\n    3   rating        2 non-null      float64            \n    4   play_time     2 non-null      timedelta64[ns]    \ndtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(1), timedelta64[ns](1)\nmemory usage: 196.0+ bytes\n```\n\n**Note**: Due to a known bug in pandas versions \u003c 1.5, timedelta strings cannot be converted back to Timedelta type via this parameter (ref. https://github.com/pandas-dev/pandas/issues/38509). 
If using pandas \u003c 1.5, use the pandas.to_timedelta function instead:\n\n\n```python\n\u003e\u003e\u003e df.play_time = pd.to_timedelta(df.play_time)\n\u003e\u003e\u003e df.info()\n\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 2 entries, 0 to 1\nData columns (total 5 columns):\n    #   Column        Non-Null Count  Dtype              \n   ---  ------        --------------  -----              \n    0   bonus_points  1 non-null      Int8               \n    1   player_id     2 non-null      object             \n    2   last_play     2 non-null      datetime64[ns, UTC]\n    3   rating        2 non-null      float64            \n    4   play_time     2 non-null      timedelta64[ns]    \ndtypes: Int8(1), datetime64[ns, UTC](1), float64(1), object(1), timedelta64[ns](1)\nmemory usage: 196.0+ bytes\n```\n\nOmitting the `keys` parameter performs a scan of the table and returns all the items.\n\n\n```python\n\u003e\u003e\u003e df = get_df(table=\"players\")\n\u003e\u003e\u003e print(df)\n\n       bonus_points     player_id            last_play  rating        play_time\n    0           4.0  player_three  2021-01-21 10:22:43     2.5  1 days 14:01:19\n    1           NaN   player_four  2021-01-22 13:51:12     4.8  0 days 03:45:49\n    2           3.0    player_one  2021-01-18 22:47:23     4.3  2 days 17:41:55\n    3           1.0    player_two  2021-01-19 19:07:54     3.8  0 days 22:07:34\n```\n\n## License\n\nReleased under the terms of the [MIT License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrgfreeman%2Fdynamo-pandas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrgfreeman%2Fdynamo-pandas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrgfreeman%2Fdynamo-pandas/lists"}