Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bufferapp/rsdf
Some functions to help Pandas DataFrames communicate with Redshift
- Host: GitHub
- URL: https://github.com/bufferapp/rsdf
- Owner: bufferapp
- License: mit
- Created: 2017-02-01T14:47:17.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-02-27T12:41:36.000Z (over 5 years ago)
- Last Synced: 2024-10-08T17:53:36.988Z (about 1 month ago)
- Topics: aws, data-frame, pandas, redshift
- Language: Python
- Homepage:
- Size: 26.4 KB
- Stars: 9
- Watchers: 10
- Forks: 4
- Open Issues: 2
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# RSDF
[![Build Status](https://travis-ci.com/bufferapp/rsdf.svg?branch=master)](https://travis-ci.com/bufferapp/rsdf)
[![PyPI version](https://badge.fury.io/py/rsdf.svg)](https://badge.fury.io/py/rsdf)
[![License](https://img.shields.io/github/license/mashape/apistatus.svg)](LICENSE)

Set of utils to connect Pandas DataFrames and Redshift. This module will add a
new function to the `DataFrame` object. Inspired by [josepablog's gist](https://gist.github.com/josepablog/1ce154a45dc20348b6718804ac8ad0a5).

## Installation

To install `rsdf`, simply use pip:
```bash
$ pip install rsdf
```

If you were using the older version, you can also install it with `pip`:

```bash
$ pip install git+git://github.com/bufferapp/rsdf.git@d1a5feca220cef9ba7da16da57a746dfb24ee8d7
```

## Usage

Once `rsdf` is imported, `DataFrame` objects will have new functions:
```python
import pandas as pd
import rsdf

engine_string = 'redshift://user:password@endpoint:port/db'
users = pd.read_sql_query('select * from users limit 10', engine_string)
users['money'] = users['money'] * 42
# Write it back to Redshift
users.to_redshift(
    table_name='users',
    schema='public',
    engine=engine_string,
    s3_bucket='users-data',
    s3_key='rich_users.gzip',
    if_exists='update',
    primary_key='id'
)
```

Alternatively, if no `engine` is provided, the `rsdf` module will try to figure
out the engine string from the following environment variables:

- `REDSHIFT_USER`
- `REDSHIFT_PASSWORD`
- `REDSHIFT_ENDPOINT`
- `REDSHIFT_DB_NAME`
- `REDSHIFT_DB_PORT`

Since `rsdf` uploads the files to S3 and then runs a `COPY` command to load the
data into Redshift, you'll also need to provide (or have loaded in the
environment) these two AWS variables:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
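
As a worked example of this environment-based configuration, the call from the
Usage section could omit `engine` entirely. The snippet below is a minimal
sketch assuming the variable names listed above; the values, the small
DataFrame, and the port number are placeholders, not values from this project.

```python
import os

import pandas as pd
import rsdf

# Placeholder values: in practice these would already be exported in the shell
# or set by your deployment environment.
os.environ.update({
    "REDSHIFT_USER": "user",
    "REDSHIFT_PASSWORD": "password",
    "REDSHIFT_ENDPOINT": "endpoint",
    "REDSHIFT_DB_NAME": "db",
    "REDSHIFT_DB_PORT": "5439",
    "AWS_ACCESS_KEY_ID": "my-access-key-id",
    "AWS_SECRET_ACCESS_KEY": "my-secret-access-key",
})

# A tiny DataFrame standing in for real data.
users = pd.DataFrame({"id": [1, 2], "money": [42, 84]})

# No `engine` argument: per the README, rsdf should build the engine string
# from the REDSHIFT_* variables above, and the COPY step should pick up the
# AWS credentials.
users.to_redshift(
    table_name='users',
    schema='public',
    s3_bucket='users-data',
    s3_key='rich_users.gzip',
    if_exists='update',
    primary_key='id'
)
```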
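
For context, the staging pattern described above (write the frame to S3, then
have Redshift pull it in with `COPY`) generally looks something like the sketch
below. This illustrates the technique only and is not `rsdf`'s internal code:
the helper name, bucket, key, and table are made up, and the `redshift://`
engine string assumes a Redshift SQLAlchemy dialect is installed.

```python
import gzip
import os

import boto3
import sqlalchemy


def copy_dataframe_to_redshift(df, engine_string, bucket, key, table):
    """Illustrative sketch of the generic S3-plus-COPY load pattern."""
    # 1. Serialize the DataFrame to a gzipped CSV and stage it on S3.
    payload = gzip.compress(df.to_csv(index=False).encode("utf-8"))
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=payload)

    # 2. Ask Redshift to ingest the staged file with a COPY statement,
    #    using the AWS credentials from the environment.
    copy_sql = f"""
        COPY {table}
        FROM 's3://{bucket}/{key}'
        CREDENTIALS 'aws_access_key_id={os.environ['AWS_ACCESS_KEY_ID']};aws_secret_access_key={os.environ['AWS_SECRET_ACCESS_KEY']}'
        CSV GZIP IGNOREHEADER 1
    """
    engine = sqlalchemy.create_engine(engine_string)
    with engine.begin() as connection:
        connection.execute(sqlalchemy.text(copy_sql))
```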

## License

MIT © Buffer