https://github.com/yinleon/s3
s3 helpers for reading files to/from pandas dataframes, moving files between buckets, and persisting scikit-learn classifiers.. all in s3.
- Host: GitHub
- URL: https://github.com/yinleon/s3
- Owner: yinleon
- License: mit
- Created: 2017-03-03T03:53:34.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-24T04:22:14.000Z (about 6 years ago)
- Last Synced: 2024-04-26T19:20:15.685Z (6 months ago)
- Topics: pandas-dataframe, s3, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 43.9 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# S3 helper
This module is helpful in both development notebooks and deployed production pipelines that work with unstructured s3 files. Its main use is to programmatically preview, process, and edit files on s3 by:

- listing the contents of s3 buckets using glob-like RegEx patterns.
- moving or copying files between buckets (filedrop -> archives).
- streaming csv and json files into Pandas dataframes on your local machine, without manually downloading them to disk.
- writing Pandas dataframes to csv and json files on s3.
- loading and unloading scikit-learn models from s3.

Pandas and Scikit-Learn are useful tools in the Python data ecosystem.
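
To illustrate the glob-like listing idea, here is a minimal sketch that filters a list of key names with a glob pattern. It uses only the standard library and is not this module's implementation; the `filter_keys` helper and the key names are hypothetical stand-ins for the result of a bucket listing.

```python
import fnmatch
import re


def filter_keys(keys, pattern):
    """Return the keys that match a glob-like pattern such as 'filedrop/*.csv'."""
    # fnmatch.translate converts the glob into an equivalent regular expression.
    regex = re.compile(fnmatch.translate(pattern))
    return [key for key in keys if regex.match(key)]


# Hypothetical key names, standing in for the contents of a bucket.
keys = [
    "filedrop/2017-03-01.csv",
    "filedrop/2017-03-02.csv",
    "filedrop/notes.txt",
    "archives/2017-02-28.csv",
]

matches = filter_keys(keys, "filedrop/*.csv")
```

Note that in this sketch `*` also matches `/`, so a pattern like `*.txt` reaches into every prefix rather than a single "directory" level.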
Check out the tutorial to see the module in action.

## Installation
Configure s3 as you would for boto3.
read here
TL;DR: environment variables or configuring the AWS CLI work best.

## Usage
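
Before running anything, it can help to confirm that credentials are visible to the Python process. The following is a minimal sketch, assuming the environment-variable route; `missing_aws_env_vars` is a hypothetical helper, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard variable names boto3 reads.

```python
import os

# Standard environment variable names read by boto3's credential chain.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the required AWS variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in the environment: " + ", ".join(missing))
```

An empty result is not required for boto3 to work: credentials configured through the AWS CLI live in a shared credentials file instead, so this check only covers the environment-variable case.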
Install requirements:

```
pip install s34me
```

Note that this only works with Pandas 0.19.1 and below.
See: https://github.com/boto/botocore/pull/1195
See: https://github.com/pandas-dev/pandas/issues/17135

When either of these is resolved, this will work with the latest distribution of Pandas.
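
Given the version constraint above, a pipeline might want to fail early on an unsupported Pandas release. This is a rough sketch of such a guard, not something the module provides; `parse_version`, `is_supported`, and `MAX_SUPPORTED` are hypothetical names, and the check works on plain version strings so it runs without Pandas installed.

```python
import re

# Highest Pandas release the note above reports as working.
MAX_SUPPORTED = (0, 19, 1)


def parse_version(version):
    """Turn a string such as '0.19.1' into a tuple of ints, e.g. (0, 19, 1).

    Suffixes like 'rc1' are ignored and a missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


def is_supported(version, max_supported=MAX_SUPPORTED):
    """Return True when the version is at or below the supported maximum."""
    return parse_version(version) <= max_supported
```

In practice the string would come from `pandas.__version__`, for example `is_supported(pandas.__version__)`.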
```
import s3

df = s3.read_csv('s3://bucket_name/key_name/file_name.tsv.gz',
                 sep='\t', compression='gzip')
```

For continued use, add the module's location (shown as `PATH` below) to the IPython startup script:
```
cd ~/.ipython/profile_default/startup
vim first.py
# contents of first.py:
import sys
sys.path.append("PATH")
```

## Contributing
1. Fork it!
2. Create your feature branch: `git checkout -b my-new-feature`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push to the branch: `git push origin my-new-feature`
5. Submit a pull request :D

## Credits
Written by Leon Yin

## License
MIT