{"id":16557363,"url":"https://github.com/yinleon/s3","last_synced_at":"2025-10-28T20:31:50.563Z","repository":{"id":57463393,"uuid":"83755027","full_name":"yinleon/s3","owner":"yinleon","description":"s3 helpers for reading files to/from pandas dataframes, moving files between buckets, and persisting scikit-learn classifiers.. all in s3.","archived":false,"fork":false,"pushed_at":"2018-10-24T04:22:14.000Z","size":45,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-04-26T19:20:15.685Z","etag":null,"topics":["pandas-dataframe","s3","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yinleon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-03T03:53:34.000Z","updated_at":"2022-11-22T15:02:20.000Z","dependencies_parsed_at":"2022-09-13T11:01:05.591Z","dependency_job_id":null,"html_url":"https://github.com/yinleon/s3","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yinleon%2Fs3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yinleon%2Fs3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yinleon%2Fs3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yinleon%2Fs3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yinleon","download_url":"https://codeload.github.com/yinleon/s3/tar.gz/refs/heads/master","host":{"name":"GitHub","
url":"https://github.com","kind":"github","repositories_count":238719766,"owners_count":19519268,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas-dataframe","s3","scikit-learn"],"created_at":"2024-10-11T20:07:12.915Z","updated_at":"2025-10-28T20:31:50.187Z","avatar_url":"https://github.com/yinleon.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# S3 helper\nThis is a module that is helpful in both development notebooks and deployed production pipelines that work with unstructured s3 files.\n\nThe main use of this module is to programmatically preview, process, and edit files around s3 by:\n\nlisting contents of s3 buckets using glob-like RegEx patterns.\u003cbr\u003e\nmoving or copying files between buckets (filedrop -\u003e archives).\u003cbr\u003e\nstreaming csv and json files into Pandas dataframes on your local machine, \nwithout manually downloading them to disk.\u003cbr\u003e\nwriting Pandas dataframes to csv and json files on s3.\u003cbr\u003e\nloading and unloading scikit-learn models from s3.\n\nPandas and Scikit-Learn are useful tools in the Python Data ecosystem.\u003cbr\u003e\nCheck out the \u003ca href='http://nbviewer.jupyter.org/github/yinleon/s3/blob/master/tutorial.ipynb'\u003etutorial\u003c/a\u003e and see the module in action.\n\n\n## Installation\nConfigure s3 as you would for boto3.\n\u003ca href=\"http://boto3.readthedocs.io/en/latest/guide/configuration.html\"\u003eread here\u003c/a\u003e\u003cbr\u003e\nTLDR; Environment Variables or configuring AWS CLI work best.\n\n## Usage\nInstall 
requirements\n```pip install s34me```\n\nNote that this only works with Pandas 0.19.1 and below.\u003cbr\u003e\nSee: https://github.com/boto/botocore/pull/1195\u003cbr\u003e\nSee: https://github.com/pandas-dev/pandas/issues/17135\u003cbr\u003e\n\nWhen either of these is resolved, this will work with the latest distribution of Pandas.\n```\nimport s3\n\ndf = s3.read_csv('s3://bucket_name/key_name/file_name.tsv.gz', \n                 sep='\\t', compression='gzip')\n```\n\nFor continued use, add the path to this module (`PATH` below) to the IPython startup script:\n\n```\ncd ~/.ipython/profile_default/startup\nvim first.py\n# contents of first.py:\nimport sys\nsys.path.append(\"PATH\")\n```\n\n\n## Contributing\n1. Fork it!\n2. Create your feature branch: `git checkout -b my-new-feature`\n3. Commit your changes: `git commit -am 'Add some feature'`\n4. Push to the branch: `git push origin my-new-feature`\n5. Submit a pull request :D\n\n## Credits\nWritten by Leon Yin\n\n## License\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyinleon%2Fs3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyinleon%2Fs3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyinleon%2Fs3/lists"}