https://github.com/namankumar/s3scripts
Download files from AWS S3 fast and reliably
- Host: GitHub
- URL: https://github.com/namankumar/s3scripts
- Owner: namankumar
- Created: 2019-06-30T02:04:58.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-07-02T06:44:08.000Z (almost 7 years ago)
- Last Synced: 2025-04-13T16:16:16.074Z (about 1 year ago)
- Language: Python
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Dump an S3 bucket to local drive fast and reliably
This code builds on other people's work that I found through Google searches.
## Motivation
* wanted to parallelize downloading files. Currently available utilities, like s3cmd, download one file at a time
* wanted a super simple interface to download an entire bucket
* wanted finer-grained control over what the code does and how it does it
## How to use this
* download.py downloads an entire s3 bucket to local drive
* inputs: bucket name, local path, number of threads
* the more threads you use, the more files download simultaneously. Three is a good starting point, but you may want to experiment to suit your internet connection
* on error, it just continues downloading the rest of the bucket. Directories throw an error; this is expected and is handled
* after it finishes, you'll need to account for two edge cases: files not downloaded due to a brief connection loss, and files only partially downloaded
* the first case can be handled by just running the script again. It will fetch missing files without overwriting existing ones
* for the second case, run checkmd5.py and delete all the files that do not match. Copy and paste the list into del.txt and run 'cat del.txt | xargs rm'
* run download.py again!
* your entire bucket should now be mirrored on your local drive, in significantly less time than with one-file-at-a-time utilities such as s3cmd!
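
The workflow above (parallel download, rerun to fill gaps, then checksum check) can be sketched roughly as follows. This is not the code in download.py or checkmd5.py. The names `download_missing`, `fetch`, `md5_of` and `mismatched` are hypothetical, and the actual S3 transfer is left to a caller-supplied `fetch(key, target)` function so the sketch runs without any AWS library.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def download_missing(keys, fetch, dest, threads=3):
    """Fetch every key not already on disk; return the keys that failed.

    `fetch(key, target)` is a hypothetical callable that writes one object
    to `target`. Existing files are never overwritten, so a rerun only
    retries what is still missing.
    """
    dest = Path(dest)

    def worker(key):
        target = dest / key
        if key.endswith("/") or target.exists():
            return None  # directory placeholder, or already downloaded
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            fetch(key, target)
            return None
        except Exception:
            return key  # keep going; a later run will retry this key

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return [key for key in pool.map(worker, keys) if key is not None]


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched(expected, dest):
    """Return keys whose local file exists but does not match its MD5.

    `expected` maps key -> hex MD5 (an assumed input; where the checksums
    come from is up to the caller).
    """
    dest = Path(dest)
    return [key for key, md5 in expected.items()
            if (dest / key).is_file() and md5_of(dest / key) != md5]
```

With boto3, `fetch` could be a thin wrapper around `client.download_file(bucket, key, str(target))`. The expected checksums might come from each object's ETag, but an ETag typically equals the MD5 only for objects uploaded in a single part, so treat that as an assumption to verify for your bucket. Files reported by `mismatched` are the ones to delete before running the download again.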