An open API service indexing awesome lists of open source software.

https://github.com/namankumar/s3scripts

Download files from AWS S3 fast and reliably
https://github.com/namankumar/s3scripts

Last synced: about 1 year ago
JSON representation

Download files from AWS S3 fast and reliably

Awesome Lists containing this project

README

          

# Dump an S3 bucket to local drive fast and reliably

This code builds on top of the others' work that I found via random Google searches.

## Motivation
* wanted to parallelize downloading files. Currently availale utility, like s3cmd, download one file at a time
* wanted a super simple interface to download an entire bucket
* finer grain control over what / how the code actually does

## How to use this
* download.py downloads an entire s3 bucket to local drive
* inputs: bucket name, local path, number of threads
* the more the number of threads, the more simultanuous files you can download. Three is a good number of threads but you may want to experiment for your internet connection
* on error, it'll just continue to download rest of the bucket. Directories will throw an error - this is ok and is handled
* after it finishes, you'll need to account for two edge cases: files not downloaded due to brief connection loss, files partially downloaded
* first case can be handled by just running the script agin. It will update missing files without overwriting existing files
* for second case, run checkmd5.py and delete all the files that do not match. Copy paste the list into del.txt and run 'cat del.txt | xargs rm'
* run download.py again!
* your entire bucket should be mirrored on your local drive in significantly less time than pretty much any other utility, including s3cmd!