# S3-key-lister
List all keys in any public Amazon S3 bucket, with an option to check whether each object is public or private. Saves the result as a .csv file.



- Requires [Boto3](https://github.com/boto/boto3): `$ pip install boto3`
- Uses Python 2.7
- No need for access keys, authentication or anything; runs 'anonymously' (see the sketch after this list)
- Saves key name, file size in bytes, date last modified, and 'public'|'private'|'unknown' in a .csv ([example](/examples/ryft.csv))
- Prints progress to the command line while running: number of keys, number of public keys, and run time
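
A minimal sketch of the anonymous listing this relies on, assuming boto3's unsigned-request support; the bucket and output filename below are placeholders, not values taken from the script:
```
import csv

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# An unsigned config lets boto3 query public buckets with no credentials at all.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

bucket = 'ryft-public-sample-data'  # example bucket used later in this README

# Paginate so buckets with more than 1000 keys are listed completely.
paginator = s3.get_paginator('list_objects_v2')

with open('keys.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['key', 'size_bytes', 'last_modified'])
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            writer.writerow([obj['Key'], obj['Size'], obj['LastModified']])
```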

## Usage:
```
python2 s3getkeys.py -t [--key=] [-r] [-v] [--acl] [-o=]
python2 s3getkeys.py -t [--key] [--estimate]
python2 s3getkeys.py -t [-h|--help]

Options:
-t, --bucket    bucket to fetch keys from
--key           key to start from
-r              recursively fetch all keys
-v              verbose, print keys
--acl           check if each key is public; can take a long time in large buckets
-o, --output    name of output file, do not include .csv [default: bucket]
--estimate      estimate how long a run with [-r] [--acl] would take
-h, --help      show this help info
```
### Examples:

#### Fetch all keys
Fetch *all* keys from the public bucket https://s3.amazonaws.com/ryft-public-sample-data, check which keys are public, and save the result to a file called '[ryft.csv](/examples/ryft.csv)':
```
$ python2 s3getkeys.py -t ryft-public-sample-data --acl -r -o ryft
```
You can press Ctrl+C to stop the script at any time, and your .csv file will contain everything saved up to that point.
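
The `--acl` check here is built on `head_object` requests (see the estimate section below); a plausible sketch of that logic, where the helper name `key_status` is illustrative rather than taken from the script:
```
import boto3
from botocore import UNSIGNED
from botocore.client import Config
from botocore.exceptions import ClientError

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

def key_status(bucket, key):
    # If an unsigned HEAD request succeeds, anyone can read the object.
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return 'public'
    except ClientError as err:
        code = err.response['Error']['Code']
        if code in ('403', 'AccessDenied'):
            return 'private'  # readable only with credentials
        return 'unknown'      # some other error, e.g. 404
```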
#### Get estimated query time
For larger buckets the script can take a while to run; get an estimate of how long a run with the `-r` and `--acl` options enabled might take:
```
$ python2 s3getkeys.py -t ryft-public-sample-data --estimate
```
Prints the estimated time in seconds, as well as the number of keys and the average time for 20 `head_object` requests (what we use to determine whether a key is public or private). If you already know how many objects are in the bucket, you can just wait for the average time to print, press Ctrl+C to stop, and multiply it by the number of objects for the same result.

This is a very rough estimate; the actual total time depends on many factors. Use it as a guide only.
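
A sketch of the arithmetic behind the estimate, timing a 20-key sample of anonymous `head_object` calls and extrapolating; the sample size of 20 matches the README, while the object count is a placeholder:
```
import time

import boto3
from botocore import UNSIGNED
from botocore.client import Config
from botocore.exceptions import ClientError

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
bucket = 'ryft-public-sample-data'

# Time anonymous HEAD requests against one page of up to 20 keys.
page = s3.list_objects_v2(Bucket=bucket, MaxKeys=20)
sample = [obj['Key'] for obj in page.get('Contents', [])]

start = time.time()
for key in sample:
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError:
        pass  # a 403/404 still costs a round trip, so it still counts
avg = (time.time() - start) / len(sample)

total_keys = 200000  # substitute the bucket's real object count
print('avg per key: %.3fs, estimated --acl run: %.0fs' % (avg, avg * total_keys))
```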

#### Start from a particular key
You can start from any point in the bucket using the `--key` option:
```
$ python2 s3getkeys.py -t ryft-public-sample-data --key esRedditJson/esRedditJson9 --acl -r -o ryft
```
or
```
$ python2 s3getkeys.py -t ryft-public-sample-data --key esRedditJson/esRedditJson9 --estimate
```
This is especially useful if you previously pressed Ctrl+C and want to continue from the last key found. Be careful: if you use the same filename it *will* overwrite the existing file, *not* append to it.
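
A sketch of how resuming from a key maps onto the S3 API, assuming the `StartAfter` parameter of `list_objects_v2` (boto3 passes it straight through to the paginator):
```
import boto3
from botocore import UNSIGNED
from botocore.client import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# StartAfter makes S3 return only keys that sort strictly after the given one,
# which is exactly what resuming from the last key written to the .csv needs.
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='ryft-public-sample-data',
                           StartAfter='esRedditJson/esRedditJson9')
for page in pages:
    for obj in page.get('Contents', []):
        print(obj['Key'])
```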

## Notes:
- Working, but still a work in progress.
- Made on Linux, for Linux; it may have bugs on Windows, and the formatting of text printed to the terminal will probably be a little off.
- Large buckets with hundreds of thousands of keys can take a long time to go over with `--acl`, since the script queries every individual object in the bucket to get the info. This will be improved in v2. Use the `--estimate` option for a rough figure.

### TODO:
- [ ] add `--prefix` and `--delimiter` options (see the sketch after this list)
- [ ] add warning, `file already exists. overwrite? y/n`
- [ ] add `--continue` | `--append` option to start up where we left off
- [ ] add option to not include header in saved csv file
- [ ] add `--include` | `--exclude` option to only fetch keys that include/exclude this text
- [ ] add `--ext` | `--exclude_ext` to only include/exclude these file extensions
- [ ] integrate GCP if possible
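
For the planned `--prefix` and `--delimiter` options, the underlying S3 parameters already exist; a sketch of what they would likely map to (the prefix value is illustrative):
```
import boto3
from botocore import UNSIGNED
from botocore.client import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# Prefix limits results to keys under a pseudo-folder; Delimiter collapses
# anything below the next '/' into CommonPrefixes instead of listing it.
resp = s3.list_objects_v2(Bucket='ryft-public-sample-data',
                          Prefix='esRedditJson/', Delimiter='/')
for obj in resp.get('Contents', []):
    print(obj['Key'])         # keys directly under the prefix
for cp in resp.get('CommonPrefixes', []):
    print(cp['Prefix'])       # "sub-folders" one level down
```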