https://github.com/relrelb/wayback-downloader
A simple downloader client for the Wayback Machine
- Host: GitHub
- URL: https://github.com/relrelb/wayback-downloader
- Owner: relrelb
- Created: 2018-02-25T18:53:45.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-02-26T10:56:50.000Z (over 7 years ago)
- Last Synced: 2023-03-05T07:09:58.436Z (over 2 years ago)
- Topics: downloader, python, wayback-machine
- Language: Python
- Size: 6.84 KB
- Stars: 16
- Watchers: 3
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# wayback-downloader
A simple downloader client for the Wayback Machine written in Python.
```
Usage:
    python <script> {--help|-h}
    python <script> [--threads <count>] [--matchType {exact|prefix|host|domain}] [--from <timestamp>] [--to <timestamp>] [--limit <count>] [--dry] <url>
Options:
    --help, -h       Display this help message and exit
    --threads, -T    Number of downloading threads (default: 10)
    --matchType, -m  Which results to download, based on the given URL:
        exact        Download results matching the URL exactly
        prefix       Download results under the path of the URL
        host         Download results from the host of the URL
        domain       Download results from the host of the URL and all of its subhosts
    --from, -f       Download results that were captured after this timestamp
    --to, -t         Download results that were captured before this timestamp
                     Both timestamps must be a prefix of "yyyyMMddhhmmss"
    --limit, -l      Download at most this many snapshots
    --dry, -d        List items to be downloaded without downloading them
Example:
    Use the following command:
        python <script> --matchType prefix --from 2010 --to 201606 --limit 1000 example.org
    to download at most 1000 arbitrary pages under example.org captured between 2010 and June 2016 (inclusive).
For more information, see: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md
```
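
The options above correspond closely to query parameters of the Wayback CDX server described in the linked document. The following is a minimal sketch of how such a mapping could look; it is not taken from this project's source. The function names `is_timestamp_prefix`, `build_cdx_query` and `snapshot_url` are hypothetical, and the endpoint and the `id_` raw-content form are assumptions based on the public Wayback Machine, not on this tool.

```python
import re
from urllib.parse import urlencode

# Assumed public CDX endpoint; the tool itself may use a different one.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
MATCH_TYPES = ("exact", "prefix", "host", "domain")


def is_timestamp_prefix(value):
    """Return True if value is 1 to 14 ASCII digits, i.e. it has the
    shape of a prefix of "yyyyMMddhhmmss" (for example "2010" or "201606")."""
    return re.fullmatch(r"[0-9]{1,14}", value) is not None


def build_cdx_query(url, match_type="exact", from_ts=None, to_ts=None, limit=None):
    """Build a CDX query URL from options shaped like the CLI flags above."""
    if match_type not in MATCH_TYPES:
        raise ValueError("unknown matchType: {!r}".format(match_type))
    params = [("url", url), ("matchType", match_type), ("output", "json")]
    for name, value in (("from", from_ts), ("to", to_ts)):
        if value is None:
            continue
        if not is_timestamp_prefix(value):
            raise ValueError("{} must be a prefix of yyyyMMddhhmmss".format(name))
        params.append((name, value))
    if limit is not None:
        params.append(("limit", str(limit)))
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_url(timestamp, original):
    """Return the URL of the archived copy of `original` at `timestamp`.
    The "id_" suffix asks the Wayback Machine for the unmodified capture."""
    return "https://web.archive.org/web/{}id_/{}".format(timestamp, original)


if __name__ == "__main__":
    # Mirrors the example command: prefix match, 2010 through June 2016.
    print(build_cdx_query("example.org", "prefix", "2010", "201606", 1000))
```

Printing the query instead of fetching it is roughly what `--dry` suggests: the request can be inspected before anything is downloaded.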