https://github.com/sangupta/wayback-machine-download
A CLI to work with Archive.org Wayback Machine including downloading an entire snapshot
- Host: GitHub
- URL: https://github.com/sangupta/wayback-machine-download
- Owner: sangupta
- License: other
- Created: 2015-08-05T15:05:14.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2022-09-01T22:45:36.000Z (about 2 years ago)
- Last Synced: 2024-04-16T11:17:29.678Z (7 months ago)
- Language: Java
- Size: 24.4 KB
- Stars: 3
- Watchers: 5
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
README
Wayback Machine Download
========================

A simple command-line interface to interact with the archive.org
**Wayback Machine**, including a tool to download a snapshot of an entire site. This is particularly
useful for recovering lost backups and for disaster recovery.

Dependencies
------------

The tool needs the following to run:

* Oracle JDK 7 (should work with OpenJDK, but this has not been tested)
* Internet connection

Usage
-----

```
usage: wmd []

The most commonly used wmd commands are:
    configure   Generate a configuration for the site
    download    Download the entire site
    help        Display help information

See 'wmd help ' for more information on a specific command.
```

To begin the download process, first generate a configuration by answering the command prompts:
```
$ java -jar wayback-machine-download.jar configure
Configuration filename: somesite
Wayback URL to start with: http://web.archive.org/web/20140929053608/http://www.somesite.com/
Folder path where to dump the site to: ~/wayback
Max crawling depth: 5
```

This generates a configuration file called `somesite.wmd`, which is used in the next step.
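The Wayback URL entered at the prompt has the form `http://web.archive.org/web/<timestamp>/<original URL>`, where the 14-digit timestamp (`yyyyMMddHHmmss`) identifies the capture. A crawler would typically need both halves: the timestamp to request other pages from the same snapshot, and the original URL to tell which links belong to the site. The following is a minimal sketch assuming that URL shape; the `WaybackUrl` class and its methods are hypothetical illustrations and not part of wmd.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of wmd) that splits a Wayback Machine snapshot URL
 * into its 14-digit capture timestamp and the original site URL.
 */
public final class WaybackUrl {

    // web.archive.org/web/<timestamp>[modifier]/<original URL>; the optional
    // two-letter suffix such as "id_" or "im_" is a Wayback modifier, not part of the date
    private static final Pattern SNAPSHOT = Pattern.compile(
            "^https?://web\\.archive\\.org/web/(\\d{14})([a-z]{2}_)?/(.+)$");

    private final String timestamp;
    private final String originalUrl;

    private WaybackUrl(String timestamp, String originalUrl) {
        this.timestamp = timestamp;
        this.originalUrl = originalUrl;
    }

    /** Parses a snapshot URL, rejecting anything that does not match the assumed shape. */
    public static WaybackUrl parse(String url) {
        Matcher m = SNAPSHOT.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Wayback snapshot URL: " + url);
        }
        return new WaybackUrl(m.group(1), m.group(3));
    }

    public String timestamp() {
        return timestamp;
    }

    public String originalUrl() {
        return originalUrl;
    }

    /** Builds the snapshot URL for another page of the same site at the same timestamp. */
    public String snapshotOf(String pageUrl) {
        return "http://web.archive.org/web/" + timestamp + "/" + pageUrl;
    }

    public static void main(String[] args) {
        WaybackUrl u = parse("http://web.archive.org/web/20140929053608/http://www.somesite.com/");
        System.out.println(u.timestamp());   // 20140929053608
        System.out.println(u.originalUrl()); // http://www.somesite.com/
    }
}
```

Keeping the timestamp fixed while swapping the original URL is what lets a crawler stay within one snapshot of the site.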
To start the download process, execute:
```
$ java -jar wayback-machine-download.jar download somesite.wmd
```

This will crawl the entire site and dump it into the `~/wayback` folder.
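The download step crawls the site, and the *Max crawling depth* answer presumably bounds how many links away from the start page it goes. The sketch below illustrates a depth-limited breadth-first crawl in general, not wmd's actual implementation: a hypothetical in-memory `links` map (page to the links found on it) stands in for fetching each archived page and extracting its links, and `DepthLimitedCrawl` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a depth-limited breadth-first crawl (not the wmd source). */
public final class DepthLimitedCrawl {

    /** Returns pages in discovery order, following links at most maxDepth hops from start. */
    public static List<String> crawl(String start, Map<String, List<String>> links, int maxDepth) {
        Set<String> discovered = new LinkedHashSet<>(); // preserves discovery order
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();

        discovered.add(start);
        depth.put(start, 0);
        queue.add(start);

        while (!queue.isEmpty()) {
            String page = queue.poll();
            int d = depth.get(page);
            if (d >= maxDepth) {
                continue; // page is kept, but its links are not followed
            }
            List<String> out = links.containsKey(page) ? links.get(page) : Collections.<String>emptyList();
            for (String next : out) {
                if (discovered.add(next)) { // skip already-seen pages, which also breaks cycles
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(discovered);
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/", Arrays.asList("/about.html", "/blog/"));
        links.put("/blog/", Arrays.asList("/blog/post-1.html", "/"));
        links.put("/blog/post-1.html", Arrays.asList("/blog/archive.html"));

        System.out.println(crawl("/", links, 2)); // [/, /about.html, /blog/, /blog/post-1.html]
    }
}
```

Breadth-first order means every page is first reached along its shortest link path, so a page is not cut off by the depth limit merely because it is also reachable through a longer route.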
Why this tool?
--------------

I lost the code to a static website that I maintained for a friend of mine, and the server it was hosted
on crashed beyond recovery. I was left with two options:

* either recreate the site manually, which would have taken days if not months
* or try to download everything from archive.org

I found two tools that could help me retrieve a dump from the Wayback Machine:

* warrick - https://code.google.com/p/warrick/
* http://waybackdownloader.com/

I tried using `warrick`, but for some reason it did not work. My bad, I'm sure I did not
configure it properly, and I am not a `Perl` guy, so I could not debug it. The second option asked
for money, and was thus ruled out. This led to the birth of this tool: to help me recover the site from the Wayback Machine.
I'm sure someone else will need it too, one day!
License
-------
```
wmdownload - Download an entire website using Wayback Machine
Copyright (c) 2015-2016, Sandeep Gupta

http://sangupta.com/projects/wayback-machine-download

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```