Personal WayBack Machine
https://github.com/popey/pwbm

Host: GitHub
URL: https://github.com/popey/pwbm
Owner: popey
License: mit
Created: 2020-01-18T13:27:49.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-01-19T10:52:24.000Z (over 4 years ago)
Last Synced: 2024-07-05T08:36:57.740Z (3 months ago)
Language: Python
Homepage: https://snapcraft.io/pwbm
Size: 34.2 KB
Stars: 119
Watchers: 5
Forks: 12
Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

[![pwbm](https://snapcraft.io/pwbm/badge.svg)](https://snapcraft.io/pwbm)

# pwbm - Personal WayBack Machine

The goal of pwbm is to make an easy-to-use appliance which can be fed URLs that it scrapes periodically. The content is saved in a similar manner to the popular "Wayback machine". However, as this is a 'personal' wayback machine, you control which URLs are scanned and when. The archive is held locally and can be easily managed.

Note: Unlike the "real" wayback machine, `pwbm` does not seek to crawl the entire web, nor does it spider entire websites. It only archives specific URLs given to it. This is by design.

## Installation

[![Get it from the Snap Store](https://snapcraft.io/static/images/badges/en/snap-store-black.svg)](https://snapcraft.io/pwbm)

`pwbm` is available as a snap in the Snap Store. The snap bundles everything needed to function, including `monolith`. Installation on Linux is as follows:

`snap install pwbm`

Note: because `pwbm` is unfinished, it's currently only available in the `edge` channel, so the install command may need the `--edge` flag (`snap install pwbm --edge`).

Alternatively, just clone this repo and run the shell script. You'll also need `monolith` installed.

## Usage

### Adding URLs

Run `pwbm` with a URL to add it to the list of pages to archive. This only records the URL; it does not currently initiate a snapshot of that page.

`pwbm https://ubuntu.com/`
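
As a rough illustration of what "adding a URL" could amount to, here is a minimal Python sketch that appends a URL to a plain-text list file, one URL per line, and skips duplicates. The function name `add_url`, the list file location and the one-URL-per-line format are assumptions made for this example, not details taken from `pwbm` itself.

```python
from pathlib import Path


def add_url(list_file: Path, url: str) -> bool:
    """Append url to the list file unless it is already present.

    Returns True if the URL was added, False if it was already listed.
    The one-URL-per-line format is an assumption for this sketch.
    """
    existing = list_file.read_text().splitlines() if list_file.exists() else []
    if url in existing:
        return False
    list_file.parent.mkdir(parents=True, exist_ok=True)
    with list_file.open("a") as handle:
        handle.write(url + "\n")
    return True
```

Calling `add_url(Path("urls.txt"), "https://ubuntu.com/")` twice would add the URL the first time and leave the file unchanged the second time.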

### Gathering page snapshots

Run `pwbm` with no arguments to start a snapshot of every URL that has been added.

`pwbm`

Results are stored in `$SNAP_USER_COMMON/archive` if installed from a snap, or `./archive` if run outside of a snap.
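
The choice of archive location described above can be sketched in a few lines of Python. This is only an illustration: the helper name `archive_dir` is made up for the example, and treating the presence of `SNAP_USER_COMMON` as the signal for "running inside a snap" is an assumption.

```python
import os
from pathlib import Path


def archive_dir(env=None) -> Path:
    """Return the archive folder for the current environment.

    Assumption: SNAP_USER_COMMON being set means we are inside a snap.
    """
    env = os.environ if env is None else env
    snap_common = env.get("SNAP_USER_COMMON")
    if snap_common:
        return Path(snap_common) / "archive"
    # Outside a snap, fall back to ./archive relative to the working directory.
    return Path("archive")
```

Passing an explicit `env` mapping makes the helper easy to try out without changing real environment variables.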

### How it works

It's super basic. `pwbm` just iterates through a list of URLs in a file, spawning `monolith` and saving the results in a datestamped file in a folder specific to the host and path.

```
$ tree ~/snap/pwbm/common/archive/
/home/alan/snap/pwbm/common/archive/
└── ubuntu.com
    └── 2020-01-18T13:32:39+00:00-index.html

1 directory, 1 file
```
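
The loop described above could look roughly like the following Python sketch. It is not the actual `pwbm` script: the function names, the whitespace-separated list file and the exact mapping from URL path to sub-folders are assumptions. The timestamped `-index.html` file name follows the `tree` output shown above. The sketch also assumes `monolith` writes the bundled page to standard output when no output file is given.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlsplit


def snapshot_path(archive: Path, url: str, when: datetime) -> Path:
    """Build archive/<host>/<path>/<timestamp>-index.html for a URL."""
    parts = urlsplit(url)
    # Drop empty and relative segments so "https://ubuntu.com/" maps to
    # just "ubuntu.com" and the result stays inside the archive folder.
    segments = [s for s in parts.path.split("/") if s and s not in (".", "..")]
    stamp = when.replace(microsecond=0).isoformat()
    return archive.joinpath(parts.netloc, *segments) / f"{stamp}-index.html"


def snapshot_all(list_file: Path, archive: Path) -> None:
    """Run monolith once per listed URL and save its output."""
    for url in list_file.read_text().split():
        target = snapshot_path(archive, url, datetime.now(timezone.utc))
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("wb") as out:
            # Assumption: monolith prints the single-file page to stdout.
            subprocess.run(["monolith", url], stdout=out, check=False)
```

Keeping the path calculation in its own function means the naming scheme can be checked without invoking `monolith` or touching the network.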

### Viewing results

Browse the files in the `archive/` folder and open them in a browser to view.

A convenience webserver has been added. It can be launched as follows, and presents the archive directory on port 8076.

`pwbm.server`

Visit `http://localhost:8076/` to view the snapshots.
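
For readers curious what such a convenience server involves, here is a minimal sketch using only the Python standard library. It is not the implementation behind `pwbm.server`; the function name `make_server` and the choice to bind to localhost only are assumptions made for this example.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_server(archive: Path, port: int = 8076) -> ThreadingHTTPServer:
    """Create an HTTP server that serves files from the archive folder."""
    handler = partial(SimpleHTTPRequestHandler, directory=str(archive))
    # Bind to localhost only, so the archive is not exposed to the network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Running `make_server(Path("archive")).serve_forever()` would then present a directory listing of the snapshots at `http://localhost:8076/`.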

## TODO

- [ ] More error checking
- [x] Add a webserver to make it more wayback-machine-like (and easy to use)
- [ ] Add option for manual pruning of archives
- [ ] Add option to remove URLs
- [ ] Add option to report on disk usage / number of snapshots / other stats