# MemGator

A Memento Aggregator CLI and Server in [Go](https://golang.org/).

## Features

* The binary (available for various platforms) can be used as the CLI or run as a Web Service
* Results available in three formats - Link/JSON/CDXJ
* TimeMap, TimeGate, and Memento (redirect or description) endpoints
* Optional streaming of benchmarks over [Server-Sent Events](http://www.html5rocks.com/en/tutorials/eventsource/basics/) (SSE) for realtime visualization and monitoring
* Good API parity with the [main Memento Aggregator service](http://timetravel.mementoweb.org/guide/api/)
* Concurrent - Splits every session into subtasks for parallel execution
* Parallel - Utilizes all the available CPUs
* Custom archive list (a local JSON file or a remote URL) - A sample JSON is included in the repository
* Probability based archive prioritization and limit
* Configurable automated temporary exclusion of malfunctioning upstream archives
* Three levels of customizable timeouts for greater control over remote requests
* Customizable logging and profiling in CDXJ format
* Customizable endpoint URLs - Helpful in load-balancing
* Customizable User-Agent to be sent to each archive and User-Agent spoofing
* Configurable archive failure detection and automatic hibernation
* [CORS](http://www.w3.org/TR/cors/) support, making it easy to use from JavaScript clients
* Memento count exposed in the header that can be retrieved via a `HEAD` request
* [Docker](https://www.docker.com/) friendly - An image available as [oduwsdl/memgator](https://hub.docker.com/r/oduwsdl/memgator)
* Sensible defaults - Batteries included, but replaceable

## Usage

### CLI

The command line interface of MemGator retrieves the TimeMap or the description of the closest Memento (equivalent to the TimeGate) over `STDOUT` in any of the supported formats. Logs and benchmarks (in verbose mode), as well as error output, are written to `STDERR` unless appropriate files are configured. For further details, see the full usage.

```
$ memgator [options] {URI-R} # TimeMap from CLI
$ memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI
```

### Server

When run as a Web Service, MemGator exposes the following customizable endpoints:

```
$ memgator [options] server
TimeMap: http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate: http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento: http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}
About: http://localhost:1208/about
Monitor: http://localhost:1208/monitor - (Over SSE, if enabled)

{FORMAT} => link|json|cdxj
{DATETIME} => YYYY[MM[DD[hh[mm[ss]]]]]
[Accept-Datetime] => Header in RFC1123 format
```
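
To make the endpoint patterns above concrete, here is a small sketch that assembles TimeMap and Memento URLs from their parts. The `endpoints` type and its methods are hypothetical helpers, not part of MemGator. The URI-R is appended verbatim, as the patterns show, and the base URL assumes the default host, port, and root.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoints builds request URLs for a MemGator server (hypothetical helper).
type endpoints struct {
	base string // e.g. "http://localhost:1208", without a trailing slash
}

// newEndpoints trims any trailing slash so paths join cleanly.
func newEndpoints(base string) endpoints {
	return endpoints{base: strings.TrimRight(base, "/")}
}

// timeMap returns the TimeMap URL; format is one of link, json, or cdxj.
func (e endpoints) timeMap(format, uriR string) string {
	return e.base + "/timemap/" + format + "/" + uriR
}

// memento returns the Memento URL. The middle segment may be a format,
// the term "proxy", or empty to request a redirect.
func (e endpoints) memento(format, datetime, uriR string) string {
	if format == "" {
		return e.base + "/memento/" + datetime + "/" + uriR
	}
	return e.base + "/memento/" + format + "/" + datetime + "/" + uriR
}

func main() {
	e := newEndpoints("http://localhost:1208/")
	fmt.Println(e.timeMap("json", "http://example.com/"))
	fmt.Println(e.memento("cdxj", "2016", "http://example.com/"))
	fmt.Println(e.memento("", "20160619", "http://example.com/"))
}
```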

* `TimeMap` endpoint serves an aggregated TimeMap for a given URI-R in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). Additionally, it makes sure that the Mementos are chronologically ordered. It also provides the TimeMap data serialized in additional experimental formats.
* `TimeGate` endpoint allows datetime negotiation via the `Accept-Datetime` header in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). A successful response redirects to the closest Memento (to the given datetime) using the `Location` header. The default datetime is the current time. A successful response also includes a `Link` header which provides links to the first, last, next, and previous Mementos.
* `Memento` endpoint allows datetime negotiation in the request URL itself for clients that cannot easily send custom request headers (as opposed to the `TimeGate` which requires the `Accept-Datetime` header). This endpoint behaves differently based on whether the `format` was specified in the request. It essentially splits the functionality of the `TimeGate` endpoint as follows:
    * If a format is specified, it returns the description of the closest Memento (to the given datetime) in the specified format. It is essentially the same data that is available in the `Link` header of the `TimeGate` response, but as the payload in the format requested by the client.
    * If a format is not specified, it redirects to the closest Memento (to the given datetime) using the `Location` header.
    * If the term `proxy` is used instead of a format then it acts like a proxy for the closest original unmodified Memento with added CORS headers.
* `About` endpoint reports the list of upstream archives, their status, and values of various configurations of the server.
* `Monitor` is an optional endpoint that can be enabled by the `--monitor` flag when the server is started. If enabled, it provides a stream of the benchmark log over [SSE](http://www.html5rocks.com/en/tutorials/eventsource/basics/) for realtime visualization and monitoring.

**NOTE:** A fallback endpoint `/api` is added for compatibility with [Time Travel APIs](http://timetravel.mementoweb.org/guide/api/#memento-json) to allow drop-in replacement in existing tools. This endpoint is an alias to the `/memento` endpoint that returns the description of a Memento.

## Download and Install

Download the appropriate binary for your machine and operating system from the [releases page](https://github.com/oduwsdl/MemGator/releases). Make the file executable with `chmod +x MemGator-BINARY`. Run it from the download location, or rename it to `memgator` and move it into a directory that is in the `PATH` (such as `/usr/local/bin/`) to make it available as a command.

## Running as a Docker Container

Build a Docker image locally from the source.

```
$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ docker image build -t oduwsdl/memgator .
```

Alternatively, pull a published image from one of the two Docker image registries below:

```
$ docker image pull docker.pkg.github.com/oduwsdl/memgator/memgator
$ docker image pull oduwsdl/memgator
```

Run MemGator with various options inside a Docker container.

```
$ docker container run -it --rm oduwsdl/memgator -h
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R}
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]}
$ docker container run -d --name=memgator-server -p 1208:1208 oduwsdl/memgator [options] server
$ curl -i http://localhost:1208/about
$ docker container rm -f memgator-server
```

## Full Usage

```
_____ _______ __
/ \ _____ _____ / _____/______/ |___________
/ Y Y \/ __ \/ \/ \ ___\__ \ _/ _ \_ _ \
/ | | \ ___/ Y Y \ \_\ \/ __ | | |_| | | \/
\__/___\__/\____\__|_|__/\_______/_____|__|\___/|__|

# MemGator ({Version})

A Memento Aggregator CLI and Server in Go

Usage:
memgator [options] {URI-R} # TimeMap from CLI
memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI
memgator [options] server # Run as a Web Service

Options:
-A, --agent=MemGator/{Version} <{CONTACT}> User-agent string sent to archives
-a, --arcs=https://git.io/archives Local/remote JSON file path/URL for list of archives
-b, --benchmark= Benchmark file location - defaults to Logfile
-c, --contact=https://git.io/MemGator Comment/Email/URL/Handle - used in the user-agent
-D, --static= Directory path to serve static assets from
-d, --dormant=15m0s Dormant period after consecutive failures
-F, --tolerance=-1 Failure tolerance limit for each archive
-f, --format=Link Output format - Link/JSON/CDXJ
-H, --host=localhost Host name - only used in web service mode
-k, --topk=-1 Aggregate only top k archives based on probability
-l, --log= Log file location - defaults to STDERR
-m, --monitor=false Benchmark monitoring via SSE
-P, --proxy=http://{HOST}[:{PORT}]{ROOT} Proxy URL - defaults to host, port, and root
-p, --port=1208 Port number - only used in web service mode
-R, --root=/ Service root path prefix
-r, --restimeout=1m0s Response timeout for each archive
-S, --spoof=false Spoof each request with a random user-agent
-T, --hdrtimeout=30s Header timeout for each archive
-t, --contimeout=5s Connection timeout for each archive
-V, --verbose=false Show Info and Profiling messages on STDERR
-v, --version=false Show name and version
```

## Build

Assuming that Git and Go (version >= 1.14) are installed, the code can be cloned, run, built, and installed using the following commands:

```
$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ go run main.go
$ go build
$ go install
$ memgator --help
$ memgator http://example.com/
```

To compile cross-platform binaries, run the `crossbuild.sh` script:

```
$ ./crossbuild.sh
```

This will generate binaries for various OSes and architectures in the `/tmp/mgbins` directory.

## Citing Project

A publication related to this project appeared in the proceedings of JCDL 2016 ([Read the PDF](https://www.cs.odu.edu/~mln/pubs/jcdl-2016/jcdl-2016-alam-memgator.pdf)). Please cite it as below:

> Sawood Alam and Michael L. Nelson. __MemGator - A Portable Concurrent Memento Aggregator: Cross-Platform CLI and Server Binaries in Go__. In _Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016_, pp. 243-244, Newark, New Jersey, USA, June 2016.

```bib
@inproceedings{jcdl-2016:alam:memgator,
author = {Sawood Alam and
Michael L. Nelson},
title = {{MemGator - A Portable Concurrent Memento Aggregator}},
booktitle = {Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries},
series = {JCDL '16},
year = {2016},
month = {jun},
location = {Newark, New Jersey, USA},
pages = {243--244},
numpages = {2},
url = {http://dx.doi.org/10.1145/2910896.2925452},
doi = {10.1145/2910896.2925452},
isbn = {978-1-4503-4229-2},
publisher = {ACM},
address = {New York, NY, USA}
}
```