A Memento Aggregator CLI and Server in Go
- Host: GitHub
- URL: https://github.com/oduwsdl/memgator
- Owner: oduwsdl
- License: mit
- Created: 2015-09-08T01:43:25.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2024-05-21T15:07:30.000Z (8 months ago)
- Last Synced: 2024-10-24T18:36:00.269Z (2 months ago)
- Topics: memento, memento-rfc, timemap, web-archiving
- Language: Go
- Homepage: https://memgator.cs.odu.edu/api.html
- Size: 15 MB
- Stars: 57
- Watchers: 14
- Forks: 11
- Open Issues: 44
Metadata Files:
- Readme: README.md
- License: LICENSE
# MemGator
A Memento Aggregator CLI and Server in [Go](https://golang.org/).
## Features
* The binary (available for various platforms) can be used as the CLI or run as a Web Service
* Results available in three formats - Link/JSON/CDXJ
* TimeMap, TimeGate, and Memento (redirect or description) endpoints
* Optional streaming of benchmarks over [Server-Sent Events](http://www.html5rocks.com/en/tutorials/eventsource/basics/) (SSE) for realtime visualization and monitoring
* Good API parity with the [main Memento Aggregator service](http://timetravel.mementoweb.org/guide/api/)
* Concurrent - Splits every session into subtasks for parallel execution
* Parallel - Utilizes all the available CPUs
* Custom archive list (a local JSON file or a remote URL) - A sample JSON is included in the repository
* Probability based archive prioritization and limit
* Configurable automated temporary exclusion of malfunctioning upstream archives
* Three levels of customizable timeouts for greater control over remote requests
* Customizable logging and profiling in CDXJ format
* Customizable endpoint URLs - Helpful in load-balancing
* Customizable User-Agent to be sent to each archive and User-Agent spoofing
* Configurable archive failure detection and automatic hibernation
* [CORS](http://www.w3.org/TR/cors/) support to make it easy to use it from JavaScript clients
* Memento count exposed in the header that can be retrieved via `HEAD` request
* [Docker](https://www.docker.com/) friendly - An image available as [oduwsdl/memgator](https://hub.docker.com/r/oduwsdl/memgator)
* Sensible defaults - Batteries included, but replaceable

## Usage
### CLI
The command line interface of MemGator retrieves the TimeMap and the description of the closest Memento (equivalent to the TimeGate) over `STDOUT` in all supported formats. Logs, benchmarks (in verbose mode), and error output go to `STDERR` unless appropriate files are configured. For further details, see the full usage.
```
$ memgator [options] {URI-R} # TimeMap from CLI
$ memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI
```

### Server
When run as a Web Service, MemGator exposes the following customizable endpoints:
```
$ memgator [options] server
TimeMap: http://localhost:1208/timemap/{FORMAT}/{URI-R}
TimeGate: http://localhost:1208/timegate/{URI-R} [Accept-Datetime]
Memento: http://localhost:1208/memento[/{FORMAT}|proxy]/{DATETIME}/{URI-R}
About: http://localhost:1208/about
Monitor: http://localhost:1208/monitor - (Over SSE, if enabled)

{FORMAT} => link|json|cdxj
{DATETIME} => YYYY[MM[DD[hh[mm[ss]]]]]
[Accept-Datetime] => Header in RFC1123 format
```

* `TimeMap` endpoint serves an aggregated TimeMap for a given URI-R in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). Additionally, it makes sure that the Mementos are chronologically ordered. It also provides the TimeMap data serialized in additional experimental formats.
* `TimeGate` endpoint allows datetime negotiation via the `Accept-Datetime` header in accordance with the [Memento RFC](http://tools.ietf.org/html/rfc7089). A successful response redirects to the closest Memento (to the given datetime) using the `Location` header. The default datetime is the current time. A successful response also includes a `Link` header which provides links to the first, last, next, and previous Mementos.
* `Memento` endpoint allows datetime negotiation in the request URL itself for clients that cannot easily send custom request headers (as opposed to the `TimeGate` which requires the `Accept-Datetime` header). This endpoint behaves differently based on whether the `format` was specified in the request. It essentially splits the functionality of the `TimeGate` endpoint as follows:
  * If a format is specified, it returns the description of the closest Memento (to the given datetime) in the specified format. It is essentially the same data that is available in the `Link` header of the `TimeGate` response, but as the payload in the format requested by the client.
  * If a format is not specified, it redirects to the closest Memento (to the given datetime) using the `Location` header.
  * If the term `proxy` is used instead of a format then it acts like a proxy for the closest original unmodified Memento with added CORS headers.
* `About` endpoint reports the list of upstream archives, their status, and values of various configurations of the server.
* `Monitor` is an optional endpoint that can be enabled by the `--monitor` flag when the server is started. If enabled, it provides a stream of the benchmark log over [SSE](http://www.html5rocks.com/en/tutorials/eventsource/basics/) for realtime visualization and monitoring.

**NOTE:** A fallback endpoint `/api` is added for compatibility with [Time Travel APIs](http://timetravel.mementoweb.org/guide/api/#memento-json) to allow drop-in replacement in existing tools. This endpoint is an alias to the `/memento` endpoint that returns the description of a Memento.
## Download and Install
Download the appropriate binary for your machine and operating system from the [releases page](https://github.com/oduwsdl/MemGator/releases). Make the file executable with `chmod +x MemGator-BINARY`. Run it from the location of the downloaded binary, or rename it to `memgator` and move it into a directory that is in the `PATH` (such as `/usr/local/bin/`) to make it available as a command.
## Running as a Docker Container
Build a Docker image locally from the source.
```
$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ docker image build -t oduwsdl/memgator .
```

Alternatively, pull a published image from one of the two Docker image registries below:
```
$ docker image pull docker.pkg.github.com/oduwsdl/memgator/memgator
$ docker image pull oduwsdl/memgator
```

Run MemGator with various options inside a Docker container.
```
$ docker container run -it --rm oduwsdl/memgator -h
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R}
$ docker container run -it --rm oduwsdl/memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]}
$ docker container run -d --name=memgator-server -p 1208:1208 oduwsdl/memgator [options] server
$ curl -i http://localhost:1208/about
$ docker container rm -f memgator-server
```

## Full Usage
```
_____ _______ __
/ \ _____ _____ / _____/______/ |___________
/ Y Y \/ __ \/ \/ \ ___\__ \ _/ _ \_ _ \
/ | | \ ___/ Y Y \ \_\ \/ __ | | |_| | | \/
\__/___\__/\____\__|_|__/\_______/_____|__|\___/|__|

# MemGator ({Version})
A Memento Aggregator CLI and Server in Go
Usage:
memgator [options] {URI-R} # TimeMap from CLI
memgator [options] {URI-R} {YYYY[MM[DD[hh[mm[ss]]]]]} # Description of the closest Memento from CLI
memgator [options] server # Run as a Web Service

Options:
-A, --agent=MemGator/{Version} <{CONTACT}>  User-agent string sent to archives
-a, --arcs=https://git.io/archives          Local/remote JSON file path/URL for list of archives
-b, --benchmark=                            Benchmark file location - defaults to Logfile
-c, --contact=https://git.io/MemGator       Comment/Email/URL/Handle - used in the user-agent
-D, --static=                               Directory path to serve static assets from
-d, --dormant=15m0s                         Dormant period after consecutive failures
-F, --tolerance=-1                          Failure tolerance limit for each archive
-f, --format=Link                           Output format - Link/JSON/CDXJ
-H, --host=localhost                        Host name - only used in web service mode
-k, --topk=-1                               Aggregate only top k archives based on probability
-l, --log=                                  Log file location - defaults to STDERR
-m, --monitor=false                         Benchmark monitoring via SSE
-P, --proxy=http://{HOST}[:{PORT}]{ROOT}    Proxy URL - defaults to host, port, and root
-p, --port=1208                             Port number - only used in web service mode
-R, --root=/                                Service root path prefix
-r, --restimeout=1m0s                       Response timeout for each archive
-S, --spoof=false                           Spoof each request with a random user-agent
-T, --hdrtimeout=30s                        Header timeout for each archive
-t, --contimeout=5s                         Connection timeout for each archive
-V, --verbose=false                         Show Info and Profiling messages on STDERR
-v, --version=false                         Show name and version
```

## Build
Assuming that Git and Go (version >= 1.14) are installed, the code can be cloned, run, built, and installed using the following commands:
```
$ git clone https://github.com/oduwsdl/MemGator.git
$ cd MemGator
$ go run main.go
$ go build
$ go install
$ memgator --help
$ memgator http://example.com/
```

To compile cross-platform binaries run the `crossbuild.sh` script:
```
$ ./crossbuild.sh
```

This will generate binaries for various OSes and architectures in the `/tmp/mgbins` directory.
## Citing Project
A publication related to this project appeared in the proceedings of JCDL 2016 ([Read the PDF](https://www.cs.odu.edu/~mln/pubs/jcdl-2016/jcdl-2016-alam-memgator.pdf)). Please cite it as below:
> Sawood Alam and Michael L. Nelson. __MemGator - A Portable Concurrent Memento Aggregator: Cross-Platform CLI and Server Binaries in Go__. In _Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016_, pp. 243-244, Newark, New Jersey, USA, June 2016.
```bib
@inproceedings{jcdl-2016:alam:memgator,
author = {Sawood Alam and
Michael L. Nelson},
title = {{MemGator - A Portable Concurrent Memento Aggregator}},
booktitle = {Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries},
series = {JCDL '16},
year = {2016},
month = {jun},
location = {Newark, New Jersey, USA},
pages = {243--244},
numpages = {2},
url = {http://dx.doi.org/10.1145/2910896.2925452},
doi = {10.1145/2910896.2925452},
isbn = {978-1-4503-4229-2},
publisher = {ACM},
address = {New York, NY, USA}
}
```