https://github.com/machawk1/cdxjgenerator
A script to generate CDXJ TimeMaps for testing elsewhere
- Host: GitHub
- URL: https://github.com/machawk1/cdxjgenerator
- Owner: machawk1
- License: mit
- Created: 2019-02-07T14:08:58.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2021-12-15T21:08:20.000Z (over 3 years ago)
- Last Synced: 2025-04-02T03:18:21.565Z (about 2 months ago)
- Language: Python
- Size: 26.4 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# CDXJ Generator
A Python script to generate CDXJ TimeMaps for testing elsewhere.
## Install
This tool is published to PyPI. To install it:
`pip install cdxjGenerator`
To use the development version, clone this repository, then run `pip install .` from its root.
## Usage
These instructions assume installation via `pip`.
To run (the URI-R argument is optional):

```
cdxjGenerator [number of lines] [URI-R]
```
For example:

```
cdxjGenerator 12
```

...will generate CDXJ output (to stdout by default) consisting of entries for 12 random URIs. Alternatively:

```
cdxjGenerator 25000 memento.us
```

...will generate 25,000 entries for the URI-R `memento.us`. This output can be written to a file like:

```
cdxjGenerator 25000 memento.us > sample.cdxj
```
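To illustrate the shape of this output, here is a minimal Python sketch of a generator that takes the same two inputs (a line count and an optional URI-R). It is not the tool's source. The function names `generate_cdxj`, `simple_surt`, and `random_host`, the 1996 to 2019 timestamp range, and the JSON fields (`uri`, `mime_type`, `status_code`) are all hypothetical. `simple_surt` only reverses the host labels rather than performing full SURT canonicalization.

```python
import json
import random
import string
from datetime import datetime, timedelta, timezone


def simple_surt(uri_r):
    """Build a simplified SURT-style key, e.g. 'memento.us' -> 'us,memento)/'.

    Assumption: only host labels are reversed; real SURT libraries also
    strip 'www.', sort query strings, and more.
    """
    without_scheme = uri_r.split("://", 1)[-1]  # drop 'http://' if present
    host, _, path = without_scheme.partition("/")
    return ",".join(reversed(host.lower().split("."))) + ")/" + path


def random_host(rng):
    """Hypothetical stand-in for a random URI-R: eight letters + '.com'."""
    return "".join(rng.choices(string.ascii_lowercase, k=8)) + ".com"


def generate_cdxj(count, uri_r=None, seed=None):
    """Return `count` unsorted CDXJ lines: key, 14-digit timestamp, JSON."""
    rng = random.Random(seed)
    start = datetime(1996, 1, 1, tzinfo=timezone.utc)
    end = datetime(2019, 1, 1, tzinfo=timezone.utc)
    span = int((end - start).total_seconds())
    lines = []
    for _ in range(count):
        # Use the given URI-R, or a random host when none is supplied.
        uri = uri_r if uri_r is not None else random_host(rng)
        when = start + timedelta(seconds=rng.randrange(span))
        # Hypothetical JSON fields; the real tool's fields may differ.
        block = {"uri": uri, "mime_type": "text/html", "status_code": "200"}
        lines.append(f"{simple_surt(uri)} {when:%Y%m%d%H%M%S} {json.dumps(block)}")
    return lines
```

Calling `generate_cdxj(25000, "memento.us")` corresponds roughly to the second command above. The timestamps come out in random order, so the lines are not sorted.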
The resulting `sample.cdxj` will likely need to be sorted before being used elsewhere. Do this via:

```
LC_ALL=C sort sample.cdxj > sample_sorted.cdxj
```
This can also be done in a single command, without writing the temporary, unsorted `sample.cdxj`:

```
cdxjGenerator 25000 memento.us | LC_ALL=C sort > sample_sorted.cdxj
```
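`LC_ALL=C` makes `sort` compare raw byte values instead of applying locale-specific collation, which in some locales can ignore punctuation such as the `,` and `)` that appear in SURT-style keys. Python compares `bytes` objects byte by byte, so the same ordering can be reproduced there. This is a minimal sketch, and `sort_cdxj_bytes` and `sort_cdxj_file` are hypothetical helpers, not part of cdxjGenerator:

```python
from pathlib import Path


def sort_cdxj_bytes(data):
    """Sort newline-separated lines by raw byte value, like `LC_ALL=C sort`."""
    lines = data.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # a trailing newline yields one empty final element
    if not lines:
        return b""
    # bytes compare byte-by-byte, so sorted() yields C-locale order.
    return b"\n".join(sorted(lines)) + b"\n"


def sort_cdxj_file(src, dst):
    """File-to-file equivalent of `LC_ALL=C sort src > dst`."""
    Path(dst).write_bytes(sort_cdxj_bytes(Path(src).read_bytes()))
```

For example, `sort_cdxj_file("sample.cdxj", "sample_sorted.cdxj")` corresponds to the two-step command above. Lines are compared without their trailing newline, as `sort` does. Under byte order, `us,memento)/` sorts before `us,mementos)/` because `)` (0x29) is less than `s` (0x73).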
## Background
TimeMaps are lists that enumerate the URIs of Mementos, which are resources that encapsulate prior states of a given Original Resource ([RFC7089 - Memento](https://tools.ietf.org/html/rfc7089)). TimeMaps are often expressed in an extension of the Web Linking ([RFC5988](https://tools.ietf.org/html/rfc5988)) format. Other, less common formats, such as JSON and CDXJ TimeMaps, can express the same information in a less rigid format. [CDXJ](https://github.com/oduwsdl/ORS/wiki/CDXJ) is the most flexible of the three and is used by [InterPlanetary Wayback (ipwb)](https://github.com/oduwsdl/ipwb), whose testing needs prompted the creation of this software.
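Part of CDXJ's flexibility is that each line is simply a sort key, a 14-digit timestamp, and a JSON block, so reading a TimeMap back is straightforward. Below is a minimal parsing sketch. It assumes single-space separators, UTC timestamps, and that lines beginning with `!` are metadata headers to be skipped. `parse_cdxj_line` and `iter_cdxj` are hypothetical names.

```python
import json
from datetime import datetime, timezone


def parse_cdxj_line(line):
    """Split one CDXJ line into (key, datetime, JSON block)."""
    # maxsplit=2 keeps any spaces inside the JSON block intact.
    key, timestamp, block = line.rstrip("\n").split(" ", 2)
    # Assumption: the 14-digit timestamp is UTC (YYYYMMDDhhmmss).
    when = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return key, when, json.loads(block)


def iter_cdxj(lines):
    """Yield parsed entries, skipping blank lines and '!'-prefixed headers."""
    for line in lines:
        if line.strip() and not line.startswith("!"):
            yield parse_cdxj_line(line)
```

An open file can be passed directly, e.g. `iter_cdxj(open("sample_sorted.cdxj", encoding="utf-8"))`, because iterating a text file yields one line at a time.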