Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/oduwsdl/MementoMap
A Tool to Summarize Web Archive Holdings
https://github.com/oduwsdl/MementoMap
memento mementomap profiling python ukvs web-archive
Last synced: 3 months ago
JSON representation
A Tool to Summarize Web Archive Holdings
- Host: GitHub
- URL: https://github.com/oduwsdl/MementoMap
- Owner: oduwsdl
- License: mit
- Created: 2019-01-20T01:30:36.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-06-15T16:14:51.000Z (over 3 years ago)
- Last Synced: 2024-08-07T22:30:10.789Z (3 months ago)
- Topics: memento, mementomap, profiling, python, ukvs, web-archive
- Language: Python
- Homepage:
- Size: 184 KB
- Stars: 9
- Watchers: 7
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MementoMap
A framework of web archive profiling to express summary of the holdings of an archive.
```
$ pip install mementomap
``````
$ mementomap
usage: mementomap [-h] {generate,compact,lookup,batchlookup} ...positional arguments:
{generate,compact,lookup,batchlookup}
generate Generate a MementoMap from a sorted file with the
first columns as SURT (e.g., CDX/CDXJ)
compact Compact a large MementoMap file into a small one
lookup Look for a SURT into a MementoMap
batchlookup Look for a list of SURTs into a MementoMapoptional arguments:
-h, --help show this help message and exit
``````
$ mementomap generate -h
usage: mementomap generate [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
[--hdepth] [--pdepth]
infile outfilepositional arguments:
infile Input SURT/CDX/CDXJ (plain or GZip) file path or '-' for STDIN
outfile Output MementoMap file pathoptional arguments:
-h, --help show this help message and exit
--hcf Host compaction factor (deafault: Inf)
--pcf Path compaction factor (deafault: Inf)
--ha Power law alpha parameter for host (default: 16.329)
--pa Power law alpha parameter for path (default: 24.546)
--hk Power law k parameter for host (default: 0.714)
--pk Power law k parameter for path (default: 1.429)
--hdepth Max host depth (default: 8)
--pdepth Max path depth (default: 9)
``````
$ mementomap compact -h
usage: mementomap compact [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
[--hdepth] [--pdepth]
infile outfilepositional arguments:
infile Input MementoMap (plain or GZip) file path or '-' for STDIN
outfile Output MementoMap file pathoptional arguments:
-h, --help show this help message and exit
--hcf Host compaction factor (deafault: 1.0)
--pcf Path compaction factor (deafault: 1.0)
--ha Power law alpha parameter for host (default: 16.329)
--pa Power law alpha parameter for path (default: 24.546)
--hk Power law k parameter for host (default: 0.714)
--pk Power law k parameter for path (default: 1.429)
--hdepth Max host depth (default: 8)
--pdepth Max path depth (default: 9)
``````
$ mementomap lookup -h
usage: mementomap lookup [-h] mmap surtpositional arguments:
mmap MementoMap file path to look into
surt SURT to look foroptional arguments:
-h, --help show this help message and exit
``````
$ mementomap batchlookup -h
usage: mementomap batchlookup [-h] mmap infilepositional arguments:
mmap MementoMap file path to look into
infile Input SURT (plain or GZip) file path or '-' for STDINoptional arguments:
-h, --help show this help message and exit
```## Citing Project
A publication related to this project appeared in the proceedings of JCDL 2019 ([Read the PDF](https://arxiv.org/pdf/1905.12607.pdf)). Please cite it as below:
> Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes. __MementoMap Framework for Flexible and Adaptive Web Archive Profiling__. In _Proceedings of the 19th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2019_, pp. 172-181, Urbana-Champaign, Illinois, USA, June 2016.
```latex
@inproceedings{jcdl-2019:alam:mementomap,
author = {Sawood Alam and
Michele C. Weigle and
Michael L. Nelson and
Fernando Melo and
Daniel Bicho and
Daniel Gomes},
title = {{MementoMap} Framework for Flexible and Adaptive Web Archive Profiling},
booktitle = {Proceedings of the 19th {ACM/IEEE-CS} Joint Conference on Digital Libraries},
series = {JCDL '19},
year = {2019},
month = {jun},
location = {Urbana-Champaign, Illinois, USA},
pages = {172--181},
numpages = {10},
url = {https://doi.org/10.1109/JCDL.2019.00033},
doi = {10.1109/JCDL.2019.00033},
isbn = {978-1-7281-1547-4},
publisher = {{IEEE}}
}
```