https://github.com/iipc/warc2html
Converts WARC files to static HTML
- Host: GitHub
- URL: https://github.com/iipc/warc2html
- Owner: iipc
- License: apache-2.0
- Created: 2021-11-08T04:09:05.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-08T04:26:26.000Z (over 1 year ago)
- Last Synced: 2024-04-16T17:39:25.805Z (2 months ago)
- Language: Java
- Size: 24.4 KB
- Stars: 37
- Watchers: 10
- Forks: 3
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-web-archiving - warc2html - Converts WARC files to static HTML suitable for browsing offline or rehosting. (Tools & Software / Replay)
README
warc2html
=========

Converts WARC files to static HTML while rewriting links to relative paths suitable for browsing offline or rehosting
on a standard web server.

Limitations:
* Links in JavaScript are not rewritten
* Assumes there's only one snapshot of each URL in the input
* Does not handle resource records (yet)

Usage
-----

To convert a file named input.warc.gz to static HTML:

    java -jar warc2html.jar -o output/ input.warc.gz

Alternatively, to convert a subset of records, supply a list of records in CDX11 format and the
path or URL where the corresponding WARC files are stored:

    java -jar warc2html.jar -o output/ -b http://server/warcs/ input.cdx

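
To make the CDX11 input more concrete, here is a minimal sketch of reading one index line. It assumes the commonly used 11-field layout (header `CDX N b a m s k r M S V g`), where the last three fields are the compressed record length, the offset and the WARC filename. The class `Cdx11Record` and the helper `warcLocation` are hypothetical names for illustration, and joining the `-b` value with the filename field is an assumption about how the two inputs relate, not a description of warc2html's code.

```java
/** Illustrative only: one line of an 11-field CDX index. Not part of warc2html. */
final class Cdx11Record {
    final String urlKey;       // N: canonicalised URL key
    final String timestamp;    // b: capture time, yyyyMMddHHmmss
    final String originalUrl;  // a: URL as captured
    final String mimeType;     // m
    final String status;       // s: HTTP status code
    final String digest;       // k
    final long length;         // S: compressed record length, -1 if "-"
    final long offset;         // V: offset of the record in the WARC file, -1 if "-"
    final String filename;     // g: WARC file name

    private Cdx11Record(String[] f) {
        urlKey = f[0];
        timestamp = f[1];
        originalUrl = f[2];
        mimeType = f[3];
        status = f[4];
        digest = f[5];
        // f[6] (redirect) and f[7] (meta tags) are ignored in this sketch
        length = parseLongOrMinusOne(f[8]);
        offset = parseLongOrMinusOne(f[9]);
        filename = f[10];
    }

    static Cdx11Record parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 11) {
            throw new IllegalArgumentException("expected 11 fields but found " + fields.length);
        }
        return new Cdx11Record(fields);
    }

    /** Hypothetical helper: joins a base path or URL (like the -b value) with the filename field. */
    String warcLocation(String base) {
        return base.endsWith("/") ? base + filename : base + "/" + filename;
    }

    private static long parseLongOrMinusOne(String s) {
        return s.equals("-") ? -1L : Long.parseLong(s);
    }
}
```

Under these assumptions, a record whose filename field is `input.warc.gz` combined with the base `http://server/warcs/` would point at `http://server/warcs/input.warc.gz`.
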
Compiling
---------

Install [OpenJDK 11](https://adoptium.net/) or later and [Apache Maven](https://maven.apache.org/), then compile with:

    mvn package

File renaming
-------------

Files are renamed to remove characters like "?" that are disallowed on some systems. File extensions are updated or added
based on the Content-Type header according to [these rules](resources/org/netpreserve/warc2html/forced.extensions).

URLs ending in / will be saved as index.html. Where two WARC records would produce the same filename they are
disambiguated by adding a number like ~1, ~2, ~3 to the end of the filename.

License
-------

Copyright 2021 National Library of Australia \
License: [Apache 2.0](LICENSE)
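
As a rough illustration of the rules in the File renaming section, the sketch below maps a URL path to an output filename. It is not the code warc2html uses. The class name `FilenameSketch`, the set of replaced characters, the `_` replacement and the choice to leave the first occurrence of a name unsuffixed are all assumptions. The Content-Type based extension rules are left out because they live in the project's `forced.extensions` file.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; not taken from warc2html. */
final class FilenameSketch {
    // Filenames already handed out, used to detect collisions
    private final Set<String> used = new HashSet<>();

    /** Maps a URL path (with optional query string) to a unique relative file path. */
    String assign(String pathAndQuery) {
        String path = pathAndQuery.startsWith("/") ? pathAndQuery.substring(1) : pathAndQuery;

        // URLs ending in "/" are saved as index.html
        if (path.isEmpty() || path.endsWith("/")) {
            path += "index.html";
        }

        // Assumed set of characters that some filesystems disallow: ? * : < > | " \
        path = path.replaceAll("[?*:<>|\"\\\\]", "_");

        // On a collision, append ~1, ~2, ~3, ... until the name is unused
        String candidate = path;
        for (int n = 1; !used.add(candidate); n++) {
            candidate = path + "~" + n;
        }
        return candidate;
    }
}
```

With this sketch, calling `assign("/docs/")` yields `docs/index.html`, and calling `assign("/search?q=a")` twice yields `search_q=a` and then `search_q=a~1`.
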