https://github.com/steffenfritz/html2warc
A simple script to convert web resources into a single WARC file
- Host: GitHub
- URL: https://github.com/steffenfritz/html2warc
- Owner: steffenfritz
- License: mit
- Fork: true (ampoffcom/html2warc)
- Created: 2015-12-30T14:29:32.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2023-05-11T18:17:59.000Z (almost 3 years ago)
- Last Synced: 2024-11-16T21:33:05.752Z (over 1 year ago)
- Language: Python
- Size: 10.7 KB
- Stars: 18
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: license.txt
Awesome Lists containing this project
- webarchiving-awesome-graph - html2warc - A simple script to convert offline data into a single WARC file. 💽 ⭐ 22 👀 3 (Tools & Software / Acquisition)
- awesome-datahoarding - html2warc
- awesome-web-archiving - html2warc - A simple script to convert offline data into a single WARC file. *(Stable)* (Tools & Software / Acquisition)
- awesome-datahoarder - html2warc
README
# html2warc
A simple script to convert offline data into a WARC file.
# Usage
`python html2warc.py $TARGET_URI $SOURCE_DIR $TARGET_WARC`
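
To illustrate how the three arguments could relate to each other, here is a minimal standard-library sketch of the general technique: walk a source directory and write each file as a WARC/1.0 `resource` record whose target URI is the base URI plus the file's relative path. This is not the code of `html2warc.py`; the function names `build_resource_record` and `directory_to_warc`, the choice of `resource` records, and the MIME type guessing are all assumptions made for the example.

```python
import mimetypes
import os
import uuid
from datetime import datetime, timezone


def build_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF, a blank line separates them from the block,
    # and every record is terminated by two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def directory_to_warc(target_uri: str, source_dir: str, target_warc: str) -> int:
    """Write every file under source_dir into target_warc; return the record count.

    Assumption: each file maps to target_uri + its relative path. Paths are
    not percent-encoded in this sketch.
    """
    base = target_uri.rstrip("/")
    out_path = os.path.abspath(target_warc)
    count = 0
    with open(target_warc, "wb") as out:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.abspath(path) == out_path:
                    continue  # do not archive the output file itself
                rel = os.path.relpath(path, source_dir).replace(os.sep, "/")
                content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
                with open(path, "rb") as fh:
                    payload = fh.read()
                out.write(build_resource_record(f"{base}/{rel}", payload, content_type))
                count += 1
    return count
```

Called as `directory_to_warc("http://example.org/", "./site", "site.warc")`, the sketch would produce an uncompressed WARC file with one record per file found under `./site`. The real script may use a different record type or add HTTP headers, so check its source before relying on any of these details.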