Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dhamaniasad/WARCTools
A list of tools related to W(eb)ARC(hive)
- Host: GitHub
- URL: https://github.com/dhamaniasad/WARCTools
- Owner: dhamaniasad
- Created: 2014-09-20T10:39:43.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-11-01T10:36:18.000Z (about 10 years ago)
- Last Synced: 2024-10-12T09:16:03.958Z (28 days ago)
- Homepage: dhamaniasad.github.io/WARCTools/
- Size: 277 KB
- Stars: 54
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-starred - dhamaniasad/WARCTools - A list of tools related to W(eb)ARC(hive) (others)
README
WARCTools
=========

A list of tools related to W(eb)ARC(hives)

* [heritrix](https://github.com/internetarchive/heritrix3) - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
* [umbra](https://github.com/internetarchive/umbra) - A queue-controlled browser automation tool for improving web crawl quality
* [wayback](https://github.com/internetarchive/wayback) - Wayback Machine. Used for playing back saved WARC files.
* [CDX-Writer](https://github.com/internetarchive/CDX-Writer) - Python script to create CDX index files of WARC data
* [warcprox](https://github.com/internetarchive/warcprox) - WARC writing MITM HTTP/S proxy
* [warctools](https://github.com/internetarchive/warctools) - Command-line tools and Python libraries for handling and manipulating WARC files
* [warc_creator](https://github.com/jcushman/warc_creator) - WSGI server to generate WARC files
* [pywb-webrecorder](https://github.com/ikreymer/pywb-webrecorder) - pywb + warcprox: Wayback Web Replay + Archiving via recording proxy
* [wget](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output) - Wget 1.14 and later can write its results to a WARC file (`--warc-file`)
* [warc](https://github.com/internetarchive/warc) - Python library for reading and writing WARC files
* [WarcMiddleware](https://github.com/odie5533/WarcMiddleware) - WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy
* [WarcProxy](https://github.com/odie5533/WarcProxy) - Saves proxied HTTP traffic to a WARC file
* [WarcMITMProxy](https://github.com/odie5533/WarcMITMProxy) - HTTP(S) proxy that saves traffic to a WARC file, using libmitmproxy.
* [warcreate](https://github.com/machawk1/warcreate) - Chrome extension to "Create WARC files from any webpage"
* [node-warc-proxy](https://github.com/ualbertalib/node-warc-proxy) - Simple node.js server to allow navigation of the contents of a WARC file
* [WarcReplay](https://github.com/odie5533/WarcReplay) - Creates a proxy that lets you view the contents of a WARC file as though you were browsing the live web
* [WarcTwistedMITMProxy](https://github.com/odie5533/WarcTwistedMITMProxy) - Web proxy supporting MITM SSL and saving traffic to a WARC file, using the Twisted networking library
* [vcproxy](https://github.com/kngenie/vcproxy) - A tiny HTTP proxy that archives traffic in WARCs
* [pywb](https://github.com/ikreymer/pywb) - Python WayBack for web archive replay
* [warc-proxy](https://github.com/alard/warc-proxy) - Serving content from a WARC
* [megawarc](https://github.com/alard/megawarc) - Nondestructive warc-in-tar to warc conversion
* [warctozip-service](https://github.com/alard/warctozip-service) - An HTTP-based warc-to-zip converter
* [warcat](https://github.com/chfoo/warcat) - Tool and library for handling Web ARChive (WARC) files
* [pylibwarc](https://github.com/odie5533/pylibwarc/) - A Python library for dealing with Web ARChive (WARC) files
* [wpull](https://github.com/chfoo/wpull) - Wget-compatible web downloader and crawler
* [warctozip](https://github.com/alard/warctozip) - Convert a warc to a zip with Hanzo warc-tools and warctozip.py
* [pymiproxy](https://github.com/allfro/pymiproxy) - A small and sweet man-in-the-middle proxy capable of doing HTTP and HTTP over SSL
* [liveweb](https://github.com/internetarchive/liveweb) - Liveweb proxy of the Wayback Machine project
* [PhantomWARC](https://github.com/dhamaniasad/PhantomWARC) - Generate WARC files from dynamic webpages
* [WarcQtViewer](https://github.com/odie5533/WarcQtViewer) - GUI to view and manage .warc and .warc.gz files.
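
For orientation, the sketch below shows roughly what the tools above read and write: a WARC record is a `WARC/1.0` version line, a set of named headers, a blank line, and a block of exactly `Content-Length` bytes followed by two CRLFs. It uses only the Python standard library and is a simplified illustration, not the API of any tool in this list. The function names `build_warc_record` and `iter_warc_records` are hypothetical, and gzip-compressed files and folded header lines are not handled.

```python
import io
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, block, record_type="resource",
                      content_type="text/plain"):
    """Serialize one uncompressed WARC/1.0 record (hypothetical helper)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLFs after the content block.
    return head.encode("utf-8") + block + b"\r\n\r\n"


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # skip the blank lines that separate records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"unexpected record start: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The block length comes from the header, so the block may contain
        # any bytes, including CRLF sequences.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


if __name__ == "__main__":
    data = build_warc_record("http://example.com/", b"hello, archive")
    for headers, block in iter_warc_records(io.BytesIO(data)):
        print(headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

For real archives, the libraries listed above are the better choice, since they also deal with compression, HTTP payload parsing, and malformed records.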