Projects in Awesome Lists tagged with webarchive
A curated list of projects in awesome lists tagged with webarchive.
https://github.com/karust/gogetcrawl
Extract web archive data using Wayback Machine and Common Crawl
commoncrawl concurrency crawler golang wayback-machine webarchive
Last synced: 15 Jan 2026
https://github.com/vegetableman/vandal
Navigator for Web Archive
chrome-extension firefox-addon internet-archiving wayback-machine webarchive
Last synced: 21 Feb 2026
https://github.com/helgeho/archivespark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
archivespark internet-archive spark spark-framework warc web-archiving webarchive
Last synced: 05 Apr 2025
https://github.com/helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
archivespark internet-archive spark spark-framework warc web-archiving webarchive
Last synced: 08 Apr 2025
https://github.com/chatnoir-eu/chatnoir-resiliparse
A robust web archive analytics toolkit
bigdata cpp cython extraction htmlparser python warc web webarchive
Last synced: 04 Apr 2026
https://github.com/n0tan3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
chrome-remote-interface pupeteer warc warc-files web-archives web-archiving webarchive webarchiving
Last synced: 07 May 2025
https://github.com/N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
chrome-remote-interface pupeteer warc warc-files web-archives web-archiving webarchive webarchiving
Last synced: 06 Aug 2025
https://github.com/rcarmo/python-webarchive
Create WebKit/Safari .webarchive files on any platform
Last synced: 21 Jul 2025
https://github.com/mathis2001/webhackurls
Simple Python OSINT tool for URL recon using the Wayback Machine.
bugbounty osint pentesting recon wayback-machine webarchive
Last synced: 27 Apr 2025
https://github.com/cipher387/quickcacheandarchivesearch
Quick Cache and Archive search buttons
baidu-cache google-cache webarchive webarchiving yandex-cache
Last synced: 15 Oct 2025
https://github.com/mhucka/devilfish
A utility for simultaneously creating full-page PDF snapshots and web archives of web pages in DEVONthink Pro.
archiving devonthink pdf web webarchive
Last synced: 24 Feb 2025
https://github.com/ticky/webarchive
📑 Rust utilities for working with Apple's Web Archive file format
rust-crate rust-lang safari webarchive
Last synced: 17 Feb 2026
https://github.com/helgeho/hadoopconcatgz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
hadoop spark warc web-archiving webarchive
Last synced: 14 Apr 2025
https://github.com/gonejack/webarchive-to-singlefile
This command-line tool converts a .webarchive file into a single .html file with embedded resources.
Last synced: 29 Jan 2026
https://github.com/sicos1977/webarchiveextractor
A .NET Standard 2.0 library to extract a Safari web archive to a folder
Last synced: 23 Aug 2025
https://github.com/q-m/scrapy-webarchive
A plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
scrapy wacz warc webarchive webarchive-data-scraping
Last synced: 24 Apr 2025
https://github.com/mccallofthewild/alexandrias-revenge
🔥The bold new archive that can’t be burned, bulldozed or battering-rammed #PoweredByArweave
archive article-extractor arweave blockchain webarchive
Last synced: 21 Apr 2025
https://github.com/ganapativs/puppeteer-warc
Create WARC (Web ARChive) of a web page
Last synced: 18 May 2026
https://github.com/ibnesayeed/archival-tests
A set of web archival replay test cases
archival-replay memento replay-tests testing webarchive webarchiving
Last synced: 12 Jan 2026
https://github.com/helgeho/warcpartitioner
Partition (W)ARC Files by MIME Type and Year
hadoop warc web-archiving webarchive
Last synced: 14 Apr 2025
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 29 Mar 2025
https://github.com/airborne-commando/link-extractor-and-archive
A link extractor and archiving tool that uses archive.ph as its archiving service; useful for simple, bare-bones sites.
archive cli gui-python python terminal webarchive webarchiving
Last synced: 29 Apr 2026
https://github.com/maxmmueller/404-to-archive-redirector
Greasemonkey script that redirects from a 404 page to the Wayback Machine.
404-redirect greasemonkey javascript tampermonkey webarchive
Last synced: 17 Feb 2026
https://github.com/n0tan3rd/node-cdxj
Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js
cdxj web-archives webarchive webarchiving
Last synced: 17 Aug 2025
https://github.com/piecelet/neodb-trending-history
Trending history of books, movies, TV, music, games, podcasts, and collections for NeoDB, an open-source fediverse community where you can discover, track, share, and discuss books, movies, TV, music, games, podcasts, and shows. See https://github.com/neodb-social/neodb for NeoDB.
archive bluesky book books douban fediverse game games goodreads historical-data history imdb letterboxd mastodon movies music neodb podcast tv webarchive
Last synced: 07 May 2026
https://github.com/commoncrawl/arc2warc-conversion
Experiences converting Common Crawl's ARC files from the 2008-2012 crawls to the WARC format
arc arc-files warc warc-files warc-format webarchive webarchiving
Last synced: 16 Feb 2026
https://github.com/pereslavtsev/memento-client
Node.js library for the Time Travel APIs with full support for the Memento protocol.
memento timetravel wayback webarchive
Last synced: 29 Jun 2025
https://github.com/vishwas-r/internet-archive-assistant
Firefox Addon & Chrome Extension for effortlessly saving web pages to the Internet Archive or viewing their latest archived versions. Perfect for preserving content and retrieving snapshots.
chrome-extension firefox-addon internetarchive webarchive
Last synced: 31 Mar 2025
https://github.com/gonejack/html-to-webarchive
This command-line tool converts an .html file to Safari's .webarchive file.
Last synced: 14 Jan 2026