Projects in Awesome Lists tagged with webarchives
A curated list of projects in awesome lists tagged with webarchives.
https://github.com/n0tan3rd/squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head
browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving
Last synced: 13 Sep 2025
https://github.com/N0taN3rd/Squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head
browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving
Last synced: 06 Apr 2025
https://github.com/archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives
Last synced: 13 Apr 2025
https://github.com/alexwlchan/safari-webarchiver
Save web pages as Safari webarchive files from the command line
safari web-scraping webarchives webkit wkwebview
Last synced: 16 Jan 2026
https://github.com/peterk/warcworker
A dockerized, queued, high-fidelity web archiver based on Squidwarc
archiving high-fidelity-preservation preservation webarchives webarchiving
Last synced: 14 Mar 2026
https://github.com/iipc/robustlinks
Links on the web break all the time, robustify them!
html javascript links robust-links webarchives
Last synced: 30 Dec 2025
https://github.com/archivesunleashed/warclight
A Rails engine supporting the discovery of web archives.
blacklight discovery rails rails-engine ruby solr warc webarchive-discovery webarchives
Last synced: 27 Jul 2025
https://github.com/archivesunleashed/docker-aut
Docker image for the Archives Unleashed Toolkit
archives-unleashed aut docker docker-image spark webarchives
Last synced: 27 Apr 2025
https://github.com/oduwsdl/raintale
A Python utility for publishing a social media story built from archived web pages to multiple services.
mementos social-media storytelling surrogates web-archives webarchives
Last synced: 15 Apr 2025
https://github.com/archivesunleashed/auk
Rails application for the Archives Unleashed Cloud.
apache-spark archives-unleashed archives-unleashed-toolkit rails rails-application webarchives
Last synced: 29 Sep 2025
https://github.com/oduwsdl/aiu
A library for interacting with web archive collections at Archive-It, Trove, Pandora, and more.
archiveit metadata metadata-extraction webarchives
Last synced: 14 Dec 2025
https://github.com/oduwsdl/hypercane
A toolkit for developing algorithms that sample mementos from a web archive collection.
clustering filtering memento sampling storytelling summarization webarchives
Last synced: 15 Apr 2025