An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with webarchives

A curated list of projects in awesome lists tagged with webarchives.

https://github.com/n0tan3rd/squidwarc

Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head.

browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving

Last synced: 13 Sep 2025

https://github.com/N0taN3rd/Squidwarc

Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head.

browser-automation chrome chrome-headless crawler crawling headless-chrome high-fidelity-preservation puppeteer webarchives webarchiving

Last synced: 06 Apr 2025

https://github.com/archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives

Last synced: 13 Apr 2025

https://github.com/alexwlchan/safari-webarchiver

Save web pages as Safari webarchive files from the command line

safari web-scraping webarchives webkit wkwebview

Last synced: 16 Jan 2026

https://github.com/peterk/warcworker

A Dockerized, queued, high-fidelity web archiver based on Squidwarc

archiving high-fidelity-preservation preservation webarchives webarchiving

Last synced: 14 Mar 2026

https://github.com/iipc/robustlinks

Links on the web break all the time; robustify them!

html javascript links robust-links webarchives

Last synced: 30 Dec 2025

https://github.com/archivesunleashed/warclight

A Rails engine supporting the discovery of web archives.

blacklight discovery rails rails-engine ruby solr warc webarchive-discovery webarchives

Last synced: 27 Jul 2025

https://github.com/archivesunleashed/docker-aut

Docker image for the Archives Unleashed Toolkit

archives-unleashed aut docker docker-image spark webarchives

Last synced: 27 Apr 2025

https://github.com/oduwsdl/raintale

A Python utility for publishing a social media story built from archived web pages to multiple services.

mementos social-media storytelling surrogates web-archives webarchives

Last synced: 15 Apr 2025

https://github.com/oduwsdl/aiu

A library for interacting with web archive collections at Archive-It, Trove, Pandora, and more.

archiveit metadata metadata-extraction webarchives

Last synced: 14 Dec 2025

https://github.com/oduwsdl/hypercane

A toolkit for developing algorithms that sample mementos from a web archive collection.

clustering filtering memento sampling storytelling summarization webarchives

Last synced: 15 Apr 2025