Projects in Awesome Lists by internetarchive
A curated list of projects in awesome lists by internetarchive.
https://github.com/internetarchive/openlibrary
One webpage for every book ever published!
books hacktoberfest internet-archive library-catalogue open-source
Last synced: 12 May 2025
https://github.com/internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
heritrix java warc webcrawling
Last synced: 15 May 2025
https://github.com/internetarchive/bookreader
The Internet Archive BookReader
bookreader ebooks hacktoberfest internetarchive
Last synced: 18 Dec 2025
https://github.com/internetarchive/wayback
IA's public Wayback Machine (moved from SourceForge)
Last synced: 19 Jul 2025
https://github.com/internetarchive/brozzler
brozzler - distributed browser-based web crawler
Last synced: 07 Oct 2025
https://github.com/internetarchive/wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
Last synced: 14 May 2025
https://github.com/internetarchive/openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
Last synced: 16 May 2025
https://github.com/internetarchive/dweb-mirror
Offline Internet Archive project
Last synced: 05 Apr 2025
https://github.com/internetarchive/warc
Python library for reading and writing warc files
Last synced: 05 Apr 2025
https://github.com/internetarchive/warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
Last synced: 07 May 2025
https://github.com/internetarchive/archive-pdf-tools
Fast PDF generation and compression. Deals with millions of pages daily.
compression ocr pdf pdf-compression pdf-compressor pdf-generation pdf-generator pdf-to-image python
Last synced: 06 Apr 2025
https://github.com/internetarchive/fatcat
Perpetual Access To The Scholarly Record
digital-library open-access postgresql python rust scholarly-communication web-archiving
Last synced: 07 Apr 2025
https://github.com/internetarchive/fatcat-scholar
search interface for scholarly works
digital-library elasticsearch python scholarly-communication
Last synced: 07 May 2025
https://github.com/internetarchive/cdx-summary
Summarize web archive capture index (CDX) files.
archive cdx collection nodejs python report statistics summary warc web-archive webcomponents
Last synced: 07 May 2025
https://github.com/internetarchive/openlibrary-bots
A repository of cleanup bots implementing the openlibrary-client
Last synced: 05 Apr 2025
https://github.com/internetarchive/iaux
Monorepo for Archive.org UX development and prototyping.
Last synced: 06 Apr 2025
https://github.com/internetarchive/umbra
A queue-controlled browser automation tool for improving web crawl quality
Last synced: 09 Apr 2025
https://github.com/internetarchive/hind
Hashistack-IN-Docker (single container with nomad + consul + caddy)
caddy cicd consul consul-connect docker hashistack nomad
Last synced: 29 Oct 2025
https://github.com/internetarchive/wayback-machine-firefox
Reduce annoying 404 pages by automatically checking for an archived copy in the Wayback Machine. Learn more about this Test Pilot experiment at https://testpilot.firefox.com/
Last synced: 07 May 2025
https://github.com/internetarchive/internet-archive-voice-apps
Voice Apps (Actions on Google, Alexa Skill) of Internet Archive. Just say: "Ok Google, Ask Internet Archive to Play Jazz" or "Alexa, Ask Internet Archive to play Instrumental Music"
actions-on-google alexa-skill dialog-flow internet-archive voice-assistant
Last synced: 09 Apr 2025
https://github.com/internetarchive/archive-hocr-tools
Efficient hOCR tooling
Last synced: 07 May 2025
https://github.com/internetarchive/liveweb
Liveweb proxy of the Wayback Machine project
Last synced: 07 May 2025
https://github.com/internetarchive/trough
Trough: Big data, small databases.
database python python3 sqlite
Last synced: 12 Jul 2025
https://github.com/internetarchive/surt
Sort-friendly URI Reordering Transform (SURT) python module
Last synced: 29 Jul 2025
https://github.com/internetarchive/epub
For code related to making ePub files
Last synced: 01 Sep 2025
https://github.com/internetarchive/dweb-transport
Internet Archive Decentralized Web Common API
Last synced: 24 Dec 2025
https://github.com/internetarchive/wayback-diff
React components to render differences between captures at the Wayback Machine
Last synced: 07 May 2025
https://github.com/internetarchive/snakebite-py3
Pure python HDFS client: python3.x version
Last synced: 16 May 2025
https://github.com/internetarchive/scrapy-warcio
Support for writing WARC files with Scrapy
python scrapy warc web-archiving
Last synced: 14 Jul 2025
https://github.com/internetarchive/newsum
Daily TV News Summary using GPT
gdelt gpt internet-archive news-summarization openapi python summarization tv tv-news
Last synced: 07 May 2025
https://github.com/internetarchive/iiif
The official Internet Archive IIIF service
Last synced: 07 May 2025
https://github.com/internetarchive/sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
Last synced: 17 Oct 2025
https://github.com/internetarchive/dweb-gateway
Decentralized web Gateway for Internet Archive
Last synced: 07 May 2025
https://github.com/internetarchive/xfetch
Cache stampede test harness. Code accompanies the presentation made at RedisConf 2017, 30 May to 1 June, 2017, in San Francisco.
Last synced: 07 May 2025
https://github.com/internetarchive/openlibrary-librarians
Coordination between the OpenLibrary.org Librarian community
Last synced: 17 Jul 2025
https://github.com/internetarchive/iacopilot
Summarize and ask questions about items in the Internet Archive
cli copilot gpt iacopilot internet-archive python repl
Last synced: 07 May 2025
https://github.com/internetarchive/cicd
build & test using github registry; deploy to nomad clusters
build cicd deploy docker-images github-registry nomad test
Last synced: 08 Jul 2025
https://github.com/internetarchive/arch
Web application for distributed compute analysis of Archive-It web archive collections.
Last synced: 07 May 2025
https://github.com/internetarchive/sparkling
Internet Archive's Sparkling Data Processing Library
Last synced: 07 May 2025
https://github.com/internetarchive/iari
Import workflows for the Wikipedia Citations Database
Last synced: 07 Aug 2025
https://github.com/internetarchive/s3_loader
Watch for local files to appear and move them into S3
Last synced: 07 May 2025
https://github.com/internetarchive/wikibase-patcher
Python library for interacting with the Wikibase REST API
Last synced: 03 Aug 2025
https://github.com/internetarchive/draintasker
a tool for continuously ingesting w/arc files into the archive
Last synced: 07 May 2025
https://github.com/internetarchive/web_collection_search
An API wrapper to the Elasticsearch index of web archival collections and a web UI to explore those indexes.
Last synced: 07 May 2025
https://github.com/internetarchive/iaux-typescript-wc-template
IAUX Typescript WebComponent Template
Last synced: 07 May 2025
https://github.com/internetarchive/openlibrary-api
API documentation for https://github.com/internetarchive/openlibrary
Last synced: 15 Oct 2025
https://github.com/internetarchive/ia
A JS interface to archive.org
api download internet-archive javascript json metadata search
Last synced: 07 May 2025
https://github.com/internetarchive/ia-bin-tools
Internet Archive Command-line Utilities
Last synced: 07 May 2025
https://github.com/internetarchive/read_api_extras
Demo code for the Open Library Read API
Last synced: 03 Jul 2025
https://github.com/internetarchive/trendmachine
A mathematical model to calculate a normalized score to quantify the temporal resilience of a web page as a time-series data based on the historical observations of the page in web archives.
Last synced: 07 May 2025
https://github.com/internetarchive/chocula
Journal-level metadata munging. Part of the fatcat project.
Last synced: 07 May 2025
https://github.com/internetarchive/offlinesolr
Tool to build solr index offline
Last synced: 11 Jul 2025
https://github.com/internetarchive/wbm_ai_kg
Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Knowledge Graph project
Last synced: 28 Dec 2025
https://github.com/internetarchive/esbuild_es5
minify JS/TS files using `esbuild` and `swc` down to ES5 (uses `deno`)
Last synced: 07 May 2025
https://github.com/internetarchive/internetarchive.github.com
Internet Archive Open Source Blog
Last synced: 07 May 2025
https://github.com/internetarchive/wiki-references-db
Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis
Last synced: 07 May 2025
https://github.com/internetarchive/eventer
Eventer is a simple event dispatching library in Python
Last synced: 07 May 2025
https://github.com/internetarchive/httpd
Fast and easy-to-use web server, using the Deno native http server (hyper in rust). It serves static files & dirs, with arbitrary handling using an optional `handler` argument.
deno fileserver httpd javascript static-files webserver
Last synced: 07 May 2025
https://github.com/internetarchive/gocdx
Go package to manipulate CDX files
Last synced: 07 May 2025
https://github.com/internetarchive/isodos
Go module to interact with Internet Archive's Isodos API
Last synced: 30 Jul 2025
https://github.com/internetarchive/wbm_ai_sum
Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Archival Summary project
Last synced: 10 Apr 2025
https://github.com/internetarchive/iaux-donation-form
The Internet Archive Donation Form
Last synced: 07 May 2025
https://github.com/internetarchive/strainer
Heritrix frontier files manipulation tool.
Last synced: 25 Dec 2025
https://github.com/internetarchive/iaux-shared-resize-observer
An efficient ResizeObserver to be shared amongst many components
Last synced: 28 Dec 2025
https://github.com/internetarchive/rulesengine-client
Python client package for the playback rules engine
Last synced: 07 May 2025
https://github.com/internetarchive/iaux-item-navigator
A web component that displays item contents in-theater
Last synced: 07 May 2025
https://github.com/internetarchive/ia2fil
This dashboard shows progress of replicating Internet Archive items to Filecoin.
Last synced: 12 Jun 2025
https://github.com/internetarchive/coderunr
deploy saved changes to website unique hostnames instantly -- can skip commits, pushes & full CI/CD
cicd deployment preview-apps websites
Last synced: 18 Jun 2025
https://github.com/internetarchive/iaux-metadata-service
A service for fetching metadata about items in the Internet Archive
Last synced: 10 Apr 2025