An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by internetarchive

A curated list of projects in awesome lists by internetarchive.

https://github.com/internetarchive/openlibrary

One webpage for every book ever published!

books hacktoberfest internet-archive library-catalogue open-source

Last synced: 12 May 2025

https://github.com/internetarchive/heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

heritrix java warc webcrawling

Last synced: 15 May 2025

https://github.com/internetarchive/bookreader

The Internet Archive BookReader

bookreader ebooks hacktoberfest internetarchive

Last synced: 18 Dec 2025

https://github.com/internetarchive/wayback

IA's public Wayback Machine (moved from SourceForge)

Last synced: 19 Jul 2025

https://github.com/internetarchive/brozzler

brozzler - distributed browser-based web crawler

Last synced: 07 Oct 2025

https://github.com/internetarchive/wayback-machine-webextension

A web browser extension for Chrome, Firefox, Edge, and Safari 14.

Last synced: 14 May 2025

https://github.com/internetarchive/openlibrary-client

Python Client Library for the Archive.org OpenLibrary API

Last synced: 16 May 2025

https://github.com/internetarchive/warcprox

WARC writing MITM HTTP/S proxy

Last synced: 14 Apr 2025

https://github.com/internetarchive/dweb-mirror

Offline Internet Archive project

Last synced: 05 Apr 2025

https://github.com/internetarchive/warc

Python library for reading and writing warc files

Last synced: 05 Apr 2025

https://github.com/internetarchive/warctools

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

Last synced: 07 May 2025

https://github.com/internetarchive/bookserver

Archive.org OPDS Bookserver - A standard for digital book distribution

aldiko atom-feed books opds simplye

Last synced: 07 May 2025

https://github.com/internetarchive/zeno

State-of-the-art web crawler 🔱

archiving web-crawler zeno

Last synced: 12 Mar 2025

https://github.com/internetarchive/archive-pdf-tools

Fast PDF generation and compression. Deals with millions of pages daily.

compression ocr pdf pdf-compression pdf-compressor pdf-generation pdf-generator pdf-to-image python

Last synced: 06 Apr 2025

https://github.com/internetarchive/openlibrary-bots

A repository of cleanup bots implementing the openlibrary-client

Last synced: 05 Apr 2025

https://github.com/internetarchive/iaux

Monorepo for Archive.org UX development and prototyping.

Last synced: 06 Apr 2025

https://github.com/internetarchive/umbra

A queue-controlled browser automation tool for improving web crawl quality

Last synced: 09 Apr 2025

https://github.com/internetarchive/hind

Hashistack-IN-Docker (single container with nomad + consul + caddy)

caddy cicd consul consul-connect docker hashistack nomad

Last synced: 29 Oct 2025

https://github.com/internetarchive/wayback-machine-firefox

Reduce annoying 404 pages by automatically checking for an archived copy in the Wayback Machine. Learn more about this Test Pilot experiment at https://testpilot.firefox.com/

Last synced: 07 May 2025

https://github.com/internetarchive/internet-archive-voice-apps

Voice Apps (Actions on Google, Alexa Skill) of Internet Archive. Just say: "Ok Google, Ask Internet Archive to Play Jazz" or "Alexa, Ask Internet Archive to play Instrumental Music"

actions-on-google alexa-skill dialog-flow internet-archive voice-assistant

Last synced: 09 Apr 2025

https://github.com/internetarchive/archive-hocr-tools

Efficient hOCR tooling

Last synced: 07 May 2025

https://github.com/internetarchive/liveweb

Liveweb proxy of the Wayback Machine project

Last synced: 07 May 2025

https://github.com/internetarchive/trough

Trough: Big data, small databases.

database python python3 sqlite

Last synced: 12 Jul 2025

https://github.com/internetarchive/surt

Sort-friendly URI Reordering Transform (SURT) python module

Last synced: 29 Jul 2025

https://github.com/internetarchive/epub

For code related to making ePub files

Last synced: 01 Sep 2025

https://github.com/internetarchive/dweb-transport

Internet Archive Decentralized Web Common API

Last synced: 24 Dec 2025

https://github.com/internetarchive/wayback-diff

React components to render differences between captures at the Wayback Machine

Last synced: 07 May 2025

https://github.com/internetarchive/snakebite-py3

Pure python HDFS client: python3.x version

Last synced: 16 May 2025

https://github.com/internetarchive/scrapy-warcio

Support for writing WARC files with Scrapy

python scrapy warc web-archiving

Last synced: 14 Jul 2025

https://github.com/internetarchive/iiif

The official Internet Archive IIIF service

hacktoberfest

Last synced: 07 May 2025

https://github.com/internetarchive/sandcrawler

Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki

web-archiving

Last synced: 17 Oct 2025

https://github.com/internetarchive/arklet

ARK minter, binder, resolver

ark arks django postgres python

Last synced: 06 Jul 2025

https://github.com/internetarchive/dweb-gateway

Decentralized web Gateway for Internet Archive

Last synced: 07 May 2025

https://github.com/internetarchive/xfetch

Cache stampede test harness. Code accompanies the presentation made at RedisConf 2017, 30 May to 1 June, 2017, in San Francisco.

Last synced: 07 May 2025

https://github.com/internetarchive/openlibrary-librarians

Coordination between the OpenLibrary.org Librarian community

Last synced: 17 Jul 2025

https://github.com/internetarchive/iacopilot

Summarize and ask questions about items in the Internet Archive

cli copilot gpt iacopilot internet-archive python repl

Last synced: 07 May 2025

https://github.com/internetarchive/cicd

build & test using github registry; deploy to nomad clusters

build cicd deploy docker-images github-registry nomad test

Last synced: 08 Jul 2025

https://github.com/internetarchive/arch

Web application for distributed compute analysis of Archive-It web archive collections.

Last synced: 07 May 2025

https://github.com/internetarchive/sparkling

Internet Archive's Sparkling Data Processing Library

Last synced: 07 May 2025

https://github.com/internetarchive/doublethink

rethinkdb python library

Last synced: 07 May 2025

https://github.com/internetarchive/iari

Import workflows for the Wikipedia Citations Database

Last synced: 07 Aug 2025

https://github.com/internetarchive/s3_loader

Watch for local files to appear and move them into S3

Last synced: 07 May 2025

https://github.com/internetarchive/wikibase-patcher

Python library for interacting with the Wikibase REST API

Last synced: 03 Aug 2025

https://github.com/internetarchive/draintasker

a tool for continuously ingesting w/arc files into the archive

Last synced: 07 May 2025

https://github.com/internetarchive/web_collection_search

An API wrapper to the Elasticsearch index of web archival collections and a web UI to explore those indexes.

Last synced: 07 May 2025

https://github.com/internetarchive/ias3

Internet Archive S3-like connector

Last synced: 07 May 2025

https://github.com/internetarchive/iaux-typescript-wc-template

IAUX Typescript WebComponent Template

Last synced: 07 May 2025

https://github.com/internetarchive/openlibrary-api

API documentation for https://github.com/internetarchive/openlibrary

Last synced: 15 Oct 2025

https://github.com/internetarchive/ia

A JS interface to archive.org

api download internet-archive javascript json metadata search

Last synced: 07 May 2025

https://github.com/internetarchive/ia-bin-tools

Internet Archive Command-line Utilities

Last synced: 07 May 2025

https://github.com/internetarchive/read_api_extras

Demo code for the Open Library Read API

Last synced: 03 Jul 2025

https://github.com/internetarchive/trendmachine

A mathematical model that calculates a normalized score quantifying the temporal resilience of a web page as time-series data, based on historical observations of the page in web archives.

Last synced: 07 May 2025

https://github.com/internetarchive/chocula

Journal-level metadata munging. Part of the fatcat project.

issn metadata

Last synced: 07 May 2025

https://github.com/internetarchive/offlinesolr

Tool to build solr index offline

Last synced: 11 Jul 2025

https://github.com/internetarchive/gospn

Save Page Now client in Go

Last synced: 10 Apr 2025

https://github.com/internetarchive/wbm_ai_kg

Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Knowledge Graph project

Last synced: 28 Dec 2025

https://github.com/internetarchive/epub-labs

epub-labs

Last synced: 25 Dec 2025

https://github.com/internetarchive/esbuild_es5

minify JS/TS files using `esbuild` and `swc` down to ES5 (uses `deno`)

Last synced: 07 May 2025

https://github.com/internetarchive/iare

An interactive IARI JSON viewer

Last synced: 07 May 2025

https://github.com/internetarchive/internetarchive.github.com

Internet Archive Open Source Blog

Last synced: 07 May 2025

https://github.com/internetarchive/wiki-references-db

Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis

Last synced: 07 May 2025

https://github.com/internetarchive/acs4_py

Python interface to ACS4

Last synced: 07 May 2025

https://github.com/internetarchive/eventer

Eventer is a simple event dispatching library in Python

Last synced: 07 May 2025

https://github.com/internetarchive/httpd

Fast and easy-to-use web server, using the Deno native http server (hyper in rust). It serves static files & dirs, with arbitrary handling using an optional `handler` argument.

deno fileserver httpd javascript static-files webserver

Last synced: 07 May 2025

https://github.com/internetarchive/gocdx

Go package to manipulate CDX files

Last synced: 07 May 2025

https://github.com/internetarchive/isodos

Go module to interact with Internet Archive's Isodos API

Last synced: 30 Jul 2025

https://github.com/internetarchive/wbm_ai_sum

Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Archival Summary project

Last synced: 10 Apr 2025

https://github.com/internetarchive/iaux-donation-form

The Internet Archive Donation Form

Last synced: 07 May 2025

https://github.com/internetarchive/strainer

Heritrix frontier files manipulation tool.

crawling frontier heritrix

Last synced: 25 Dec 2025

https://github.com/internetarchive/iaux-shared-resize-observer

An efficient ResizeObserver to be shared amongst many components

Last synced: 28 Dec 2025

https://github.com/internetarchive/keystone

ARCH Web Client

Last synced: 07 May 2025

https://github.com/internetarchive/rulesengine-client

Python client package for the playback rules engine

Last synced: 07 May 2025

https://github.com/internetarchive/iaux-music-player

IA music player

Last synced: 07 May 2025

https://github.com/internetarchive/iaux-item-navigator

A web component that displays item contents in-theater

Last synced: 07 May 2025

https://github.com/internetarchive/ia2fil

This dashboard shows progress of replicating Internet Archive items to Filecoin.

Last synced: 12 Jun 2025

https://github.com/internetarchive/coderunr

deploy saved changes to website unique hostnames instantly -- can skip commits, pushes & full CI/CD

cicd deployment preview-apps websites

Last synced: 18 Jun 2025

https://github.com/internetarchive/iaux-metadata-service

A service for fetching metadata about items in the Internet Archive

Last synced: 10 Apr 2025