Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by internetarchive

A curated list of projects in awesome lists by internetarchive.

https://github.com/internetarchive/heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

heritrix java warc webcrawling

Last synced: 31 Jul 2024

https://github.com/internetarchive/bookreader

The Internet Archive BookReader

bookreader ebooks hacktoberfest internetarchive

Last synced: 01 Aug 2024

https://github.com/internetarchive/wayback

IA's public Wayback Machine (moved from SourceForge)

Last synced: 07 Aug 2024

https://github.com/internetarchive/brozzler

brozzler - distributed browser-based web crawler

Last synced: 01 Aug 2024

https://github.com/internetarchive/wayback-machine-webextension

The Wayback Machine web browser extension for Chrome, Firefox, Edge, and Safari 14.

Last synced: 31 Jul 2024

https://github.com/internetarchive/warcprox

WARC-writing MITM HTTP/S proxy

Last synced: 01 Aug 2024

https://github.com/internetarchive/dweb-mirror

Offline Internet Archive project

Last synced: 07 Aug 2024

https://github.com/internetarchive/warc

Python library for reading and writing WARC files

Last synced: 09 Aug 2024

https://github.com/internetarchive/umbra

A queue-controlled browser automation tool for improving web crawl quality

Last synced: 07 Aug 2024

https://github.com/internetarchive/liveweb

Liveweb proxy of the Wayback Machine project

Last synced: 01 Aug 2024

https://github.com/internetarchive/archive-hocr-tools

Efficient hOCR tooling

Last synced: 01 Aug 2024

https://github.com/internetarchive/sandcrawler

Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki

web-archiving

Last synced: 07 Aug 2024

https://github.com/internetarchive/iacopilot

Summarize and ask questions about items in the Internet Archive

cli copilot gpt iacopilot internet-archive python repl

Last synced: 31 Jul 2024