Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with internet-archiving
A curated list of projects in awesome lists tagged with internet-archiving.
https://github.com/archivebox/archivebox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl
Last synced: 16 Dec 2024
https://github.com/pirate/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl
Last synced: 30 Oct 2024
https://github.com/ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl
Last synced: 25 Oct 2024
https://github.com/akamhy/waybackpy
Wayback Machine API interface & a command-line tool
archive-webpage archive-webpages cdx-api internet-archive internet-archiving osint savepagenow wayback-machine wayback-machine-api wayback-machine-python web-archiving webarchiving
Last synced: 21 Dec 2024
https://github.com/pirate/wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
archiving datascience docker docker-compose html internet-archiving kiwix kiwix-offline-wikipedia mediawiki mwdumper nginx openzim wiki wikipedia wikipedia-dump wikipedia-mirror xowa zim
Last synced: 17 Dec 2024
https://github.com/archivebox/good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
archivebox archivewarrior boinc distributed-computing distributed-storage docker docker-compose foldingathome good-karma i2p internet-archiving ipfs kiwix pywb sia storj tor zimfarm
Last synced: 21 Dec 2024
https://github.com/ArchiveBox/good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
archivebox archivewarrior boinc distributed-computing distributed-storage docker docker-compose foldingathome good-karma i2p internet-archiving ipfs kiwix pywb sia storj tor zimfarm
Last synced: 06 Nov 2024
https://github.com/ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
archivebox archiving browser-extension chrome-extension digipres digital-preservation firefox-extension internet-archiving svelte web-archiving
Last synced: 04 Nov 2024
https://github.com/ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
archivebox desktop desktop-electron digipres docker electron gui internet-archiving linux macos web-archiving windows
Last synced: 25 Oct 2024
https://github.com/vegetableman/vandal
Navigator for Web Archive
chrome-extension firefox-addon internet-archiving wayback-machine webarchive
Last synced: 30 Oct 2024
https://github.com/pirate/internet-archiving-talk
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
archivebox censorship ethics internet-archiving slideshow talks warc web-archiving wget
Last synced: 28 Oct 2024
https://github.com/archivebox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
ai-scraping archivebox chrome cli cli-tool crawling curl downloader gallery-dl headless http-client internet-archiving playwright puppeteer scraping wget youtube-dl yt-dlp
Last synced: 10 Dec 2024
https://github.com/Own-Data-Privateer/hoardy-web
A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data. It follows the "archive everything now, figure out what to do with it later" philosophy.
archive backups internet internet-archiving self-hosted wayback-machine web-archiving
Last synced: 23 Oct 2024
https://github.com/ArchiveBox/readability-extractor
JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
archivebox internet-archiving node readability wrapper
Last synced: 05 Nov 2024
https://github.com/itsliamdowd/WaybackBrowserMacOS
Pick a date and explore websites from the early days of the internet to now, all in an easy-to-use browser format! 💻
application browser coding developer html internet internet-archive internet-archiving js learn macos macos-app macos-application macos-menubar macos-swift storyboard swift swiftapp wayback-archiver wayback-machine
Last synced: 25 Nov 2024