Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with internet-archiving

A curated list of projects in awesome lists tagged with internet-archiving.

https://github.com/archivebox/archivebox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl

Last synced: 16 Dec 2024

https://github.com/pirate/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl

Last synced: 30 Oct 2024

https://github.com/ArchiveBox/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

archivebox backups bookmark-archiver browser-bookmarks chromium digipres firefox headless-browser internet-archiving pinboard pocket python rss self-hosted singlefile warc wayback-machine web-archiving wget youtube-dl

Last synced: 25 Oct 2024

https://github.com/pirate/wikipedia-mirror

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

archiving datascience docker docker-compose html internet-archiving kiwix kiwix-offline-wikipedia mediawiki mwdumper nginx openzim wiki wikipedia wikipedia-dump wikipedia-mirror xowa zim

Last synced: 17 Dec 2024

https://github.com/archivebox/good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

archivebox archivewarrior boinc distributed-computing distributed-storage docker docker-compose foldingathome good-karma i2p internet-archiving ipfs kiwix pywb sia storj tor zimfarm

Last synced: 21 Dec 2024

https://github.com/ArchiveBox/good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

archivebox archivewarrior boinc distributed-computing distributed-storage docker docker-compose foldingathome good-karma i2p internet-archiving ipfs kiwix pywb sia storj tor zimfarm

Last synced: 06 Nov 2024

https://github.com/ArchiveBox/archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

archivebox archiving browser-extension chrome-extension digipres digital-preservation firefox-extension internet-archiving svelte web-archiving

Last synced: 04 Nov 2024

https://github.com/ArchiveBox/electron-archivebox

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

archivebox desktop desktop-electron digipres docker electron gui internet-archiving linux macos web-archiving windows

Last synced: 25 Oct 2024

https://github.com/pirate/internet-archiving-talk

🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

archivebox censorship ethics internet-archiving slideshow talks warc web-archiving wget

Last synced: 28 Oct 2024

https://github.com/archivebox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

ai-scraping archivebox chrome cli cli-tool crawling curl downloader gallery-dl headless http-client internet-archiving playwright puppeteer scraping wget youtube-dl yt-dlp

Last synced: 10 Dec 2024

https://github.com/Own-Data-Privateer/hoardy-web

A suite of tools for mirroring and hoarding web pages you visit for later offline viewing. In other words, your own personal Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data. It follows the "archive everything now, figure out what to do with it later" philosophy.

archive backups internet internet-archiving self-hosted wayback-machine web-archiving

Last synced: 23 Oct 2024

https://github.com/ArchiveBox/readability-extractor

JavaScript/Node.js wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.

archivebox internet-archiving node readability wrapper

Last synced: 05 Nov 2024

https://github.com/itsliamdowd/WaybackBrowserMacOS

Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

application browser coding developer html internet internet-archive internet-archiving js learn macos macos-app macos-application macos-menubar macos-swift storyboard swift swiftapp wayback-archiver wayback-machine

Last synced: 25 Nov 2024