Projects in Awesome Lists tagged with archiving
A curated list of projects in awesome lists tagged with archiving.
https://github.com/paperless-ngx/paperless-ngx
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
angular archiving django dms document-management document-management-system machine-learning ocr optical-character-recognition pdf
Last synced: 12 May 2025
https://github.com/xwmx/nb
CLI and local web plain text note‑taking, bookmarking, and archiving with linking, tagging, filtering, search, Git versioning & syncing, Pandoc conversion, + more, in a single portable script.
archiving bash bookmark-manager bookmarks cli command-line git knowledge-base markdown note-taking notebook notes notes-app pandoc productivity shell sync vim vscode zettelkasten
Last synced: 14 May 2025
https://github.com/jonaswinkler/paperless-ng
A supercharged version of paperless: scan, index and archive all your physical documents
angular archiving django dms document-management-system full-text-search machine-learning ocr search
Last synced: 27 Sep 2025
https://github.com/wal-e/wal-e
Continuous Archiving for Postgres
archiving azure-blob backup backups blob-store google-cloud-storage openstack-swift pitr postgres postgresql python recovery replication s3
Last synced: 13 May 2025
https://github.com/pgbackrest/pgbackrest
Reliable PostgreSQL Backup & Restore
archiving azure backup bzip2 checksum database differential gcs gzip incremental lz4 multi-process parallel pgbackrest postgres postgresql restore s3 wal zstd
Last synced: 16 Dec 2025
https://github.com/kovah/linkace
LinkAce is a self-hosted archive to collect links of your favorite websites.
archive archiving bookmark-manager bookmark-managers bookmarking bookmarks docker laravel php self-hosted selfhosted
Last synced: 13 May 2025
https://github.com/mhx/dwarfs
A fast high compression read-only file system for Linux, Windows and macOS
archiving compression cpp deduplication dwarfs filesystem flac fuse fuse-filesystem linux lrzip lzma macfuse macos squashfs windows winfsp zpaq zstd
Last synced: 13 May 2025
https://github.com/itext/itext7
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high- and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
accessibility acroform archiving ccpa digital-signature documents encryption fips library pades pades-standard pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf
Last synced: 23 Jun 2025
https://github.com/itext/itext-java
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high- and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
accessibility acroform archiving ccpa digital-signature documents encryption fips library pades pades-standard pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf
Last synced: 14 May 2025
https://github.com/itext/itext-dotnet
iText for .NET is the .NET version of the iText library, formerly known as iTextSharp, which it replaces. iText represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high- and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
accessibility acroform archiving ccpa digital-signature documents encryption fips itextsharp library pades pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf
Last synced: 14 May 2025
https://github.com/ransome1/sleek
todo.txt manager for Linux, Windows and macOS, free and open-source (FOSS)
alarms archiving contexts dark-mode due-date file-watcher filters foss fulltext-search gui linux-app macos-app modern multiple-languages open-source priorities repeating-todos threshold todo-txt windows-app
Last synced: 14 May 2025
https://github.com/bareos/bareos
Bareos is a cross-network Open Source backup solution (licensed under AGPLv3) which preserves, archives, and recovers data from all major operating systems.
archiving backup backup-solution backup-utility bareos ceph compression cross-platform disaster-recovery encrypt gluster mysql postgresql python recover restore s3 security vmware
Last synced: 13 May 2025
https://github.com/webrecorder/archiveweb.page
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
archiving browser-extension chromium extension wacz warc web-archiving webrecorder
Last synced: 13 Apr 2025
https://github.com/plougher/squashfs-tools
tools to create and extract Squashfs filesystems
archiving compression filesystem linux lzo mkfs mksquashfs squashfs squashfs-image tar tarball zstd
Last synced: 14 May 2025
https://github.com/JosephLai241/URS
Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.
archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud
Last synced: 24 Mar 2025
https://github.com/gildas-lormeau/single-file-cli
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping
Last synced: 15 May 2025
https://github.com/gwern/gwern.net
Site infrastructure for gwern.net. Custom Hakyll website with unique link archiving, popup UX, transclusions/collapses, dark+reader mode, bidirectional backlinks, and typography (sidenotes, dropcaps, link icons, inflation-adjustment, subscripted-citations).
admonitions archiving disclosures gwern hakyll icons inflation pandoc reader-mode sidenotes tooltips transclusion typography wikipedia
Last synced: 16 Dec 2025
https://github.com/postgrespro/pg_probackup
Backup and recovery manager for PostgreSQL
archiving backup incremental-backups postgresql recovery restore wall
Last synced: 16 May 2025
https://github.com/pirate/wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
archiving datascience docker docker-compose html internet-archiving kiwix kiwix-offline-wikipedia mediawiki mwdumper nginx openzim wiki wikipedia wikipedia-dump wikipedia-mirror xowa zim
Last synced: 16 May 2025
https://github.com/rahiel/archiveror
Archiveror will help you preserve the webpages you love. 💾
archiving bookmark browser-extension chrome-extension firefox-extension javascript linkrot mhtml web-archiving webextension
Last synced: 07 Apr 2025
https://github.com/archiveteam/archivebot
ArchiveBot, an IRC bot for archiving websites
archiving haxe irc javascript python ruby
Last synced: 09 Apr 2025
https://github.com/archivebox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
archivebox archiving browser-extension chrome-extension digipres digital-preservation firefox-extension internet-archiving svelte web-archiving
Last synced: 07 Jul 2025
https://github.com/vida-nyu/reprozip
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
archiving computational-science docker hacktoberfest linux nyu ptrace python reproducibility reproducible-research reproducible-science reprounzip reprozip science scientific-computing vagrant
Last synced: 10 Apr 2025
https://github.com/pdf-archiver/pdf-archiver
A tool for tagging files and archiving tasks.
archive archiving archivist archivist-toolkit conventions filesystem icloud icloud-drive letters macos macos-app pdf pdf-archiver scanbot searchterm swift swift4
Last synced: 16 May 2025
https://github.com/wapmorgan/UnifiedArchive
UnifiedArchive - an archive manager with a unified interface for different formats (bundled with a CLI utility). Supports all formats for basic operations (reading, extracting and creating) and format-specific features of popular formats (compression level, password protection, comments)
7zip archives archiving bz2 bzip2 cab gzip iso lzma2 manipulate-archives rar tar zip
Last synced: 15 Apr 2025
https://github.com/webrecorder/browsertrix
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
archiving cloud kubernetes wacz warc web-archive web-archiving webrecorder
Last synced: 16 May 2025
https://github.com/itext/itext-pdfhtml-java
pdfHTML is an iText add-on for Java that allows you to easily convert HTML and CSS into standards-compliant PDFs that are accessible, searchable and usable for indexing.
accessibility acroform archiving converter css forms html pdf pdfua svg tagged-pdf template xml
Last synced: 05 Apr 2025
https://github.com/bodgit/sevenzip
Golang library for dealing with 7-zip archives
7z 7zip archive archiving bcj bcj2 brotli compression compressor decompression decompressor deflate delta golang golang-library lz4 lzma lzma2 zstandard zstd
Last synced: 15 May 2025
https://github.com/thrau/jarchivelib
A simple archiving and compression library for Java
archiving compression extraction
Last synced: 04 Apr 2025
https://github.com/palewire/archiveis
A simple Python wrapper for the archive.is capturing service
Last synced: 08 May 2025
https://github.com/palewire/savepagenow
A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service
api archiving command-line-interface internetarchive news python
Last synced: 06 Oct 2025
https://github.com/itext/itext-pdfhtml-dotnet
pdfHTML is an iText add-on for C# (.NET) that allows you to easily convert HTML and CSS into standards-compliant PDFs that are accessible, searchable and usable for indexing.
accessibility acroform archiving converter css forms html itextsharp pdf pdfua svg tagged-pdf template xml
Last synced: 21 Mar 2025
https://github.com/google/fuse-archive
FUSE file system for archives and compressed files (ZIP, RAR, 7Z, ISO, TGZ, XZ...)
archiving filesystem fuse-filesystem zip
Last synced: 25 Mar 2025
https://github.com/chronicle-app/chronicle-etl
📜 A CLI toolkit for extracting and working with your digital history
archiving chronicle chronicle-etl cli csv data-liberation etl json memex personal-archive personal-data quantified-self ruby
Last synced: 05 Apr 2025
https://github.com/hedgedoc/cli
A tiny CLI for HedgeDoc
archiving backup-utility bash cli cli-app codimd codimd-cli hedgedoc markdown notes
Last synced: 17 Jul 2025
https://github.com/Arcadia-Solutions/arcadia
Content-agnostic torrent site & tracker framework
archiving bittorrent bittorrent-tracker decentralized p2p peer-to-peer torrent tracker
Last synced: 15 Jul 2025
https://github.com/janw/podcast-archiver
Archive all your favorite podcasts
archiving feedparser podcast podcasts python rss
Last synced: 27 Mar 2025
https://github.com/jiisanda/docflow
DocFlow is a powerful Document Management API designed to streamline document handling, including seamless uploading, downloading, organization, versioning, sharing, and more.
access-control-list api archiving docker docker-compose document-management document-management-system document-sharing ec2 fastapi jwt-authentication nginx postgresql pydantic rest s3 versioning
Last synced: 05 Apr 2025
https://github.com/palewire/news-homepages
An open-source archive that gathers, saves, shares and analyzes news homepages
a11y a11y-testing actions archiving bot git-scraper journalism lighthouse lighthouse-audits news playwright playwright-python python screenshots sphinx static-site telegram-bot twitter twitter-bot
Last synced: 16 May 2025
https://github.com/archiveteam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd
Last synced: 04 Apr 2025
https://github.com/TheChymera/mkstage4
Bash Utility for Creating Stage 4 Tarballs
archiving backup gentoo linux stage4 system-management
Last synced: 20 Jul 2025
https://github.com/bottomless-archive-project/library-of-alexandria
Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.
archiving library-of-alexandria
Last synced: 15 Apr 2025
https://github.com/mhucka/devonthink-hacks
Scripts and other things for working with DEVONthink, a personal information management system.
applescript archiving automation devonthink information-gathering information-management pdf-generation
Last synced: 13 Sep 2025
https://github.com/gdamdam/iagitup
A command line tool to archive a git repository from GitHub to the Internet Archive.
archive archiving cli git github internet-archive internetarchive
Last synced: 22 Aug 2025
https://github.com/jupyterlab-contrib/jupyter-archive
A Jupyter/Jupyterlab extension to make, download and extract archive files.
archiving jupyterlab jupyterlab-extension jupyterlab-extensions
Last synced: 26 Jul 2025
https://github.com/ropensci/arkdb
Archive and unarchive databases as flat text files
archiving database dbi peer-reviewed r r-package rstats
Last synced: 09 Apr 2025
https://github.com/peterk/warcworker
A dockerized, queued high fidelity web archiver based on Squidwarc
archiving high-fidelity-preservation preservation webarchives webarchiving
Last synced: 12 Apr 2025
https://github.com/palewire/storytracker
Tools for tracking stories on news homepages
Last synced: 27 Jul 2025
https://github.com/gildas-lormeau/singlefile-safari-extension
Source code of SingleFile for Safari
archiving ios macos screenshot web-extension webpage
Last synced: 15 Apr 2025
https://github.com/Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
archive archiver archiving auto-save backups browser-extension cli internet internet-archiving offline-reading self-hosted snapshot wayback-machine web-archive web-archiving web-browsing website-archive
Last synced: 11 Mar 2025
https://github.com/killercup/static-filez
Build compressed archives for static files and serve them over HTTP
archiving http rust static-server
Last synced: 29 Jul 2025
https://github.com/alex313031/windows-7-stuffz
Files for Windows 7 that are hard to find because Microsoft took them down.
archiving drivers kb nt-kernel windows-7 windows7 windows7-8 windows7-simulator windows7-windows11
Last synced: 20 Jun 2025
https://github.com/evolsoft/phartools
A powerful PHP-CLI tool to manage phar (PHP-Archive) files
archiving packaging phar-files php php-cli
Last synced: 09 Jul 2025
https://github.com/palewire/savemy.news
Save My News: A personal, permanent clipping service
archiving django journalism news python
Last synced: 19 Apr 2025
https://github.com/caltechlibrary/waystation
Automatically archive your repository's GitHub Pages in the Wayback Machine.
archiving automation documentation github-action github-actions github-automation github-pages internet-archive preservation wayback-machine
Last synced: 12 Jun 2025
https://github.com/peterk/munin-indexer
A social media open post web archiving tool
archiving high-fidelity-preservation preservation webarchiving
Last synced: 12 Apr 2025
https://github.com/ArchiveBox/pocket-exporter
A service to help export your Pocket bookmarks, tags, saved article text, and more...
archivebox archiving bookmarks getpocket html internet-archiving pocket urls web-archiving
Last synced: 19 Aug 2025
https://github.com/goldbattle/twitch_vod_creator
Download Twitch VODs and clips, and render videos with chat.
archiving ffmpeg twitch twitch-chat twitch-vods vod
Last synced: 13 Jun 2025
https://github.com/alopezrivera/anchorage
Save your bookmark collection in the Internet Archive, or locally.
archiving internet-archive permanence web
Last synced: 29 Apr 2025
https://github.com/mhucka/devilfish
A utility for simultaneously creating full-page PDF snapshots and web archives of web pages in DEVONthink Pro.
archiving devonthink pdf web webarchive
Last synced: 24 Feb 2025
https://github.com/ArchiveTeam/WebArchiver
Decentralized web archiving
archiver archiving crawler decentralized python warc web webarchiving
Last synced: 07 Apr 2025
https://github.com/ouranosinc/miranda
A modern Python utility library for climate data collection and management
archiving climate collection management netcdf
Last synced: 31 Aug 2025
https://github.com/troglobit/zoo
Public-domain zoo archive tool
archiver archiving backup compression zoo
Last synced: 18 Mar 2025
https://github.com/dbeley/reddit_export_userdata
Export user data from your Reddit accounts. Submissions, comments, and saved and upvoted content are supported.
archivebox archiving reddit reddit-scraper
Last synced: 29 Apr 2025
https://github.com/shukriadams/browsemonkey
Takes snapshots of file systems for offline browsing and searching.
archiving datahoarder takes-snapshots util
Last synced: 10 May 2025
https://github.com/dirkpetersen/froster
Froster is a user-friendly archiving tool for teams that move data between POSIX file systems and S3-like object storage systems such as AWS Glacier
archiving aws boto3 cli duckdb glacier hpc metadata petabyte pwalk python rclone s3 slurm storage tui
Last synced: 21 Sep 2025
https://github.com/funkyfuture/compose-dump
Dump and restore Docker Compose projects
archiving backup docker-compose
Last synced: 11 Apr 2025
https://github.com/dbeley/archiveboxmatic
ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file.
archivebox archiving web-archiving
Last synced: 29 Apr 2025
https://github.com/emijrp/internet-archive
Scripts for Internet Archive
archive archiving crawler digital-preservation internet-archive webpage website
Last synced: 21 Jun 2025
https://github.com/thomasleplus/gdrive-archive
Archiving tool for Google Drive
archive archiver archiving backup google google-drive google-drive-api google-drive-cli google-drive-sdk googledrive
Last synced: 14 Apr 2025
https://github.com/sclevine/xsum
Checksums with Merkle trees and concurrency
archiving asn1 asn1-der audio checksum checksums data-archiving go golang hash hashing md5 merkle merkle-tree merkletree pcm sha256
Last synced: 03 Sep 2025
https://github.com/sivasamyk/graylog-plugin-output-webhdfs
WebHDFS Output plugin for Graylog
archiving graylog graylog-plugin hadoop webhdfs
Last synced: 14 Apr 2025
https://github.com/sagebind/respk
Manage resource files using a fast, custom open format designed especially for use in games.
archiving game-development resource-loader
Last synced: 14 Apr 2025
https://github.com/wabarc/rivet
A toolkit that makes it easier to archive webpages to IPFS
Last synced: 16 May 2025
https://github.com/caltechlibrary/iga
IGA is the InvenioRDM GitHub Archiver, a standalone program as well as a GitHub Action that lets you automatically archive GitHub software releases in an InvenioRDM repository.
archives archiving automation code-preservation github-action github-actions github-automation invenio invenio-rdm preservation reproducibility reproducible-research research-data-management research-software software-archiving software-preservation source-code-archiving
Last synced: 24 Jun 2025
https://github.com/indiscipline/rearchiver
Prepare your Reaper project for archiving, converting WAV to FLAC and WAVPACK and changing the RPP file accordingly
archiving compression daw reaper
Last synced: 06 Mar 2025
https://github.com/interkosmos/fortran-zlib
Fortran 2018 interface bindings to zlib
archiving compression fortran fortran-2018 fortran-package-manager zlib
Last synced: 20 Feb 2025