An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with archiving

A curated list of projects in awesome lists tagged with archiving.

https://github.com/paperless-ngx/paperless-ngx

A community-supported supercharged version of paperless: scan, index and archive all your physical documents

angular archiving django dms document-management document-management-system machine-learning ocr optical-character-recognition pdf

Last synced: 12 May 2025

https://github.com/the-paperless-project/paperless

Scan, index, and archive all of your paper documents

archiving documents ocr paper search

Last synced: 13 Mar 2025

https://github.com/xwmx/nb

CLI and local web plain text note‑taking, bookmarking, and archiving with linking, tagging, filtering, search, Git versioning & syncing, Pandoc conversion, + more, in a single portable script.

archiving bash bookmark-manager bookmarks cli command-line git knowledge-base markdown note-taking notebook notes notes-app pandoc productivity shell sync vim vscode zettelkasten

Last synced: 14 May 2025

https://github.com/jonaswinkler/paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents

angular archiving django dms document-management-system full-text-search machine-learning ocr search

Last synced: 27 Sep 2025

https://github.com/libarchive/libarchive

Multi-format archive and compression library

7zip archiving backup bz2 cab cpio gz iso9660 libarchive lzma posix rar tar xar xz zip zstd

Last synced: 12 May 2025

https://github.com/kovah/linkace

LinkAce is a self-hosted archive to collect links of your favorite websites.

archive archiving bookmark-manager bookmark-managers bookmarking bookmarks docker laravel php self-hosted selfhosted

Last synced: 13 May 2025

https://github.com/mhx/dwarfs

A fast high compression read-only file system for Linux, Windows and macOS

archiving compression cpp deduplication dwarfs filesystem flac fuse fuse-filesystem linux lrzip lzma macfuse macos squashfs windows winfsp zpaq zstd

Last synced: 13 May 2025

https://github.com/itext/itext7

iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.

accessibility acroform archiving ccpa digital-signature documents encryption fips library pades pades-standard pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf

Last synced: 23 Jun 2025

https://github.com/itext/itext-java

iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.

accessibility acroform archiving ccpa digital-signature documents encryption fips library pades pades-standard pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf

Last synced: 14 May 2025

https://github.com/itext/itext-dotnet

iText for .NET is the .NET version of the iText library, formerly known as iTextSharp, which it replaces. iText represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance…

accessibility acroform archiving ccpa digital-signature documents encryption fips itextsharp library pades pdf pdf-generation pdfa pdfua sdk security signature-validation svg xfdf

Last synced: 14 May 2025

https://github.com/archiveteam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

archiving crawl crawler spider warc

Last synced: 15 May 2025

https://github.com/bareos/bareos

Bareos is a cross-network Open Source backup solution (licensed under AGPLv3) which preserves, archives, and recovers data from all major operating systems.

archiving backup backup-solution backup-utility bareos ceph compression cross-platform disaster-recovery encrypt gluster mysql postgresql python recover restore s3 security vmware

Last synced: 13 May 2025

https://github.com/webrecorder/archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!

archiving browser-extension chromium extension wacz warc web-archiving webrecorder

Last synced: 13 Apr 2025

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 24 Mar 2025

https://github.com/gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping

Last synced: 15 May 2025

https://github.com/gwern/gwern.net

Site infrastructure for gwern.net. Custom Hakyll website with unique link archiving, popup UX, transclusions/collapses, dark+reader mode, bidirectional backlinks, and typography (sidenotes, dropcaps, link icons, inflation-adjustment, subscripted-citations).

admonitions archiving disclosures gwern hakyll icons inflation pandoc reader-mode sidenotes tooltips transclusion typography wikipedia

Last synced: 16 Dec 2025

https://github.com/postgrespro/pg_probackup

Backup and recovery manager for PostgreSQL

archiving backup incremental-backups postgresql recovery restore wall

Last synced: 16 May 2025

https://github.com/pirate/wikipedia-mirror

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

archiving datascience docker docker-compose html internet-archiving kiwix kiwix-offline-wikipedia mediawiki mwdumper nginx openzim wiki wikipedia wikipedia-dump wikipedia-mirror xowa zim

Last synced: 16 May 2025

https://github.com/archiveteam/archivebot

ArchiveBot, an IRC bot for archiving websites

archiving haxe irc javascript python ruby

Last synced: 09 Apr 2025

https://github.com/archivebox/archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

archivebox archiving browser-extension chrome-extension digipres digital-preservation firefox-extension internet-archiving svelte web-archiving

Last synced: 07 Jul 2025

https://github.com/vida-nyu/reprozip

ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

archiving computational-science docker hacktoberfest linux nyu ptrace python reproducibility reproducible-research reproducible-science reprounzip reprozip science scientific-computing vagrant

Last synced: 10 Apr 2025

https://github.com/wapmorgan/UnifiedArchive

UnifiedArchive - an archive manager with a unified interface for different formats (bundled with a CLI utility). Supports all formats with basic operations (reading, extracting and creation) and format-specific features of popular formats (compression level, password protection, comment)

7zip archives archiving bz2 bzip2 cab gzip iso lzma2 manipulate-archives rar tar zip

Last synced: 15 Apr 2025

https://github.com/webrecorder/browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

archiving cloud kubernetes wacz warc web-archive web-archiving webrecorder

Last synced: 16 May 2025

https://github.com/itext/itext-pdfhtml-java

pdfHTML is an iText add-on for Java that allows you to easily convert HTML and CSS into standards compliant PDFs that are accessible, searchable and usable for indexing.

accessibility acroform archiving converter css forms html pdf pdfua svg tagged-pdf template xml

Last synced: 05 Apr 2025

https://github.com/thrau/jarchivelib

A simple archiving and compression library for Java

archiving compression extraction

Last synced: 04 Apr 2025

https://github.com/palewire/archiveis

A simple Python wrapper for the archive.is capturing service

api archiving news python

Last synced: 08 May 2025

https://github.com/bandundu/email-archiver

Email archiving tool for IMAP/POP3 accounts (early development)

archiving email flask imap pop react

Last synced: 16 May 2025

https://github.com/palewire/savepagenow

A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service

api archiving command-line-interface internetarchive news python

Last synced: 06 Oct 2025

https://github.com/itext/itext-pdfhtml-dotnet

pdfHTML is an iText add-on for C# (.NET) that allows you to easily convert HTML and CSS into standards compliant PDFs that are accessible, searchable and usable for indexing.

accessibility acroform archiving converter css forms html itextsharp pdf pdfua svg tagged-pdf template xml

Last synced: 21 Mar 2025

https://github.com/google/fuse-archive

FUSE file system for archives and compressed files (ZIP, RAR, 7Z, ISO, TGZ, XZ...)

archiving filesystem fuse-filesystem zip

Last synced: 25 Mar 2025

https://github.com/chronicle-app/chronicle-etl

📜 A CLI toolkit for extracting and working with your digital history

archiving chronicle chronicle-etl cli csv data-liberation etl json memex personal-archive personal-data quantified-self ruby

Last synced: 05 Apr 2025

https://github.com/Arcadia-Solutions/arcadia

Content-agnostic torrent site & tracker framework

archiving bittorrent bittorrent-tracker decentralized p2p peer-to-peer torrent tracker

Last synced: 15 Jul 2025

https://github.com/janw/podcast-archiver

Archive all your favorite podcasts

archiving feedparser podcast podcasts python rss

Last synced: 27 Mar 2025

https://github.com/jiisanda/docflow

DocFlow is a powerful Document Management API designed to streamline document handling, including seamless uploading, downloading, organization, versioning, sharing, and more.

access-control-list api archiving docker docker-compose document-management document-management-system document-sharing ec2 fastapi jwt-authentication nginx postgresql pydantic rest s3 versioning

Last synced: 05 Apr 2025

https://github.com/internetarchive/zeno

State-of-the-art web crawler 🔱

archiving web-crawler zeno

Last synced: 12 Mar 2025

https://github.com/archiveteam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd

Last synced: 04 Apr 2025

https://github.com/TheChymera/mkstage4

Bash Utility for Creating Stage 4 Tarballs

archiving backup gentoo linux stage4 system-management

Last synced: 20 Jul 2025

https://github.com/bottomless-archive-project/library-of-alexandria

Library of Alexandria (LoA for short) is a project that aims to collect and archive documents from the internet.

archiving library-of-alexandria

Last synced: 15 Apr 2025

https://github.com/mhucka/devonthink-hacks

Scripts and other things for working with DEVONthink, a personal information management system.

applescript archiving automation devonthink information-gathering information-management pdf-generation

Last synced: 13 Sep 2025

https://github.com/gdamdam/iagitup

A command line tool to archive a git repository from GitHub to the Internet Archive.

archive archiving cli git github internet-archive internetarchive

Last synced: 22 Aug 2025

https://github.com/jupyterlab-contrib/jupyter-archive

A Jupyter/Jupyterlab extension to make, download and extract archive files.

archiving jupyterlab jupyterlab-extension jupyterlab-extensions

Last synced: 26 Jul 2025

https://github.com/ropensci/arkdb

Archive and unarchive databases as flat text files

archiving database dbi peer-reviewed r r-package rstats

Last synced: 09 Apr 2025

https://github.com/peterk/warcworker

A dockerized, queued high fidelity web archiver based on Squidwarc

archiving high-fidelity-preservation preservation webarchives webarchiving

Last synced: 12 Apr 2025

https://github.com/grawity/irc-docs

Collected IRC protocol documentation

archiving history irc

Last synced: 07 May 2025

https://github.com/palewire/storytracker

Tools for tracking stories on news homepages

archiving journalism python

Last synced: 27 Jul 2025

https://github.com/Own-Data-Privateer/hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

archive archiver archiving auto-save backups browser-extension cli internet internet-archiving offline-reading self-hosted snapshot wayback-machine web-archive web-archiving web-browsing website-archive

Last synced: 11 Mar 2025

https://github.com/killercup/static-filez

Build compressed archives for static files and serve them over HTTP

archiving http rust static-server

Last synced: 29 Jul 2025

https://github.com/raffomania/archive.observer

🔭 AskHistorians Archive Viewer

archiving pushshift reddit rust static

Last synced: 29 Apr 2025

https://github.com/alex313031/windows-7-stuffz

Files for Windows 7 that are hard to find because Microsoft took them down.

archiving drivers kb nt-kernel windows-7 windows7 windows7-8 windows7-simulator windows7-windows11

Last synced: 20 Jun 2025

https://github.com/evolsoft/phartools

A powerful PHP-CLI tool to manage phar (PHP-Archive) files

archiving packaging phar-files php php-cli

Last synced: 09 Jul 2025

https://github.com/palewire/savemy.news

Save My News: A personal, permanent clipping service

archiving django journalism news python

Last synced: 19 Apr 2025

https://github.com/corentinb/warc

Read and write WARC files in Go

archiving go warc

Last synced: 07 May 2025

https://github.com/peterk/munin-indexer

A social media open post web archiving tool

archiving high-fidelity-preservation preservation webarchiving

Last synced: 12 Apr 2025

https://github.com/ArchiveBox/pocket-exporter

A service to help export your pocket bookmarks, tags, saved article text, and more...

archivebox archiving bookmarks getpocket html internet-archiving pocket urls web-archiving

Last synced: 19 Aug 2025

https://github.com/goldbattle/twitch_vod_creator

Download twitch vods, clips, and render videos with chat.

archiving ffmpeg twitch twitch-chat twitch-vods vod

Last synced: 13 Jun 2025

https://github.com/alopezrivera/anchorage

Save your bookmark collection in the Internet Archive, or locally.

archiving internet-archive permanence web

Last synced: 29 Apr 2025

https://github.com/mhucka/devilfish

A utility for simultaneously creating full-page PDF snapshots and web archives of web pages in DEVONthink Pro.

archiving devonthink pdf web webarchive

Last synced: 24 Feb 2025

https://github.com/ouranosinc/miranda

A modern Python utility library for climate data collection and management

archiving climate collection management netcdf

Last synced: 31 Aug 2025

https://github.com/troglobit/zoo

public domain zoo archive tool

archiver archiving backup compression zoo

Last synced: 18 Mar 2025

https://github.com/dbeley/reddit_export_userdata

Export userdata from your reddit accounts. Submissions, comments, saved, upvoted contents are supported.

archivebox archiving reddit reddit-scraper

Last synced: 29 Apr 2025

https://github.com/shukriadams/browsemonkey

Takes snapshots of file systems for offline browsing and searching.

archiving datahoarder takes-snapshots util

Last synced: 10 May 2025

https://github.com/dirkpetersen/froster

Froster is a user-friendly archiving tool for teams that move data between POSIX file systems and S3-like object storage systems such as AWS Glacier

archiving aws boto3 cli duckdb glacier hpc metadata petabyte pwalk python rclone s3 slurm storage tui

Last synced: 21 Sep 2025

https://github.com/funkyfuture/compose-dump

Dump and restore Docker Compose projects

archiving backup docker-compose

Last synced: 11 Apr 2025

https://github.com/dbeley/archiveboxmatic

ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file.

archivebox archiving web-archiving

Last synced: 29 Apr 2025

https://github.com/sagebind/respk

Manage resource files using a fast, custom open format designed especially for use in games.

archiving game-development resource-loader

Last synced: 14 Apr 2025

https://github.com/wabarc/rivet

A toolkit that makes it easier to archive webpages to IPFS

archiving ipfs webpage

Last synced: 16 May 2025

https://github.com/caltechlibrary/iga

IGA is the InvenioRDM GitHub Archiver, a standalone program as well as a GitHub Action that lets you automatically archive GitHub software releases in an InvenioRDM repository.

archives archiving automation code-preservation github-action github-actions github-automation invenio invenio-rdm preservation reproducibility reproducible-research research-data-management research-software software-archiving software-preservation source-code-archiving

Last synced: 24 Jun 2025

https://github.com/jirwin/ipfs-archive

Use IPFS to archive a url.

archiving ipfs

Last synced: 11 Oct 2025

https://github.com/vdbsh/backy

tiny multiprocessing utility for file backups

archiving backup bsd bzip2 cli golang linux macos rsync synchronization tar

Last synced: 06 May 2025

https://github.com/indiscipline/rearchiver

Prepare your Reaper project for archiving, converting WAV to FLAC and WAVPACK and changing the RPP file accordingly

archiving compression daw reaper

Last synced: 06 Mar 2025