Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with deduplication

A curated list of projects in awesome lists tagged with deduplication.

https://github.com/restic/restic

Fast, secure, efficient backup program

backup dedupe deduplication go restic secure-by-default

Last synced: 29 Sep 2024

https://github.com/borgbackup/borg

Deduplicating archiver with compression and authenticated encryption.

backup borgbackup c compression cython dedupe deduplication encryption python python-3 ssh

Last synced: 01 Oct 2024

https://github.com/kopia/kopia

Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

backup cloud deduplication encryption google-cloud-storage hacktoberfest

Last synced: 31 Jul 2024

https://github.com/arsenetar/dupeguru

Find duplicate files

deduplication python

Last synced: 31 Jul 2024

https://github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

address address-parser c deduping deduplication international machine-learning natural-language-processing nlp record-linkage

Last synced: 30 Sep 2024

https://github.com/mhx/dwarfs

A fast, high-compression read-only file system for Linux, Windows and macOS

archiving compression cpp deduplication dwarfs filesystem flac fuse fuse-filesystem gpl-license linux lrzip lzma macfuse macos squashfs windows winfsp zpaq zstd

Last synced: 01 Oct 2024

https://github.com/sahib/rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem

c deduplication duplicates fdupes filesystem lint python

Last synced: 30 Sep 2024

https://github.com/rustic-rs/rustic

rustic - fast, encrypted, and deduplicated backups powered by Rust

backup deduplication encryption restic rust

Last synced: 30 Sep 2024

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 28 Sep 2024

https://github.com/dpc/rdedup

Data deduplication engine, supporting optional compression and public key encryption.

backup data-deduplication deduplication encryption

Last synced: 01 Aug 2024

https://github.com/Yomguithereal/talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Last synced: 31 Jul 2024

https://github.com/sreedevk/deduplicator

Filter, Sort & Delete Duplicate Files Recursively

deduplication duplicate-detection duplicate-files duplicatefilefinder filesystem rust

Last synced: 01 Aug 2024

https://github.com/cargo-limit/cargo-limit

Productivity improvements for the Rust ecosystem: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.

build cargo cargo-plugin cargo-wrapper cli crates deduplication filter limit neovim neovim-plugin nvim plugin productivity runner rust wrapper

Last synced: 28 Sep 2024

https://github.com/dm-vdo/kvdo

A kernel module that provides a pool of deduplicated and/or compressed block storage.

compression deduplication kernel-modules linux-kernel storage vdo

Last synced: 28 Sep 2024

https://github.com/Jaskey/RocketMQDedupListener

Idempotent, deduplicating RocketMQ message consumer that supports using MySQL or Redis as the idempotency table; works out of the box.

deduplication rocketmq rocketmq-client

Last synced: 02 Aug 2024

https://github.com/opensanctions/nomenklatura

Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources

data-integration deduplication record-link

Last synced: 01 Aug 2024

https://github.com/kdeldycke/mail-deduplicate

📧 CLI to deduplicate mails from mail boxes.

babyl cleanup cli dedupe deduplication email mail mailbox maildir mbox mh mmdf python

Last synced: 01 Aug 2024

https://github.com/elemental-lf/benji

Benji Backup: Block-based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices

b2 backup block-based ceph deduplication iscsi kubernetes lvm s3

Last synced: 01 Aug 2024

https://github.com/nlfiedler/fastcdc-rs

FastCDC implementation in Rust

chunking-algorithm deduplication rust

Last synced: 06 Aug 2024

https://github.com/netinvent/npbackup

A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)

backup cli compression deduplication gui healthcheck monitoring prometheus-metrics restic vss

Last synced: 01 Aug 2024

https://github.com/deajan/backup-bench

Quick and dirty backup tool benchmark with reproducible results

backup benchmark benchmarking borgbackup bupstash compression deduplication duplicacy kopia restic

Last synced: 04 Aug 2024

https://github.com/jvirkki/dupd

CLI utility to find duplicate files

c deduplication duplicate-files duplicatefilefinder duplicates fdupes

Last synced: 31 Jul 2024

https://github.com/OpenGene/gencore

Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates

bioinformatics consensus deduplication deep-sequencing duplex duplex-sequencing duplication ngs sequencing sequencing-error sequencing-noise somatic

Last synced: 03 Aug 2024

https://github.com/tsileo/blobstash

Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash

backup blob-store blobstash content-addressed deduplication document-store go storage

Last synced: 01 Aug 2024

https://github.com/lostatc/acid-store

[UNMAINTAINED] A transactional and deduplicating virtual file system

acid deduplication encryption filesystem fuse rclone redis rust s3 sftp sqlite storage

Last synced: 06 Aug 2024

https://github.com/unreadablewxy/fs-curator

Automation for the serious data hoarder who wants to have their data and use it

deduplication directory-tree file-renamer file-sorting hard-links organizer

Last synced: 13 Aug 2024

https://github.com/openvenues/lieu

Dedupe/batch geocode addresses and venues around the world with libpostal

address deduplication geocoding international venues

Last synced: 05 Aug 2024

https://github.com/lobocv/simpleflow

Generic simple workflows and concurrency patterns

batching concurrency counter deduplication generics go golang timeseries worflows workerpool

Last synced: 02 Oct 2024

https://github.com/donatj/imgdedup

CLI tool for image duplicate detection

deduplication image

Last synced: 02 Oct 2024

https://github.com/davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping

Last synced: 05 Aug 2024

https://github.com/nebucatnetzer/borg-qt

A Qt frontend for the command line software BorgBackup.

backup borg borgbackup borgbackup-gui deduplication gplv3 pyqt5 python3 qt5

Last synced: 28 Sep 2024

https://github.com/InexplicableMagic/photodedupe

A utility for locating near-duplicate photos irrespective of image resolution, compression settings or file format.

computer-vision computer-vision-tools deduplication duplicate-detection image-deduplication

Last synced: 01 Aug 2024

https://github.com/gerald-lnj/duplicate-video-finder

A Python module to detect duplicate videos in a directory.

cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing

Last synced: 04 Aug 2024

https://github.com/arbal/brave-control

Control Brave Browser from the command line. List, close, deduplicate and bring focus to open tabs. Also includes Alfred workflow integration.

alfred alfred-workflow automation brave brave-browser browser cli command-line command-line-tool deduplication focus jxa tabs workflow

Last synced: 01 Aug 2024

https://github.com/gblach/reflicate

Deduplicate data by creating reflinks between identical files.

btrfs deduplicate deduplication ocfs2 reflinks rust xfs

Last synced: 31 Jul 2024

https://github.com/homebase/bigpack

Blazing Fast Petabyte Scale Static Web Server + Tools. Serve Billions of Files from an Indexed, Compressed and Deduplicated Archive.

archiver deduplication webserver

Last synced: 01 Aug 2024

https://github.com/gkjohnson/webpack-script-guard

Webpack loader for guarding against duplicate scripts in separate bundles

deduplicate deduplication html import imports javascript loader polymer webcomponents webpack

Last synced: 01 Oct 2024