An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with deduplication

A curated list of projects in awesome lists tagged with deduplication.

https://github.com/restic/restic

Fast, secure, efficient backup program

backup dedupe deduplication go restic secure-by-default

Last synced: 23 Apr 2025

https://github.com/borgbackup/borg

Deduplicating archiver with compression and authenticated encryption.

backup borgbackup compression deduplication encryption python ssh

Last synced: 18 Apr 2025

https://github.com/kopia/kopia

Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

backup cloud deduplication encryption google-cloud-storage hacktoberfest

Last synced: 23 Apr 2025

https://github.com/arsenetar/dupeguru

Find duplicate files

deduplication python

Last synced: 08 Apr 2025

https://github.com/openvenues/libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

address address-parser c deduping deduplication international machine-learning natural-language-processing nlp record-linkage

Last synced: 23 Apr 2025

https://github.com/rustic-rs/rustic

rustic - fast, encrypted, and deduplicated backups powered by Rust

backup deduplication encryption hacktoberfest restic rust

Last synced: 23 Apr 2025

https://github.com/mhx/dwarfs

A fast high compression read-only file system for Linux, Windows and macOS

archiving compression cpp deduplication dwarfs filesystem flac fuse fuse-filesystem gpl-license linux lrzip lzma macfuse macos squashfs windows winfsp zpaq zstd

Last synced: 10 Apr 2025

https://github.com/sahib/rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem

c deduplication duplicates fdupes filesystem lint python

Last synced: 10 Apr 2025

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 23 Apr 2025

https://github.com/dpc/rdedup

Data deduplication engine, supporting optional compression and public key encryption.

backup data-deduplication deduplication encryption

Last synced: 14 Apr 2025

https://github.com/yomguithereal/talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Last synced: 14 Apr 2025

https://github.com/sreedevk/deduplicator

Filter, Sort & Delete Duplicate Files Recursively

deduplication duplicate-detection duplicate-files duplicatefilefinder filesystem rust

Last synced: 07 Apr 2025

https://github.com/cargo-limit/cargo-limit

Productivity improvements for Rust ecosystem: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.

build cargo cargo-plugin cargo-wrapper cli crates deduplication filter limit neovim neovim-plugin nvim plugin productivity runner rust wrapper

Last synced: 07 Apr 2025

https://github.com/dm-vdo/kvdo

A kernel module that provides a pool of deduplicated and/or compressed block storage.

compression deduplication kernel-modules linux-kernel storage vdo

Last synced: 12 Apr 2025

https://github.com/Jaskey/RocketMQDedupListener

Idempotent, deduplicating message consumer for RocketMQ. Supports MySQL or Redis as the idempotency table and works out of the box.

deduplication rocketmq rocketmq-client

Last synced: 12 Nov 2024

https://github.com/netinvent/npbackup

A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)

backup cli compression deduplication gui healthcheck monitoring orchestrator prometheus-metrics restic vss

Last synced: 09 Apr 2025

https://github.com/dm-vdo/vdo

Userspace tools for managing VDO volumes.

compression deduplication storage vdo

Last synced: 04 Apr 2025

https://github.com/opensanctions/nomenklatura

Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources

data-integration deduplication record-link

Last synced: 03 Apr 2025

https://github.com/007revad/synology_enable_deduplication

Enable deduplication with non-Synology SSDs and unsupported NAS models

deduplication diskstation dsm rackstation synology synology-disk-station synology-dsm synology-nas

Last synced: 05 Apr 2025

https://github.com/yornaath/batshit

A batch manager that deduplicates and batches requests for a certain data type made within a window. Useful for batching requests made from multiple React components that use react-query.

async batch-processing concurrency deduplication fetch react react-query tanstack typescript

Last synced: 04 Apr 2025

https://github.com/kdeldycke/mail-deduplicate

📧 CLI to deduplicate mails from mail boxes.

babyl cleanup cli dedupe deduplication email mail mailbox maildir mbox mh mmdf python

Last synced: 05 Apr 2025

https://github.com/vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

approximate-nearest-neighbors data-matching deduplication deep-learning embeddings entity-matching entity-resolution python pytorch record-linkage representation-learning

Last synced: 09 Apr 2025

https://github.com/deajan/backup-bench

Quick and dirty backup tool benchmark with reproducible results

backup benchmark benchmarking borgbackup bupstash compression deduplication duplicacy kopia restic

Last synced: 05 Apr 2025

https://github.com/nlfiedler/fastcdc-rs

FastCDC implementation in Rust

chunking-algorithm deduplication rust

Last synced: 04 Apr 2025

https://github.com/elemental-lf/benji

Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices

b2 backup block-based ceph deduplication iscsi kubernetes lvm s3

Last synced: 06 Apr 2025

https://github.com/laktak/chkbit

Check your files for data corruption and run quick file deduplication

backup bitrot-detection btrfs cloud-backup data-degradation data-integrity dedup dedupe deduper deduplication disk-check storage-media

Last synced: 04 Apr 2025

https://github.com/opengene/gencore

Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates

bioinformatics consensus deduplication deep-sequencing duplex duplex-sequencing duplication ngs sequencing sequencing-error sequencing-noise somatic

Last synced: 10 Apr 2025

https://github.com/jvirkki/dupd

CLI utility to find duplicate files

c deduplication duplicate-files duplicatefilefinder duplicates fdupes

Last synced: 21 Mar 2025

https://github.com/tsileo/blobstash

Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash

backup blob-store blobstash content-addressed deduplication document-store go storage

Last synced: 17 Mar 2025

https://github.com/unreadablewxy/fs-curator

Automation for the serious data hoarder that wants to have their data and use it

deduplication directory-tree file-renamer file-sorting hard-links organizer

Last synced: 04 Dec 2024

https://github.com/lostatc/acid-store

[UNMAINTAINED] A transactional and deduplicating virtual file system

acid deduplication encryption filesystem fuse rclone redis rust s3 sftp sqlite storage

Last synced: 24 Nov 2024

https://github.com/openvenues/lieu

Dedupe/batch geocode addresses and venues around the world with libpostal

address deduplication geocoding international venues

Last synced: 19 Dec 2024

https://github.com/daniel-liu-c0deb0t/umicollapse

Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.

data-structures deduplication fastq hamming java string-search string-similarity umis unique-molecular-identifiers

Last synced: 13 Apr 2025

https://github.com/ronomon/deduplication

Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in Javascript. Ships with extensive tests, a fuzz test and a benchmark.

chunking content-dependent deduplication nodejs

Last synced: 17 Dec 2024

https://github.com/PJDude/dude

Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.

cli deduplication duplicate duplicate-detection duplicate-files duplicates duplicates-removal easy easy-to-use easyui gui gui-application python python3 sha1 threads tkinter utility utility-application

Last synced: 06 Mar 2025

https://github.com/lobocv/simpleflow

Generic simple workflows and concurrency patterns

batching concurrency counter deduplication generics go golang timeseries worflows workerpool

Last synced: 23 Apr 2025

https://github.com/ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

deduplication entity-resolution record-linkage spark

Last synced: 14 Apr 2025

https://github.com/donatj/imgdedup

CLI tool for image duplicate detection

deduplication image

Last synced: 14 Apr 2025

https://github.com/samber/go-singleflightx

🧬 x/sync/singleflight but with generics, batching, sharding and nullable result

cache channel concurrent deduplication generics go in-flight singleflight sync

Last synced: 22 Apr 2025

https://github.com/sergey-dryabzhinsky/dedupsqlfs

Deduplicating filesystem via Python3, FUSE and SQLite

backup compression deduplication fuse python python3 storage

Last synced: 31 Jan 2025

https://github.com/shivam5992/dupandas

📊 Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames

deduplication flexible-matching pandas python text-cleaner

Last synced: 14 Apr 2025

https://github.com/immobiliare/ufoid

Ultra Fast Optimized Image Deduplication.

automation computer-vision deduplication images immobiliare-labs python

Last synced: 23 Apr 2025

https://github.com/davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping

Last synced: 22 Nov 2024

https://github.com/j535d165/febrl-fork-v0.4.2

Fork of the Freely Extensible Biomedical Record Linkage program

deduplication entity-resolution matching python-library record-linkage

Last synced: 22 Nov 2024

https://github.com/InexplicableMagic/photodedupe

A utility for locating near duplicate photos irrespective of image resolution, compression settings or file format.

computer-vision computer-vision-tools deduplication duplicate-detection image-deduplication

Last synced: 07 Apr 2025

https://github.com/lkarlslund/stringdedup

String deduplication package for Go

dedup deduplication golang string xxhash

Last synced: 22 Apr 2025

https://github.com/nebucatnetzer/borg-qt

A Qt frontend for the command line software BorgBackup.

backup borg borgbackup borgbackup-gui deduplication gplv3 pyqt5 python3 qt5

Last synced: 22 Jan 2025

https://github.com/marcnuth/deduplication

Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.

algorithms cv deduplication google imagehash shingling simhash

Last synced: 19 Nov 2024

https://github.com/ragibson/sms-mms-deduplication

Tool to remove duplicate text messages (SMS/MMS/RCS). RCS support is available for some clients.

deduplication mms rcs sms text-message

Last synced: 22 Apr 2025

https://github.com/nickcrews/mismo

The SQL/Ibis powered sklearn of record linkage

deduplication duckdb entity-resolution ibis python record-linkage sql

Last synced: 18 Nov 2024

https://github.com/deadsoul/dugu

Find, remove and avoid duplicates with dugu: The Duplicates Guru

deduplication dugu duplicate-detection duplicate-files duplicatefilefinder duplicates duplicates-guru python

Last synced: 05 Apr 2025

https://github.com/juntaki/bucketsync

S3 backed FUSE Filesystem written in Go with dedup and encryption.

deduplication filesystem fuse golang s3

Last synced: 14 Apr 2025

https://github.com/junkurihara/rust-gd

An Implementation of Generalized Deduplication, written in Rust

deduplication error-correcting-codes generalized-deduplication hamming-codes reed-solomon-codes rust

Last synced: 10 Apr 2025

https://github.com/gamemann/linux-btrfs-lab

A small lab using Ubuntu 23.04 with the BTRFS file system to test the deduplication feature.

23-04 btrfs dd deduplication disk disk-space documentation duperemove filesystem hard-drive kvm lab linux qemu save-space ssd ubuntu vm

Last synced: 18 Mar 2025

https://github.com/glau-bd/duplicate-video-finder

A Python module to detect duplicate videos in a directory.

cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing

Last synced: 21 Jan 2025

https://github.com/opengene/dedup

Deduplication for cfDNA sequencing data

bioinformatics ctdna deduplication liquid ngs

Last synced: 10 Apr 2025

https://github.com/infinisil/soph

Efficiently import pictures while handling duplicates gracefully

blockhash deduplication haskell perceptual-hashing pictures-organizer similarity-search

Last synced: 22 Mar 2025

https://github.com/gerald-lnj/duplicate-video-finder

A Python module to detect duplicate videos in a directory.

cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing

Last synced: 20 Nov 2024

https://github.com/andrewdalpino/dataloader-php

A speed layer that enables query batching, de-duplication, and caching for efficient data fetching over any storage backend.

buffer cache dataloader deduplication graphql optimization php storage

Last synced: 10 Apr 2025

https://github.com/mk-fg/lafs-backup-tool

Tool to securely push incremental (think "rsync --link-dest") backups to tahoe-lafs

automation backup compression deduplication python tahoe-lafs twisted yaml

Last synced: 23 Apr 2025

https://github.com/dsacms/deduplifhir

Prototype for basic deduplication and aggregation of eCQM data

ai cmsoss-tier3 data-science deduplication electron government healthcare poetry python

Last synced: 13 Apr 2025

https://github.com/glehmann/hld

Hard Link Deduplicator

dedup deduplication hardlinks reflinks rust

Last synced: 16 Mar 2025

https://github.com/arbal/brave-control

Control Brave Browser from the command line. List, close, deduplicate and bring focus to open tabs. Also includes Alfred workflow integration.

alfred alfred-workflow automation brave brave-browser browser cli command-line command-line-tool deduplication focus jxa tabs workflow

Last synced: 06 Apr 2025

https://github.com/yaroslaff/hashget

Deduplication/backup tool with extremely high 'compression' rate

archive backup compression deduplicate deduplication restic

Last synced: 13 Apr 2025

https://github.com/b0ch3nski/backup-toolkit

Collection of scripts for various backup scenarios.

backup bup compression deduplication logical-volumes lvm recovery restore snapshot

Last synced: 06 Apr 2025

https://github.com/pastelsky/throttle-queue

A promise based priority queue with task deduplication, concurrency control, serial resolution and aging

concurrency deduplication promises queue

Last synced: 11 Nov 2024

https://github.com/cybershadow/ripfs

Simple deduplicating userspace filesystem for recordings of Internet radio stations.

deduplication fuse-filesystem internet-radio

Last synced: 17 Mar 2025

https://github.com/samhirtarif/helper-methods-js

A repo that contains helper methods for common and not-so-common use cases

async dedupe deduplication deepcopy indexesof isasync

Last synced: 08 Mar 2025

https://github.com/innovatrics/dedubcheck

dedubcheck - De-Duplicate Dependency Checker for Node.js monorepos

deduplication duplicates duplicity javascript nodejs nodejs-modules

Last synced: 13 Apr 2025

https://github.com/naiquevin/dupenukem

A command line file deduplication tool

cli deduplication filesystem

Last synced: 11 Apr 2025

https://github.com/gblach/reflicate

Deduplicate data by creating reflinks between identical files.

btrfs deduplicate deduplication ocfs2 reflinks rust xfs

Last synced: 26 Mar 2025

https://github.com/brendon1555/panda-cx-deduplicator

A drop-in replacement for the PandaCSS `cx` function with deduplication of atomic classes

classname css deduplication hacktoberfest pandacss styling

Last synced: 13 Feb 2025

https://github.com/fgregg/smered

Mirror of https://bitbucket.org/resteorts/smered

deduplication entity-resolution record-linkage

Last synced: 14 Apr 2025

https://github.com/aiursoftweb/nibot

A CLI tool that helps you de-duplicate images in a folder.

deduplication dotnet image-processing tool

Last synced: 13 Apr 2025

https://github.com/yybit/zchunk-rs

A pure Rust library for parsing and generating zchunk files

chunk compression deduplication sync

Last synced: 08 Apr 2025