Projects in Awesome Lists tagged with dedupe
A curated list of projects in awesome lists tagged with dedupe.
https://github.com/restic/restic
Fast, secure, efficient backup program
backup dedupe deduplication go restic secure-by-default
Last synced: 12 May 2025
https://github.com/dedupeio/dedupe
:id: A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
clustering datamade de-duplicating dedupe dedupe-library entity-resolution python python-library record-linkage
Last synced: 18 Dec 2025
https://github.com/scinos/yarn-deduplicate
Deduplication tool for yarn.lock files
dedupe duplicated-packages duplicates lock-file yarn yarn-lock
Last synced: 11 May 2025
https://github.com/zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
analytics cdp customer-data-platform data-science databricks dataengineering datalake dataquality dedupe deduplication entity-resolution fuzzy-matching fuzzymatch identity-resolution master-data-management masterdata mdm ml snowflake spark
Last synced: 16 Nov 2025
https://github.com/zinggai/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
analytics analytics-engineering data-science data-transformation data-transformations dataengineering datalake dataquality dedupe deduplication entity-resolution etl fuzzy-matching fuzzymatch identity identity-resolution masterdata ml modern-data-stack spark
Last synced: 14 May 2025
https://github.com/j535d165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 14 May 2025
https://github.com/J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 26 Mar 2025
https://github.com/nil0x42/duplicut
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
c cracking dedupe dictionary duplicate-detection hashcat hashes password password-cracking remove-duplicates uniq unique wordlist wordlist-generator wordlists
Last synced: 13 Apr 2025
https://github.com/blakeembrey/free-style
Make CSS easier and more maintainable by using JavaScript
css css-in-js css-string dedupe hash javascript js minification typescript
Last synced: 08 Oct 2025
https://github.com/dedupeio/csvdedupe
:id: Command line tool for deduplicating CSV files
cli csv-files dedupe entity-resolution record-linkage
Last synced: 13 Apr 2025
https://github.com/dedupeio/dedupe-examples
:id: Examples for using the dedupe library
dedupe entity-resolution python record-linkage
Last synced: 18 Dec 2025
https://github.com/knjcode/imgdupes
Identifying and removing near-duplicate images using perceptual hashing.
dedupe deduplicate image perceptual-hashes perceptual-hashing
Last synced: 18 Jan 2026
https://github.com/kornelski/dupe-krill
A fast file deduplicator
dedupe dupes file-deduplication hardlinks macos rust-library
Last synced: 12 Apr 2025
https://laktak.github.io/chkbit/
Check your files for data corruption and run quick file deduplication
backup bitrot-detection btrfs cloud-backup data-degradation data-integrity dedup dedupe deduper deduplication disk-check storage-media
Last synced: 03 Apr 2026
https://github.com/laktak/chkbit
Check your files for data corruption and run quick file deduplication
backup bitrot-detection btrfs cloud-backup data-degradation data-integrity dedup dedupe deduper deduplication disk-check storage-media
Last synced: 04 Apr 2025
https://github.com/jason89521/daxus
Daxus is a server state management library for React that provides full control over data, leading to a better user experience.
cache data dedupe hook react revalidate server-state-management user-experience
Last synced: 23 Jun 2025
https://github.com/zayne-labs/callapi
A lightweight fetching library packed with essential features (retries, interceptors, request deduplication and much more), all while retaining an API surface similar to regular Fetch.
callapi dedupe fetch fetch-wrapper interceptors params plugins query request-dedupe retries schema standard-schema typesafe validation
Last synced: 27 Apr 2026
https://github.com/jRimbault/yadf
Yet Another Dupes Finder
dedupe deduplication dupes-finder duplicate-detection fdupes file-deduplication
Last synced: 06 Mar 2025
https://github.com/dssg/pgdedupe
A simple command line interface to the datamade/dedupe library.
data-cleaning database dedupe deduplication postgresql python record-linkage
Last synced: 21 Jan 2026
https://github.com/jchristn/watsondedupe
Self-contained C# library for data deduplication using Sqlite
chunk chunk-data chunk-key compress compression data-deduplication dedupe deduplication duplicate-data nuget sqlite-database storage
Last synced: 28 Feb 2026
https://github.com/mighty-justice/django-super-deduper
Utilities for de-duping Django model instances
Last synced: 30 Jul 2025
https://github.com/kevinpollet/pocket-deduper
Remove duplicates from your Pocket list.
cli dedupe duplicates go golang pocket tool
Last synced: 11 Apr 2025
https://github.com/futuresearch/everyrow-sdk
Intelligent pandas dataframe ops: sort, filter, dedupe & join by qualitative criteria
cleaning-data dedupe entity-resolution filtering llm-agents merging-algorithms pandas-dataframe ranking semantic-analysis
Last synced: 20 Feb 2026
https://github.com/dedupeio/dedupe-variable-address
Address Variable Type for dedupe
Last synced: 15 Apr 2025
https://github.com/dedupeio/dedupe-variable-name
Name variable type for dedupe
Last synced: 15 Apr 2025
https://github.com/samhirtarif/helper-methods-js
A repo that contains helper methods for common and not-so-common use cases
async dedupe deduplication deepcopy indexesof isasync
Last synced: 08 Mar 2025
https://github.com/mterron/swuniq
A command-line tool for deduplicating entries in a file or stream with constant memory usage
cli dedupe deduping deduplicate deduplication filter sliding-window uniq
Last synced: 22 Feb 2026
https://github.com/harpin-ai/toolkit-examples
Examples for trying out the harpin AI identity resolution and data quality toolkit
data-engineering data-quality dedupe deduplication entity-resolution identity identity-resolution spark
Last synced: 23 Apr 2025
https://github.com/lilydjwg/android-dedupefs
A filesystem for reading Android dedupe backups
Last synced: 09 May 2026
https://github.com/amoghe/dedup
Deduplicator
compression compressor dedupe deduplication
Last synced: 21 Jan 2026
https://github.com/dedupeio/dedupe-variable-fuzzycategory
Dedupe Variable for Fuzzy Categories
Last synced: 27 Aug 2025
https://github.com/dedupeio/parseratorvariable
Base class for dedupe variables for parsed fields
Last synced: 15 Apr 2025
https://github.com/barchart/aws-lambda-suppressor
JavaScript utility for suppressing duplicate AWS Lambda invocations
dedupe deduplication duplicate-detection dynamodb javascript lambda public-repository serverless
Last synced: 23 Jul 2025
https://github.com/betaweb/twicejs
Manage duplicates, count item occurrences, dedupe an Array.
array array-manipulations base64 countable counter dedupe duplicate-detection duplicates duplicates-removal javascript js json occurrences
Last synced: 29 Apr 2026
https://github.com/dohliam/sort-columns
Sort, uniq, reverse, and randomize data
alpha-order dedupe duplicate-values duplicates-removed javascript remove-duplicates sort sort-columns sort-words tiny-tools uniq
Last synced: 12 Jul 2025
https://github.com/mattriley/node-duplicate-file-finder
Finds duplicate files across given directories without hashing.
dedupe duplicate-files hashless javascript nodejs npm-package
Last synced: 03 Sep 2025
https://github.com/stdlib-js/iter-dedupe-by
Create an iterator which removes consecutive values that resolve to the same value according to a provided function.
compress dedupe deduplicate deduplication duplicate iterable iterate iteration iterator javascript node node-js nodejs stdlib uniq unique util utilities utility utils
Last synced: 16 Aug 2025
https://github.com/jchristn/watsondedupeui
UI for WatsonDedupe library
compression dedupe deduplication watson-dedupe
Last synced: 31 Mar 2025
https://github.com/octivi/borg-backup-wrapper
Wrapper for the deduplicating archiver BorgBackup. It simplifies performing everyday tasks on multiple repositories.
backup bash borgbackup borgbase compression dedupe deduplication encryption servers
Last synced: 20 Feb 2026
https://github.com/stdlib-js/array-base-to-deduped
Copy elements to a new generic array after removing consecutive duplicated values.
array compress copy data dedupe deduplicate deduplication duplicate generic javascript node node-js nodejs stdlib structure types uniq unique
Last synced: 14 Jun 2025
https://github.com/jaredkoontz/bitwarden-dedup
Filters Bitwarden JSON files to find duplicate entries and "useless" entries.
bitwarden dedupe deduplication python
Last synced: 13 Oct 2025
https://github.com/soenneker/soenneker.deduplication.slidingwindow.registry
A keyed registry of sliding window dedupe instances.
csharp dedupe deduplication dotnet registry slidingwindow slidingwindowdeduperegistry util
Last synced: 22 Apr 2026
https://github.com/soenneker/soenneker.deduplication.bounded.registry
A keyed registry of bounded dedupe instances.
bounded boundeddeduperegistry csharp dedupe deduplication dotnet registry util
Last synced: 03 May 2026
https://github.com/soenneker/soenneker.sets.concurrent.slidingwindow
A high-throughput, thread-safe set whose bucketed entries automatically expire after a fixed time window.
auto-expire concurrency concurrent csharp de-dupe dedupe dotnet object set sets slidingwindow slidingwindowconcurrentset threadsafe
Last synced: 22 Apr 2026
https://github.com/soenneker/soenneker.deduplication.slidingwindow
High-performance sliding-window deduplication for .NET.
auto-expire concurrency csharp de-dupe dedupe deduplication dotnet object set slidingwindow slidingwindowdedupe threadsafe
Last synced: 24 Apr 2026
https://github.com/soenneker/soenneker.deduplication.bounded
A thread-safe high-performance bounded size deduplication utility for .NET.
bounded boundeddedupe csharp dedupe deduplication dotnet max maxsize object size
Last synced: 03 May 2026
https://github.com/chris-santiago/stringcluster
A Scikit-Learn style deduper.
dedupe deduplication scikit-learn text-processing text-similarity transformer
Last synced: 16 May 2026
https://github.com/evancarroll/ytdl-clean
Dedupe from different versions of YouTube download
cleanup dedupe deduplicate deduplication youtube-dl youtube-download youtube-downloader-python ytdl
Last synced: 10 May 2026
https://github.com/marirs/dedupe_yara_rule-rs
Dedupe yara rules - Rust version
dedupe deduper rust rust-lang yara yara-rules yara-x
Last synced: 23 Apr 2025
https://github.com/sajad-net/dedupe
A lightweight and efficient command-line tool written in Go to help you find and remove duplicate files on your disk.
cli dedupe duplicate-detection go
Last synced: 09 Mar 2025