Projects in Awesome Lists tagged with duplicate-detection
A curated list of projects in awesome lists tagged with duplicate-detection.
https://github.com/nomic-ai/nomic
Interact, analyze and structure massive text, image, embedding, audio and video datasets
clustering duplicate-detection embeddings python text topic-modeling unstructured-data
Last synced: 13 May 2025
https://github.com/nil0x42/duplicut
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
c cracking dedupe dictionary duplicate-detection hashcat hashes password password-cracking remove-duplicates uniq unique wordlist wordlist-generator wordlists
Last synced: 13 Apr 2025
https://github.com/windirstat/windirstat
WinDirStat is a disk usage statistics viewer and cleanup tool for Microsoft Windows
cleanup disk-space-analyzer disk-usage-analyzer duplicate-detection treemap treemaps windows
Last synced: 07 Jan 2026
https://github.com/sreedevk/deduplicator
Filter, Sort & Delete Duplicate Files Recursively
deduplication duplicate-detection duplicate-files duplicatefilefinder filesystem rust
Last synced: 21 Jun 2025
https://github.com/akamhy/videohash
Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
duplicate-detection duplicate-video-finder duplicate-videos ffmpeg find-similar-videos-by-content ndvd ndvr near-duplicate-video near-duplicate-video-clip-detection python video video-deduplication video-similarity-search visual-claim
Last synced: 16 May 2025
https://github.com/chenglongma/zoplicate
A plugin that does one thing only: Detect and manage duplicate items in Zotero.
duplicate-detection zotero zotero-addon zotero-plugin zotero6 zotero7
Last synced: 06 Apr 2025
https://github.com/ChenglongMa/zoplicate
A plugin that does one thing only: Detect and manage duplicate items in Zotero.
duplicate-detection zotero zotero-addon zotero-plugin zotero6 zotero7
Last synced: 06 Mar 2025
https://github.com/cryogenicplanet/depp
⚡ Check your npm modules for unused and duplicate dependencies fast
dependency duplicate-detection modules monorepo npm unused
Last synced: 23 Oct 2025
https://github.com/jorensix/panako
The Panako acoustic fingerprinting system.
acoustic-fingerprinting audio-processing duplicate-detection music-information-retrieval
Last synced: 01 Aug 2025
https://github.com/RazgrizHsu/immich-mediakit
An extension toolkit for Immich enabling advanced management capabilities through AI-powered similarity detection
duplicate duplicate-detection immich similarity
Last synced: 14 Jul 2025
https://github.com/kristiankoskimaki/vidupe
Vidupe is a program that can find duplicate and similar video files. V1.211 released on 2019-09-18.
duplicate-detection duplicate-videos duplicates videos
Last synced: 10 Apr 2025
https://github.com/markusressel/py-image-dedup
CLI utility to find near duplicate images and remove all but the best copy.
dedup deduplication duplicate-detection duplicate-images find-duplicates hacktoberfest image-analysis image-comparison python python-3 python3
Last synced: 07 Apr 2025
https://github.com/unmade/audiomatch
Find similar audio files easily
audio-fingerprinting chromaprint command-line cython duplicate-detection python sound-analysis
Last synced: 12 Apr 2025
https://github.com/umbertogriffo/fast-near-duplicate-image-search
Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree.
computer-vision duplicate-detection duplicates-removed hashing image-deduplication image-processing kd-tree kdtree near-duplicate nearest-neighbor-search nearest-neighbors perceptual-hashing phash t-sne
Last synced: 06 Mar 2025
https://github.com/scrubbbbs/cbird
Command-line program for Content-Based Image Retrieval of images and videos. Includes tools for general search and de-duplication.
command-line-interface computer-vision content-based-image-retrieval duplicate-detection duplicate-files duplicates ffmpeg opencv qt6 similarity-search
Last synced: 06 Apr 2025
https://github.com/itwillwork/ostap
CLI tool that quickly checks whether your bundle contains multiple versions of the same package, looking only at package.json.
bundle cli-app duplicate-detection frontend webpack
Last synced: 13 Apr 2025
https://github.com/logpai/bughub
A collection of free-text bug reports for duplicate issue identification
bug-reports datasets duplicate-detection nlp
Last synced: 04 Jan 2026
https://github.com/cloud-py-api/mediadc
Nextcloud Media Duplicate Collector application
collector duplicate-detection duplicates media mediadc nextcloud nextcloud-apps nextcloud-vue-app open-source php python python3 single-page-app vue
Last synced: 06 Apr 2025
https://github.com/eyalroz/removedupes
Remove Duplicate Messages
cleaner cleanup duplicate-detection duplicates email email-parsing mail-client mail-folders mozilla productivity thunderbird thunderbird-addon thunderbird-extension
Last synced: 27 Mar 2025
https://github.com/vuolter/deplicate
Advanced Duplicate File Finder for Python
deplicate duplicate duplicate-detection duplicate-files duplicatefilefinder duplicates duplicates-removed duplication-finder finder macosx multi-filtering purge-duplicate-files pypi python scanning unix windows
Last synced: 24 Apr 2025
https://github.com/deplicate/deplicate
Advanced Duplicate File Finder for Python
deplicate duplicate duplicate-detection duplicate-files duplicatefilefinder duplicates duplicates-removed duplication-finder finder macosx multi-filtering purge-duplicate-files pypi python scanning unix windows
Last synced: 30 Apr 2025
https://github.com/marius-sucan/Quick-Picto-Viewer
A uniquely crafted image viewer and editor with options to organize files, and maintain large lists of image files for slideshows, dupes detection or other purposes.
dupes-finder duplicate-detection fileorganizer files-management image image-edi image-manipulation image-organizer image-processing image-viewer imageeditor imageprocessing organizer paint paint-application slideshow slideshow-maker
Last synced: 21 Mar 2025
https://github.com/PJDude/dude
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
cli deduplication duplicate duplicate-detection duplicate-files duplicates duplicates-removal easy easy-to-use easyui gui gui-application python python3 sha1 threads tkinter utility utility-application
Last synced: 06 Mar 2025
https://github.com/src-d/gemini
Advanced similarity and duplicate detection for source code at scale.
duplicate-detection duplicates hash source-code-analysis spark
Last synced: 05 May 2025
https://github.com/jRimbault/yadf
Yet Another Dupes Finder
dedupe deduplication dupes-finder duplicate-detection fdupes file-deduplication
Last synced: 06 Mar 2025
https://github.com/twpayne/find-duplicates
Find duplicate files quickly.
duplicate-detection duplicate-files duplicates find
Last synced: 17 Mar 2025
https://github.com/src-d/apollo
Advanced similarity and duplicate detection for source code; a proof of concept for our research efforts.
duplicate-detection duplicates python similarity similarity-search source-code
Last synced: 05 May 2025
https://github.com/fffaraz/qthashsum
File Checksum Integrity Verifier & Duplicate File Finder written in C++ Qt
c-plus-plus checksum checksum-integrity-verifier cryptographic-hash duplicate-detection duplicate-files duplicatefilefinder filesystem-indexer hash hashing integrity md5 md5sum qt sha1 sha1sum sha3
Last synced: 10 Apr 2025
https://github.com/appzcoder/phpcloc
🚀 Cloc & duplicate code checker tool
cloc code-duplication console duplicate-detection php
Last synced: 19 Jul 2025
https://github.com/asadiahmad/detect-duplicated-questions
Detect Duplicated Stack Overflow Questions
dot-product duplicate-detection nlp
Last synced: 14 Apr 2025
https://github.com/mrinjamul/go-dupfinder
Duplicate File Finder.
cleaner duplicate-detection utility
Last synced: 14 Sep 2025
https://github.com/akcarsten/duplicate-finder
This Python package identifies duplicate files in a folder of interest.
Last synced: 04 Mar 2025
https://github.com/justinshenk/simages
Find duplicates and similar images in a folder
autoencoder duplicate-detection images preprocessing similarity-detection
Last synced: 11 Oct 2025
https://github.com/arikw/outlook-duplicated-items-remover
A VBA script that finds and moves duplicated items in selected outlook folders
duplicate-detection duplicates outlook vba vba-script vba-snippets
Last synced: 11 Sep 2025
https://github.com/bakdata/dedupe
Java DSL for (online) deduplication
data-cleaning data-cleansing deduplication duplicate-detection duplicate-removal
Last synced: 10 Apr 2025
https://github.com/InexplicableMagic/photodedupe
A utility for locating near duplicate photos irrespective of image resolution, compression settings or file format.
computer-vision computer-vision-tools deduplication duplicate-detection image-deduplication
Last synced: 07 Apr 2025
https://github.com/nicolasbizzozzero/dupe_eraser
A command-line tool that automates the deletion of duplicate files based on their hash or perceptual hash.
cli duplicate-detection duplicate-files duplicates file-management
Last synced: 12 Apr 2025
https://github.com/deadsoul/dugu
Find, remove and avoid duplicates with dugu: The Duplicates Guru
deduplication dugu duplicate-detection duplicate-files duplicatefilefinder duplicates duplicates-guru python
Last synced: 05 Apr 2025
https://github.com/glau-bd/duplicate-video-finder
A python module to detect duplicate videos in a directory.
cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing
Last synced: 02 Oct 2025
https://github.com/sameera-madushan/findm
Findm is a python script to find duplicate file copies in a given directory.
duplicate-detection duplicate-files duplicatefilefinder file-hashing python
Last synced: 18 Jul 2025
https://github.com/transitive-bullshit/phash-gif
Perceptual GIF hashing for easily finding near-duplicate GIFs.
duplicate-detection gif gif-animation perceptual-hashing phash
Last synced: 13 Oct 2025
https://github.com/raspi/samanlainen
Delete duplicate files
duplicate-detection duplicate-files duplicates files rust
Last synced: 15 Apr 2025
https://github.com/tasleson/duplihere
Copy & Paste finder for structured text files.
clones-detection code-quality copy-paste cpd detect-duplications detector developer-tools duplicate-detection duplicates duplications quality research rust
Last synced: 22 Aug 2025
https://github.com/daxcay/imageduplicatefinder
Python application using AI to find duplicate images
ai duplicate-detection image-processing python standalone
Last synced: 04 Apr 2025
https://github.com/isayakhov/duplicate-stickers-remover-bot
A bot that can find and remove duplicate stickers from different sticker sets
duplicate-detection python telegram telegram-bot
Last synced: 09 Jul 2025
https://github.com/elcorto/findsame
Find duplicate files and directories based on file hashes.
duplicate-detection duplicate-files duplicatefilefinder file-hashing merkletree multiprocessing multithreading python
Last synced: 06 Sep 2025
https://github.com/tttapa/duplicate-file-finder
List all duplicate files in a directory.
cleanup duplicate-detection file-manager utility
Last synced: 14 Jun 2025
https://github.com/ricopella/cratecleaner
Make your library clutter-free! This Electron app cleans up your digital music and image collections. Unique for DJs: identifies which songs are in crates while detecting duplicates. Digs into metadata for smart cleanup.
audio dedupe-library duplicate-detection duplicatefilefinder electron image prisma react serato tailwind tanstack-react-query typescript vite zod
Last synced: 20 Oct 2025
https://github.com/jkomieter/smartshreds
SmartShreds uses Rust, hashing algorithms, and NLP to detect and manage duplicate files efficiently, optimizing storage and organization with AI-powered tools.
ai desktop-application duplicate-detection file fileorganizer filesystem gtk4 open-source rust storage-management systems-programming
Last synced: 13 May 2025
https://github.com/thomasleplus/kml-utils
Utilities for KML files
distance distance-calculation duplicate-detection google-maps kml kml-data kml-files kml-parser perl utilities utils
Last synced: 14 Apr 2025
https://github.com/keyweeusr/bear
🐻 The decluttering deduplicator
cli-app clutterremoval duplicate-detection duplicates python
Last synced: 25 Jul 2025
https://github.com/erikreed/pydupes
A duplicate file finder like rdfind/fdupes et al that may be faster in environments with millions of files and terabytes of data or over high latency filesystems (e.g. NFS).
duplicate-detection duplication files
Last synced: 06 Mar 2025
https://github.com/gyanbardhan/duplicatequestiondetection
Developed and Deployed NLP Models Achieving Up to 89.89% Accuracy in Detecting Duplicate Question pairs using Transformer https://huggingface.co/spaces/gyanbardhan123/Duplicate_Question_Detection https://drive.google.com/file/d/1MsBA45Hob56OWPuLVCgG3F3QdCZgBq9a/view?usp=sharing
bert bow distilbert duplicate-detection duplicate-questions-identification feature-engineering google huggingface kaggle nlp nlp-machine-learning quora quora-question-pairs spaces text-processing tf-idf transformer
Last synced: 12 Jul 2025
https://github.com/opencoff/go-progs
Useful Go utilities for Unix-like environments
disk-utilization duplicate-detection duplicate-files file-hash-checker file-hash-generator golang-application golang-cli golang-tools hexlify parallel-directory-walk symlink-management
Last synced: 24 Aug 2025
https://github.com/ajmalshahabudeen/Bitwarden-Duplicate-remover
When importing multiple CSV files, Bitwarden creates duplicate entries. This Python script removes the duplicates and keeps one copy of each entry.
bitwarden bitwarden-password-vault duplicate-detection duplicates duplicates-removal python
Last synced: 27 Mar 2025
https://github.com/dnth/mafat-fastdup-blogpost
Data insights from the MAFAT Satellite Vision challenge.
clustering computer-vision data data-visualization dataset duplicate-detection mafat-radar-challenge validation vision
Last synced: 27 Mar 2025
https://github.com/barchart/aws-lambda-suppressor
JavaScript utility for suppressing duplicate AWS Lambda invocations
dedupe deduplication duplicate-detection dynamodb javascript lambda public-repository serverless
Last synced: 23 Jul 2025
https://github.com/ajmalshahabudeen/bitwarden-duplicate-remover
When importing multiple CSV files, Bitwarden creates duplicate entries. This Python script removes the duplicates and keeps one copy of each entry.
bitwarden bitwarden-password-vault duplicate-detection duplicates duplicates-removal python
Last synced: 10 Jul 2025
https://github.com/dnth/fastdup-manage-clean-curate-blogpost
Find duplicates and anomalies in your dataset. Identify wrong or confusing labels. Uncover data leaks.
anomaly-detection computer-vision data-science data-validation duplicate-detection python
Last synced: 27 Mar 2025
https://github.com/victorqribeiro/dtf
DTF - Duplicate Thumbnail Files - A method to identify duplicate files.
duplicate-detection duplicate-files systematic-mapping systematic-reviews
Last synced: 26 Jul 2025
https://github.com/dnth/clean-up-digital-life-fastdup-blogpost
Eliminate duplicates, blurry, dark, bright and broken images with fastdup and Python.
anomaly-detection clustering computer-vision data-science data-validation declutter duplicate-detection google-images organize-photos photography python
Last synced: 27 Mar 2025
https://github.com/harperreed/image-dupes
A tool for scanning directories, identifying duplicate or similar images via hashing, and generating an HTML report for easy review.
duplicate-detection hashing phash photos
Last synced: 29 Jul 2025
https://github.com/arasgungore/job-posting-duplicate-detection
A project aiming to leverage text embeddings and Milvus, a high-performance vector search engine, to detect duplicate job postings.
data-science docker-compose dockerfile duplicate-detection duplicates embedding embeddings exploratory-data-analysis job-posting job-postings machine-learning milvus natural-language-processing sentence-embedding sentence-embeddings sentence-encoder sentence-encoding sentence-transformers text-embedding vector-search-engine
Last synced: 09 Mar 2025
https://github.com/lvntky/noditto
Noditto: AST Based Code Duplication Finder
abstract-syntax-tree duplicate-detection duplicate-files parser
Last synced: 03 Apr 2025
https://github.com/mrxiaom/banclickwhenusingitem
Minecraft Trident dupe bug fixer | Fixes the trident duplication exploit caused by out-of-sync network packet state
bugfix duplicate-detection exploit minecraft paper-plugin trident
Last synced: 07 May 2025
https://github.com/jsuyog2/duplicate-finder
A Python application for detecting and managing duplicate images and videos in a specified folder. Features include a user-friendly GUI built with PySimpleGUI, real-time progress updates, and automatic moving of duplicates to organized directories. Utilizes the difPy library for image comparisons and a custom video comparison class.
automation difpy duplicate-detection file-management filesystem-operations gui image-processing progress-bar pysimplegui python video-processing
Last synced: 21 Sep 2025
https://github.com/pouyakary/dup
A tiny and fast command-line utility to find duplicate files within a directory
cli cmd duplicate-detection duplicate-files duplicates filesystem gnu-utilities utility
Last synced: 14 May 2025
https://github.com/betaweb/twicejs
Manage duplicates, count item occurrences, dedupe an Array.
array array-manipulations base64 countable counter dedupe duplicate-detection duplicates duplicates-removal javascript js json occurrences
Last synced: 29 Dec 2025
https://github.com/razum2um/xxhashdir_comm
🏭 Identifies common items or duplicates across different hosts
difference-detection duplicate-detection xxhash xxhashdir
Last synced: 27 Oct 2025
https://github.com/sergio0694/clup
A no-nonsense .NET Core 2.1 CLI duplicate files remover
cli cli-app dotnet dotnet-tool dotnetcore duplicate-detection duplicate-files duplicates-removed netcoreapp
Last synced: 03 Aug 2025
https://github.com/fabricesalvaire/filewalker
A Python library to scan a file system, find duplicated files, etc.
duplicate-detection duplicate-files python-library python3
Last synced: 16 Jun 2025
https://github.com/whoswhip/file-manager
A cool little C# application that lets you rename files, detect duplicate files, use multiple gallery-dl instances at once, send all files in a directory that are <= 25 MB to a Discord webhook, and generate a secure password or username!
discord-webhook duplicate-detection duplicate-removal duplicatefilefinder file-manager file-renamer file-renaming filemanager files gallery-dl password-generator username-generator webhook
Last synced: 24 Aug 2025
https://github.com/tyler-tee/file-deduplicator
Python app built to scan a directory, check for duplicate files, and send them to the trash.
duplicate duplicate-detection duplicate-files pysimplegui python
Last synced: 23 Feb 2025
https://github.com/dahead/dupefiles2
DUPEFILES2 helps you find duplicate files on your systems.
cli csharp dotnet-core duplicate-detection linux spectre-console
Last synced: 29 Dec 2025
https://github.com/onlyuser/rm-dup
rm-dup is a script to remove duplicate files
disk-space disk-usage duplicate-detection duplicate-files duplicatefilefinder duplicates-removed
Last synced: 05 May 2025
https://github.com/junsious/dupfinder
A simple desktop application to search for duplicate files in a specified directory. This application uses SHA-256 hashing to identify duplicates and provides a user-friendly interface with progress tracking.
duplicate-detection duplicate-files files filesfinder rust
Last synced: 11 Jul 2025
https://github.com/eddie4k-code/kafka-connect-deduplicator
A Kafka Connect Single Message Transformation that will avoid duplicate messages being delivered.
apache-kafka duplicate-detection kafka kafka-connect kafka-connect-transformations kafka-connect-transforms single-message-transforms smt
Last synced: 13 May 2025
https://github.com/h1me01/duplicate-images-finder
duplicate-detection image-processing
Last synced: 24 Feb 2025
https://github.com/dpoetzsch/photo-tools
A collection of scripts to manage photos, especially to find duplicates and visually similar images.
duplicate-detection photos visual-similarity visual-similarity-search
Last synced: 07 Apr 2025
https://github.com/exitare/duplicateimagefinder
A tool to find duplicate images for given paths
duplicate-detection duplicates filemanagement images python python3
Last synced: 13 Aug 2025
https://github.com/francois-le-ko4la/duplicate-file-finder
A duplicate file finder.
duplicate-detection duplicate-files python python3
Last synced: 23 Mar 2025
https://github.com/webprofusion/duplicatefilechecker
Windows app to scan two folders and produce a CSV list of suspected duplicates, optionally using a file content hash
csharp duplicate-detection windows
Last synced: 04 Mar 2025
https://github.com/busterc/similars
👯 Find similar objects and partial duplicates in collections
arrays collections duplicate-detection duplicates similar-objects similarity-search
Last synced: 16 Jun 2025
https://github.com/dahead/dupefiles
Dupe Files scans your disks for duplicate files.
csharp deduplication dotnet-core duplicate-detection duplicate-files duplicatefilefinder
Last synced: 27 Dec 2025
https://github.com/davdiv/hashfolder
Simple command line tool that can create/update an sqlite database that contains the hash (by default SHA256) of all files inside a specified root folder.
checksum duplicate-detection duplicate-files duplicates sha256
Last synced: 16 May 2025
https://github.com/jempe/gitlfslite
GitLFSLite: a lightweight tool for managing large files in Git repositories by using metadata text files and rsync to simplify synchronization, offering a practical alternative to Git LFS and Git Annex.
duplicate-detection git-backup large-files
Last synced: 16 May 2025
https://github.com/amnuts/duplicate-hunter
Hunt down duplicate files on your computer
duplicate-detection golang hacktoberfest reactjs wails2
Last synced: 05 Oct 2025
https://github.com/Trophonix/RemoveDuplicates
Simple Java program to delete all duplicate files in the working directory.
duplicate-detection duplicate-files duplicatefilefinder files gpl gplv3 io java
Last synced: 10 Mar 2025
https://github.com/oscarsun72/delete_duplicate_files_from_the_source_directory
File Explorer deduplication (WindowsFormsApplication1): delete duplicate files from the source directory
duplicate duplicate-detection duplicate-files duplicates explorer explorer-filemanager file file-manager filemanagement filemanager filemanager-ui files filesystem
Last synced: 16 Mar 2025
https://github.com/itsayellow/finddup
Find duplicate files or directories in a list of paths.
duplicate-detection duplicate-files files
Last synced: 20 Jun 2025
https://github.com/nemat-al/advance-natural-language-processing
Tasks for the Advanced Natural Language Processing course at ITMO University
attention-mechanism bert duplicate-detection fine-tuning natural-language-processing nlp topic-modeling
Last synced: 16 Mar 2025
https://github.com/apaz-cli/ml-imagehash
A PyTorch implementation of a machine learning perceptual image hash algorithm for near-duplicate detection and fast content-based image retrieval.
duplicate-detection image-processing imagehash near-duplicates perceptual-hashing
Last synced: 27 Dec 2025
https://github.com/luis-varona/shadowseek
A CLI tool for near-duplicate detection in text files, written in Rust with no dependencies on runtime environments.
duplicate-detection minhash near-duplicate-detection simhash text-classification
Last synced: 25 Jul 2025
https://github.com/aniruddhakhedkar/eda_to_evaluate_bank_telemarketing_campaign_for_revenue_enhancement
Exploratory_Data_Analysis_Python_Project_2
datavisualization duplicate-detection imputation-methods numpy outlier-removal pandas seaborn statistical-analysis
Last synced: 12 Apr 2025
https://github.com/gechandesu/fdup
File duplicates finder
duplicate-detection duplicate-files vlang
Last synced: 12 Dec 2025
https://github.com/arthurmor4is/duplicate-logic-detector-action
🔍 Automatically detect duplicate logic in Python code changes using advanced AST analysis and semantic similarity. Prevent code duplication and improve code quality.
ast-analysis code-quality code-review duplicate-detection github-actions python static-analysis
Last synced: 08 Oct 2025
https://github.com/lazycatcoder/common-python
Implementing various tasks using Python
bacon bacon-cipher banned-words cesar cesar-cipher change common duplicate-detection duplicates fibbonacci lucky-tickets morse-code morze python ticket vigenere-cipher vignere
Last synced: 25 Feb 2025
https://github.com/giosali/dupeutil
A command-line program written in Python for detecting and removing duplicate files.
command-line-tool duplicate-detection duplicate-files python
Last synced: 17 Jun 2025
https://github.com/jempe/shasums_duplicates
Shasums Duplicates: a Bash and Go utility for detecting and managing duplicate files by generating, comparing, and processing sorted hash lists.
duplicate-detection shell-script-generator
Last synced: 16 May 2025