An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with duplicate-detection

A curated list of projects in awesome lists tagged with duplicate-detection.

https://github.com/nomic-ai/nomic

Interact, analyze and structure massive text, image, embedding, audio and video datasets

clustering duplicate-detection embeddings python text topic-modeling unstructured-data

Last synced: 13 May 2025

https://github.com/nil0x42/duplicut

Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)

c cracking dedupe dictionary duplicate-detection hashcat hashes password password-cracking remove-duplicates uniq unique wordlist wordlist-generator wordlists

Last synced: 13 Apr 2025

https://github.com/windirstat/windirstat

WinDirStat is a disk usage statistics viewer and cleanup tool for Microsoft Windows

cleanup disk-space-analyzer disk-usage-analyzer duplicate-detection treemap treemaps windows

Last synced: 07 Jan 2026

https://github.com/sreedevk/deduplicator

Filter, Sort & Delete Duplicate Files Recursively

deduplication duplicate-detection duplicate-files duplicatefilefinder filesystem rust

Last synced: 21 Jun 2025

https://github.com/chenglongma/zoplicate

A plugin that does one thing only: Detect and manage duplicate items in Zotero.

duplicate-detection zotero zotero-addon zotero-plugin zotero6 zotero7

Last synced: 06 Apr 2025

https://github.com/ChenglongMa/zoplicate

A plugin that does one thing only: Detect and manage duplicate items in Zotero.

duplicate-detection zotero zotero-addon zotero-plugin zotero6 zotero7

Last synced: 06 Mar 2025

https://github.com/cryogenicplanet/depp

⚡ Check your npm modules for unused and duplicate dependencies fast

dependency duplicate-detection modules monorepo npm unused

Last synced: 23 Oct 2025

https://github.com/RazgrizHsu/immich-mediakit

An extension toolkit for Immich enabling advanced management capabilities through AI-powered similarity detection

duplicate duplicate-detection immich similarity

Last synced: 14 Jul 2025

https://github.com/kristiankoskimaki/vidupe

Vidupe is a program that can find duplicate and similar video files. V1.211 released on 2019-09-18, Windows exe here:

duplicate-detection duplicate-videos duplicates videos

Last synced: 10 Apr 2025

https://github.com/scrubbbbs/cbird

Command-line program for Content-Based Image Retrieval of images and videos. Includes tools for general search and de-duplication.

command-line-interface computer-vision content-based-image-retrieval duplicate-detection duplicate-files duplicates ffmpeg opencv qt6 similarity-search

Last synced: 06 Apr 2025

https://github.com/itwillwork/ostap

CLI tool that quickly checks whether your bundle contains multiple versions of the same package, only by looking in package.json.

bundle cli-app duplicate-detection frontend webpack

Last synced: 13 Apr 2025

https://github.com/logpai/bughub

A collection of free-text bug reports for duplicate issue identification

bug-reports datasets duplicate-detection nlp

Last synced: 04 Jan 2026

https://github.com/marius-sucan/Quick-Picto-Viewer

A uniquely crafted image viewer and editor with options to organize files, and maintain large lists of image files for slideshows, dupes detection or other purposes.

dupes-finder duplicate-detection fileorganizer files-management image image-edi image-manipulation image-organizer image-processing image-viewer imageeditor imageprocessing organizer paint paint-application slideshow slideshow-maker

Last synced: 21 Mar 2025

https://github.com/PJDude/dude

Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.

cli deduplication duplicate duplicate-detection duplicate-files duplicates duplicates-removal easy easy-to-use easyui gui gui-application python python3 sha1 threads tkinter utility utility-application

Last synced: 06 Mar 2025

https://github.com/src-d/gemini

Advanced similarity and duplicate source code at scale.

duplicate-detection duplicates hash source-code-analysis spark

Last synced: 05 May 2025

https://github.com/src-d/apollo

Advanced similarity and duplicate source code proof of concept for our research efforts.

duplicate-detection duplicates python similarity similarity-search source-code

Last synced: 05 May 2025

https://github.com/appzcoder/phpcloc

:rocket: Cloc & duplicate code checker tool

cloc code-duplication console duplicate-detection php

Last synced: 19 Jul 2025

https://github.com/asadiahmad/detect-duplicated-questions

Detect Duplicated StackOverFlow Questions

dot-product duplicate-detection nlp

Last synced: 14 Apr 2025

https://github.com/mrinjamul/go-dupfinder

Duplicate File Finder.

cleaner duplicate-detection utility

Last synced: 14 Sep 2025

https://github.com/akcarsten/duplicate-finder

This Python package identifies duplicate files in a folder of interest.

duplicate-detection python

Last synced: 04 Mar 2025

https://github.com/justinshenk/simages

Find duplicates and similar images in a folder

autoencoder duplicate-detection images preprocessing similarity-detection

Last synced: 11 Oct 2025

https://github.com/arikw/outlook-duplicated-items-remover

A VBA script that finds and moves duplicated items in selected outlook folders

duplicate-detection duplicates outlook vba vba-script vba-snippets

Last synced: 11 Sep 2025

https://github.com/InexplicableMagic/photodedupe

A utility for locating near duplicate photos irrespective of image resolution, compression settings or file format.

computer-vision computer-vision-tools deduplication duplicate-detection image-deduplication

Last synced: 07 Apr 2025

https://github.com/nicolasbizzozzero/dupe_eraser

A command-line tool that automates the deletion of duplicate files based on their hash or perceptual hash.

cli duplicate-detection duplicate-files duplicates file-management

Last synced: 12 Apr 2025

https://github.com/deadsoul/dugu

Find, remove and avoid duplicates with dugu: The Duplicates Guru

deduplication dugu duplicate-detection duplicate-files duplicatefilefinder duplicates duplicates-guru python

Last synced: 05 Apr 2025

https://github.com/glau-bd/duplicate-video-finder

A python module to detect duplicate videos in a directory.

cleanup data-hoarder deduplication duplicate-detection python python-3 video-processing

Last synced: 02 Oct 2025

https://github.com/sameera-madushan/findm

Findm is a python script to find duplicate file copies in a given directory.

duplicate-detection duplicate-files duplicatefilefinder file-hashing python

Last synced: 18 Jul 2025

https://github.com/transitive-bullshit/phash-gif

Perceptual GIF hashing for easily finding near-duplicate GIFs.

duplicate-detection gif gif-animation perceptual-hashing phash

Last synced: 13 Oct 2025

https://github.com/daxcay/imageduplicatefinder

Python application using AI to find duplicate images

ai duplicate-detection image-processing python standalone

Last synced: 04 Apr 2025

https://github.com/isayakhov/duplicate-stickers-remover-bot

Bot that can find and remove duplicate stickers from different sticker sets

duplicate-detection python telegram telegram-bot

Last synced: 09 Jul 2025

https://github.com/tttapa/duplicate-file-finder

List all duplicate files in a directory.

cleanup duplicate-detection file-manager utility

Last synced: 14 Jun 2025

https://github.com/ricopella/cratecleaner

Make your library clutter-free! This Electron app cleans up your digital music and image collections. Unique for DJs: identifies which songs are in crates while detecting duplicates. Digs into metadata for smart cleanup.

audio dedupe-library duplicate-detection duplicatefilefinder electron image prisma react serato tailwind tanstack-react-query typescript vite zod

Last synced: 20 Oct 2025

https://github.com/jkomieter/smartshreds

SmartShreds uses Rust, hashing algorithms, and NLP to detect and manage duplicate files efficiently, optimizing storage and organization with AI-powered tools.

ai desktop-application duplicate-detection file fileorganizer filesystem gtk4 open-source rust storage-management systems-programming

Last synced: 13 May 2025

https://github.com/keyweeusr/bear

:bear: The decluttering deduplicator

cli-app clutterremoval duplicate-detection duplicates python

Last synced: 25 Jul 2025

https://github.com/erikreed/pydupes

A duplicate file finder like rdfind/fdupes et al that may be faster in environments with millions of files and terabytes of data or over high latency filesystems (e.g. NFS).

duplicate-detection duplication files

Last synced: 06 Mar 2025

https://github.com/gyanbardhan/duplicatequestiondetection

Developed and Deployed NLP Models Achieving Up to 89.89% Accuracy in Detecting Duplicate Question pairs using Transformer https://huggingface.co/spaces/gyanbardhan123/Duplicate_Question_Detection https://drive.google.com/file/d/1MsBA45Hob56OWPuLVCgG3F3QdCZgBq9a/view?usp=sharing

bert bow distilbert duplicate-detection duplicate-questions-identification feature-engineering google huggingface kaggle nlp nlp-machine-learning quora quora-question-pairs spaces text-processing tf-idf transformer

Last synced: 12 Jul 2025

https://github.com/ajmalshahabudeen/Bitwarden-Duplicate-remover

When importing multiple CSV files, Bitwarden creates duplicate entries. This Python script removes the duplicates and keeps one copy of each entry.

bitwarden bitwarden-password-vault duplicate-detection duplicates duplicates-removal python

Last synced: 27 Mar 2025

https://github.com/barchart/aws-lambda-suppressor

JavaScript utility for suppressing duplicate AWS Lambda invocations

dedupe deduplication duplicate-detection dynamodb javascript lambda public-repository serverless

Last synced: 23 Jul 2025

https://github.com/ajmalshahabudeen/bitwarden-duplicate-remover

When importing multiple CSV files, Bitwarden creates duplicate entries. This Python script removes the duplicates and keeps one copy of each entry.

bitwarden bitwarden-password-vault duplicate-detection duplicates duplicates-removal python

Last synced: 10 Jul 2025

https://github.com/dnth/fastdup-manage-clean-curate-blogpost

Find duplicates and anomalies in your dataset. Identify wrong/confusing labels in your dataset. Uncover data leaks in your dataset.

anomaly-detection computer-vision data-science data-validation duplicate-detection python

Last synced: 27 Mar 2025

https://github.com/victorqribeiro/dtf

DTF - Duplicate Thumbnail Files - A method to identify duplicate files.

duplicate-detection duplicate-files systematic-mapping systematic-reviews

Last synced: 26 Jul 2025

https://github.com/harperreed/image-dupes

A tool for scanning directories, identifying duplicate or similar images via hashing, and generating an HTML report for easy review.

duplicate-detection hashing phash photos

Last synced: 29 Jul 2025

https://github.com/lvntky/noditto

Noditto: AST Based Code Duplication Finder

abstract-syntax-tree duplicate-detection duplicate-files parser

Last synced: 03 Apr 2025

https://github.com/mrxiaom/banclickwhenusingitem

Minecraft Trident dupe bug fixer | Fixes the trident duplication exploit caused by out-of-sync network packet state

bugfix duplicate-detection exploit minecraft paper-plugin trident

Last synced: 07 May 2025

https://github.com/jsuyog2/duplicate-finder

A Python application for detecting and managing duplicate images and videos in a specified folder. Features include a user-friendly GUI built with PySimpleGUI, real-time progress updates, and automatic moving of duplicates to organized directories. Utilizes the difPy library for image comparisons and a custom video comparison class.

automation difpy duplicate-detection file-management filesystem-operations gui image-processing progress-bar pysimplegui python video-processing

Last synced: 21 Sep 2025

https://github.com/pouyakary/dup

a tiny and fast command line utility to find the duplicate files within a directory

cli cmd duplicate-detection duplicate-files duplicates filesystem gnu-utilities utility

Last synced: 14 May 2025

https://github.com/razum2um/xxhashdir_comm

🏭 Identifies common or duplicate files across different hosts

difference-detection duplicate-detection xxhash xxhashdir

Last synced: 27 Oct 2025

https://github.com/sergio0694/clup

A no-nonsense .NET Core 2.1 CLI duplicate files remover

cli cli-app dotnet dotnet-tool dotnetcore duplicate-detection duplicate-files duplicates-removed netcoreapp

Last synced: 03 Aug 2025

https://github.com/metaory/xdedup

find and remove duplicates

cli duplicate-detection

Last synced: 28 Jul 2025

https://github.com/fabricesalvaire/filewalker

A Python library to scan a file system, find duplicated files, etc.

duplicate-detection duplicate-files python-library python3

Last synced: 16 Jun 2025

https://github.com/whoswhip/file-manager

A cool little C# application that lets you rename files, detect duplicate files, use multiple gallery-dl instances at once, send all files in a directory that are <= 25 MB to a Discord webhook, and generate a secure password or username!

discord-webhook duplicate-detection duplicate-removal duplicatefilefinder file-manager file-renamer file-renaming filemanager files gallery-dl password-generator username-generator webhook

Last synced: 24 Aug 2025

https://github.com/tyler-tee/file-deduplicator

Python app built to scan a directory, check for duplicate files, and send them to the trash.

duplicate duplicate-detection duplicate-files pysimplegui python

Last synced: 23 Feb 2025

https://github.com/dahead/dupefiles2

DUPEFILES2 helps you find duplicate files on your systems.

cli csharp dotnet-core duplicate-detection linux spectre-console

Last synced: 29 Dec 2025

https://github.com/junsious/dupfinder

A simple desktop application to search for duplicate files in a specified directory. This application uses SHA-256 hashing to identify duplicates and provides a user-friendly interface with progress tracking.

duplicate-detection duplicate-files files filesfinder rust

Last synced: 11 Jul 2025
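Several entries in this list describe hash-based detection, where files with identical content digests are treated as duplicates. A minimal Python sketch of that general idea follows. It is not the code of any listed project, and the function names `sha256_of` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group regular files under root by content hash; keep groups of two or more."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Reading in chunks keeps memory use flat for large files. Real tools often compare file sizes first so that only same-sized files need to be hashed.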

https://github.com/eddie4k-code/kafka-connect-deduplicator

A Kafka Connect Single Message Transformation that will avoid duplicate messages being delivered.

apache-kafka duplicate-detection kafka kafka-connect kafka-connect-transformations kafka-connect-transforms single-message-transforms smt

Last synced: 13 May 2025

https://github.com/dpoetzsch/photo-tools

A collection of scripts to manage photos, especially to find duplicates and visually similar images.

duplicate-detection photos visual-similarity visual-similarity-search

Last synced: 07 Apr 2025

https://github.com/exitare/duplicateimagefinder

A tool to find duplicate images for given paths

duplicate-detection duplicates filemanagement images python python3

Last synced: 13 Aug 2025

https://github.com/webprofusion/duplicatefilechecker

Windows app to scan two folders and produce a CSV list of suspected duplicates, optionally using file content hash

csharp duplicate-detection windows

Last synced: 04 Mar 2025

https://github.com/busterc/similars

:dancers: Find similar objects and partial duplicates in collections

arrays collections duplicate-detection duplicates similar-objects similarity-search

Last synced: 16 Jun 2025

https://github.com/dahead/dupefiles

Dupe Files scans your disks for duplicate files.

csharp deduplication dotnet-core duplicate-detection duplicate-files duplicatefilefinder

Last synced: 27 Dec 2025

https://github.com/davdiv/hashfolder

Simple command line tool that can create/update an sqlite database that contains the hash (by default SHA256) of all files inside a specified root folder.

checksum duplicate-detection duplicate-files duplicates sha256

Last synced: 16 May 2025
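The approach described above, storing one hash per file in an SQLite database and then looking for repeated hashes, can be sketched with Python's standard library. This is an illustration under assumptions, not the tool's actual schema or code. The table name `files` and the functions `index_folder` and `duplicate_hashes` are hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def index_folder(db: sqlite3.Connection, root: Path) -> None:
    """Create or refresh a table mapping each file path to its SHA-256 hash."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Whole-file read keeps the sketch short; chunked reads suit large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # INSERT OR REPLACE updates the stored hash when a path is re-indexed.
            db.execute(
                "INSERT OR REPLACE INTO files (path, sha256) VALUES (?, ?)",
                (str(path), digest),
            )
    db.commit()


def duplicate_hashes(db: sqlite3.Connection) -> list[str]:
    """Return the hashes that are stored for more than one indexed path."""
    rows = db.execute(
        "SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1 ORDER BY sha256"
    )
    return [row[0] for row in rows]
```

Keeping the index in a database means a later run can update entries rather than rebuilding everything, and duplicates fall out of a single `GROUP BY` query.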

https://github.com/jempe/gitlfslite

GitLFSLite: a lightweight tool for managing large files in Git repositories by using metadata text files and rsync to simplify synchronization, offering a practical alternative to Git LFS and Git Annex.

duplicate-detection git-backup large-files

Last synced: 16 May 2025

https://github.com/amnuts/duplicate-hunter

Hunt down duplicate files on your computer

duplicate-detection golang hacktoberfest reactjs wails2

Last synced: 05 Oct 2025

https://github.com/Trophonix/RemoveDuplicates

Simple Java program to delete all duplicate files in the working directory.

duplicate-detection duplicate-files duplicatefilefinder files gpl gplv3 io java

Last synced: 10 Mar 2025

https://github.com/itsayellow/finddup

Find duplicate files or directories in a list of paths.

duplicate-detection duplicate-files files

Last synced: 20 Jun 2025

https://github.com/apaz-cli/ml-imagehash

A PyTorch implementation of a machine learning perceptual image hash algorithm for near-duplicate detection and fast content-based image retrieval.

duplicate-detection image-processing imagehash near-duplicates perceptual-hashing

Last synced: 27 Dec 2025

https://github.com/luis-varona/shadowseek

A CLI tool for near-duplicate detection in text files, written in Rust with no dependencies on runtime environments.

duplicate-detection minhash near-duplicate-detection simhash text-classification

Last synced: 25 Jul 2025

https://github.com/gechandesu/fdup

File duplicates finder

duplicate-detection duplicate-files vlang

Last synced: 12 Dec 2025

https://github.com/arthurmor4is/duplicate-logic-detector-action

🔍 Automatically detect duplicate logic in Python code changes using advanced AST analysis and semantic similarity. Prevent code duplication and improve code quality.

ast-analysis code-quality code-review duplicate-detection github-actions python static-analysis

Last synced: 08 Oct 2025

https://github.com/giosali/dupeutil

A command-line program written in Python for detecting and removing duplicate files.

command-line-tool duplicate-detection duplicate-files python

Last synced: 17 Jun 2025

https://github.com/jempe/shasums_duplicates

Shasums Duplicates: a Bash and Golang utility for detecting and managing duplicate files by generating, comparing, and processing sorted hash lists.

duplicate-detection shell-script-generator

Last synced: 16 May 2025