{"id":18264793,"url":"https://github.com/daxcay/imageduplicatefinder","last_synced_at":"2025-04-04T21:30:47.573Z","repository":{"id":244514730,"uuid":"815462963","full_name":"daxcay/ImageDuplicateFinder","owner":"daxcay","description":"Python application using ai to find duplicate images","archived":false,"fork":false,"pushed_at":"2024-11-28T09:46:04.000Z","size":160,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-20T19:17:06.735Z","etag":null,"topics":["ai","duplicate-detection","image-processing","python","standalone"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daxcay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-15T08:26:51.000Z","updated_at":"2024-11-28T09:46:07.000Z","dependencies_parsed_at":"2024-06-25T11:34:19.371Z","dependency_job_id":"4a1e6f41-6eaa-499a-a3b6-ca7054a73ce9","html_url":"https://github.com/daxcay/ImageDuplicateFinder","commit_stats":null,"previous_names":["daxcay/imageduplicatefinder"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daxcay%2FImageDuplicateFinder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daxcay%2FImageDuplicateFinder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daxcay%2FImageDuplicateFinder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daxcay%2FImag
eDuplicateFinder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daxcay","download_url":"https://codeload.github.com/daxcay/ImageDuplicateFinder/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247252015,"owners_count":20908609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","duplicate-detection","image-processing","python","standalone"],"created_at":"2024-11-05T11:16:04.286Z","updated_at":"2025-04-04T21:30:47.567Z","avatar_url":"https://github.com/daxcay.png","language":"Python","funding_links":["https://buymeacoffee.com/daxtoncaylor","https://paypal.me/daxtoncaylor"],"categories":[],"sub_categories":[],"readme":"\n![COMFY-UI (4)](https://github.com/daxcay/ImageDuplicateFinder/assets/164315771/39d8151a-b234-4fd5-b65d-2291a96585ff)\n\n# Image Duplicate Finder\n\n![image](https://img.shields.io/badge/version-1.1.1-green) ![image](https://img.shields.io/badge/last_update-July_2024-green)\n\n**Image Duplicate Finder** is a web app for identifying duplicate images in a dataset. It uses machine learning (TensorFlow and scikit-learn) to compare images and flags duplicates based on two configurable similarity thresholds: Euclidean and cosine.\n\nYouTube video tutorial: [https://www.youtube.com/watch?v=u90vtRh4Fr8](https://youtu.be/-v9X4CBX81A)\n\n## Features\n  - Can run more than one task at a time.
Duplicate the browser tab to start a new task.\n  - On the results page, the higher-resolution image in each set of duplicates is left unchecked, so it is kept.\n\n## Installation\n\n  #### Requirements\n  \n  - Latest Python (😅)\n  - Git (https://git-scm.com/downloads)\n  - Virtual environment package `virtualenv`: to install it, open a command prompt and run `pip install virtualenv`\n  - Clone the repository into a folder: `git clone https://github.com/daxcay/ImageDuplicateFinder`\n\n  #### Execution\n\n  Go to the cloned repository folder:\n\n  - **Windows**: open `run_window.bat` and allow the app to run (because the file was downloaded from the internet, Windows will ask for permission).\n  - **Linux \u0026 macOS**: open a terminal, make the script executable with `chmod +x run_linux_mac.sh`, and run it with `./run_linux_mac.sh`.\n\n  #### Initial run\n  - On the first run, the program installs Flask, NumPy, Pillow, scikit-learn, and TensorFlow.\n  - Updates are applied automatically.\n  - If all goes well, a browser window opens at `http://127.0.0.1:5501`.\n\n  ![image](https://github.com/daxcay/ImageDuplicateFinder/assets/164315771/19919300-bfbb-4d45-8b72-dba08e4a0510)\n\n## Usage\n\n- ### Image path\n\n    Go to your image folder, copy its path, and paste it into the input box.\n\n    \u003e **Note**: Make sure image file names contain no spaces.\n  \n- ### Setting Euclidean Similarity Threshold\n\n    A **lower** value indicates higher similarity between the images.\n\n    - 0 (identical)\n    - 0.5 (similar enough)\n    - 1.0 (different)\n\n\u003e **Note:** If you notice too many false positives (different images flagged as duplicates), lower the `Euclidean Similarity Threshold` and/or raise the `Cosine Similarity Threshold`.\n    \n- ### Setting Cosine Similarity Threshold\n  \n    A **higher** value indicates higher similarity between the images.\n\n    - 0 (different)\n    - 0.5 (similar enough)\n    - 1 (identical)\n\n\u003e **Note:** If you notice too many false negatives (missed 
duplicates), raise the `Euclidean Similarity Threshold` and/or lower the `Cosine Similarity Threshold`.\n\n- ### Recommended Settings\n    - Euclidean Similarity Threshold = **0.5**\n    - Cosine Similarity Threshold = **0.9**\n\nAfter a successful run, you will see a page where you can select images for deletion. By default, all duplicate images are selected; review the selection and uncheck any image that was flagged incorrectly. Press \"delete selected\" to delete the selected files. The image folder is then free of duplicates. A backup folder containing all the original files (including duplicates) is created in case anything goes wrong.\n\n## Credits\n\n### Raf Stahelin - Testing and Feedback\n\n### Daxton Caylor - ComfyUI Node Developer\n  - ### Contact\n     - **Twitter**: @daxcay27\n     - **Email**: daxtoncaylor@gmail.com\n     - **Discord**: daxtoncaylor\n     - **Discord Server**: https://discord.gg/UyGkJycvyW\n    \n  - ### Support\n     - **Buy me a coffee**: https://buymeacoffee.com/daxtoncaylor\n     - **Support me on PayPal**: https://paypal.me/daxtoncaylor\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaxcay%2Fimageduplicatefinder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaxcay%2Fimageduplicatefinder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaxcay%2Fimageduplicatefinder/lists"}