{"id":13658170,"url":"https://github.com/elisemercury/Duplicate-Image-Finder","last_synced_at":"2025-04-24T08:31:36.781Z","repository":{"id":37038955,"uuid":"323089735","full_name":"elisemercury/Duplicate-Image-Finder","owner":"elisemercury","description":"difPy - Python package for finding duplicate and similar images","archived":false,"fork":false,"pushed_at":"2025-01-16T15:42:48.000Z","size":28723,"stargazers_count":492,"open_issues_count":8,"forks_count":70,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-15T21:22:24.673Z","etag":null,"topics":["difpy","duplicate","find","images","pictures","python","similarity"],"latest_commit_sha":null,"homepage":"https://difpy.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elisemercury.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["elisemercury"]}},"created_at":"2020-12-20T14:23:46.000Z","updated_at":"2025-04-10T23:33:52.000Z","dependencies_parsed_at":"2024-01-12T02:42:48.574Z","dependency_job_id":"4360a8d3-cb14-4d03-aa63-3b0479b8c7d8","html_url":"https://github.com/elisemercury/Duplicate-Image-Finder","commit_stats":{"total_commits":264,"total_committers":9,"mean_commits":"29.333333333333332","dds":"0.33712121212121215","last_synced_commit":"25c4838154d9fc08d49c9e7950ad9b3920040e7a"},"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisemercury%2FDuplicate-Image-Finder","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/elisemercury%2FDuplicate-Image-Finder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisemercury%2FDuplicate-Image-Finder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisemercury%2FDuplicate-Image-Finder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elisemercury","download_url":"https://codeload.github.com/elisemercury/Duplicate-Image-Finder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250592032,"owners_count":21455486,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["difpy","duplicate","find","images","pictures","python","similarity"],"created_at":"2024-08-02T05:00:57.046Z","updated_at":"2025-04-24T08:31:36.749Z","avatar_url":"https://github.com/elisemercury.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"static/difPy_logo_3.png\" width=\"300\" title=\"Example Output: Duplicate Image Finder\"\u003e\n\u003c/p\u003e\n\n# Duplicate Image Finder (difPy)\n\n[![PyPIv](https://img.shields.io/pypi/v/difPy)](https://pypi.org/project/difPy/)\n[![PyPI status](https://img.shields.io/pypi/status/difPy)](https://pypi.org/project/difPy/)\n[![Documentation Status](https://readthedocs.org/projects/difpy/badge/?version=latest)](https://difpy.readthedocs.io/en/latest/?badge=latest)\n[![PyPI - Python 
Version](https://img.shields.io/pypi/pyversions/difPy)](https://pypi.org/project/difPy/)\n[![Downloads](https://static.pepy.tech/badge/difpy)](https://pepy.tech/project/difpy)\n[![PyPI - License](https://img.shields.io/pypi/l/difPy)](https://github.com/elisemercury/Duplicate-Image-Finder/blob/main/LICENSE.txt)\n[\u003cimg src=\"https://img.shields.io/badge/dif-Py-blue?style=flat\u0026logo=python\u0026labelColor=white\u0026logoWidth=20.svg/\"\u003e\u003c/a\u003e](https://github.com/elisemercury/Duplicate-Image-Finder/)\n\n**Tired of going through all images in a folder and comparing them manually to check if they are duplicates?**\n\n:white_check_mark: The Duplicate Image Finder (difPy) Python package **automates** this task for you!\n\n```python\npip install difPy\n```\n\n\u003e ✨🚀 **Join the [difPy for Desktop beta tester](https://difpy.short.gy/desktop-beta-ghb) program and be among the first to test the new difPy desktop app!**\n\n\u003e :open_hands: Our motto? We :heart: Open Source! **Contributions and new ideas for difPy are always welcome** - check our [Contributor Guidelines](https://difpy.readthedocs.io/en/latest/03_contributing/contributing.html) for more information.\n\nRead more on how the algorithm of difPy works in my Medium article [Finding Duplicate Images with Python](https://towardsdatascience.com/finding-duplicate-images-with-python-71c04ec8051).\n\nCheck out the [difPy package on PyPI.org](https://pypi.org/project/difPy/)\n\n-------\n\n## Description\ndifPy searches for images in **one or more different folders**, compares the images it finds, and checks whether they are duplicates. It then outputs the **image files classified as duplicates** as well as the **images having the lowest resolutions**, so you know which of the duplicate images are safe to delete. You can then either delete them manually, or let difPy delete them for you.\n\ndifPy does not compare images based on their hashes. It compares them based on their tensors, i.e. 
the image content - this allows difPy to **not only search for duplicate images, but also for similar images**.\n\ndifPy leverages Python's **multiprocessing capabilities** and therefore performs well even on large datasets. \n\n:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.\n\n## Table of Contents\n1. [Basic Usage](https://github.com/elisemercury/Duplicate-Image-Finder#basic-usage)\n2. [Output](https://github.com/elisemercury/Duplicate-Image-Finder#output)\n3. [Additional Parameters](https://github.com/elisemercury/Duplicate-Image-Finder#additional-parameters)\n4. [CLI Usage](https://github.com/elisemercury/Duplicate-Image-Finder#cli-usage)\n5. [difPy for Desktop](https://github.com/elisemercury/Duplicate-Image-Finder#difpy-for-desktop)\n\n## Basic Usage\nTo make difPy search for duplicates **within one folder**:\n\n```python\nimport difPy\ndif = difPy.build('C:/Path/to/Folder/')\nsearch = difPy.search(dif)\n``` \nTo search for duplicates **within multiple folders**:\n\n```python\nimport difPy\ndif = difPy.build(['C:/Path/to/Folder_A/', 'C:/Path/to/Folder_B/', 'C:/Path/to/Folder_C/', ... ])\nsearch = difPy.search(dif)\n``` \n\nFolder paths can be specified as standalone Python strings, or within a list. With `difPy.build()`, difPy first scans the images in the provided folders and builds a collection of images by generating image tensors. `difPy.search()` then starts the search for duplicate images.\n\n:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.\n\n## Output\ndifPy returns various types of output that you may use depending on your use case: \n\n### I. Search Result\nA **JSON formatted collection** of duplicates/similar images (i.e. **match groups**) that were found. 
Each match group has a primary image (the key of the dictionary), which maps to the list of its duplicates, each with its filename and MSE (Mean Squared Error). The lower the MSE, the more similar the primary image and the matched images are. Therefore, an MSE of 0 indicates that two images are exact duplicates.\n\n```python\nsearch.result\n\n\u003e Output:\n{'C:/Path/to/Image/image1.jpg' : [['C:/Path/to/Image/duplicate_image1a.jpg', 0.0], \n                                  ['C:/Path/to/Image/duplicate_image1b.jpg', 0.0]],\n 'C:/Path/to/Image/image2.jpg' : [['C:/Path/to/Image/duplicate_image2a.jpg', 0.0]],\n ...\n}\n``` \n\n### II. Lower Quality Files\nA **list** of duplicates/similar images that have the **lowest quality** (image resolution) among match groups: \n\n```python\nsearch.lower_quality\n\n\u003e Output:\n['C:/Path/to/Image/duplicate_image1.jpg', \n 'C:/Path/to/Image/duplicate_image2.jpg', ...]\n``` \n\nLower quality images can then be **moved** to a different location:\n\n```python\nsearch.move_to(destination_path='C:/Path/to/Destination/')\n```\nOr **deleted**:\n\n```python\nsearch.delete(silent_del=False)\n```\n\n### III. Search Statistics\n\nA **JSON formatted collection** with statistics on the completed difPy processes:\n\n```python\nsearch.stats\n\n\u003e Output:\n{'directory': ['C:/Path/to/Folder_A/', 'C:/Path/to/Folder_B/', ... 
],\n 'process': {'build': {'duration': {'start': '2024-02-18T19:52:39.479548',\n                                    'end': '2024-02-18T19:52:41.630027',\n                                    'seconds_elapsed': 2.1505},\n                       'parameters': {'recursive': True,\n                                      'in_folder': False,\n                                      'limit_extensions': True,\n                                      'px_size': 50,\n                                      'processes': 5}},\n             'search': {'duration': {'start': '2024-02-18T19:52:41.630027',\n                                     'end': '2024-02-18T19:52:46.770077',\n                                     'seconds_elapsed': 5.14},\n                        'parameters': {'similarity_mse': 0,\n                                       'rotate': True,\n                                       'same_dim': True,\n                                       'processes': 5,\n                                       'chunksize': None},\n                        'files_searched': 3232,\n                        'matches_found': {'duplicates': 3030, \n                                          'similar': 0}}},\n 'total_files': {'count': 3232},\n 'invalid_files': {'count': 0, \n                   'logs': {}}}\n```\n\n## Additional Parameters\ndifPy supports the following parameters:\n\n```python\ndifPy.build(*directory, recursive=True, in_folder=False, limit_extensions=True, px_size=50, \n            show_progress=True, processes=os.cpu_count())\n```\n\n```python\ndifPy.search(difpy_obj, similarity='duplicates', rotate=True, same_dim=True, show_progress=True, \n             processes=os.cpu_count(), chunksize=None)\n```\n\n:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.\n\n## CLI Usage\ndifPy can also be invoked through the CLI by using the following commands:\n\n```python\npython dif.py #working directory\n\npython 
dif.py -D 'C:/Path/to/Folder/'\n\npython dif.py -D 'C:/Path/to/Folder_A/' 'C:/Path/to/Folder_B/' 'C:/Path/to/Folder_C/'\n```\n\n\u003e :point_right: Windows users can add difPy to their [PATH system variables](https://www.computerhope.com/issues/ch000549.htm) by pointing it to their difPy package installation folder containing the [`difPy.bat`](https://github.com/elisemercury/Duplicate-Image-Finder/difPy/difPy.bat) file. This adds `difPy` as a command in the CLI and will allow direct invocation of `difPy` from anywhere on the device.\n\ndifPy CLI supports the following arguments:\n\n```python\ndif.py [-h] [-D DIRECTORY [DIRECTORY ...]] [-Z OUTPUT_DIRECTORY] \n       [-r {True,False}] [-i {True,False}] [-le {True,False}] \n       [-px PX_SIZE]  [-s SIMILARITY] [-ro {True,False}]\n       [-dim {True,False}] [-proc PROCESSES] [-ch CHUNKSIZE] \n       [-mv MOVE_TO] [-d {True,False}] [-sd {True,False}]\n       [-p {True,False}]\n```\n\n| | Parameter | | Parameter |\n| :---: | ------ | :---: | ------ | \n| `-D` | directory | `-dim` | same_dim |\n| `-Z` | output_directory | `-proc` | processes | \n| `-r`| recursive | `-ch` | chunksize |\n| `-i`| in_folder | `-mv` | move_to |\n| `-le` | limit_extensions | `-d` | delete |\n| `-px` | px_size | `-sd` | silent_del |\n| `-s`| similarity | `-p` | show_progress | \n| `-ro` | rotate | \n\nIf no directory parameter is given in the CLI, difPy will **run on the current working directory**.\n\nWhen running from the CLI, the output of difPy is written to files and **saved in the working directory** by default. To change the default output directory, specify the `-Z / -output_directory` parameter. 
The \"xxx\" in the output filenames is the current timestamp:\n\n```python\ndifPy_xxx_results.json\ndifPy_xxx_lower_quality.json\ndifPy_xxx_stats.json\n```\n\n:notebook: For a **detailed usage guide**, please view the official **[difPy Usage Documentation](https://difpy.readthedocs.io/)**.\n\n## difPy for Desktop\n\nThe new difPy desktop app brings difPy directly to your desktop. We are now accepting beta tester sign-ups and will soon be starting our first tester access wave.\n\n✨🚀 **Join the [difPy for Desktop beta tester](https://difpy.short.gy/desktop-beta-ghb) program now and be among the first to test the new difPy desktop app!**\n\n-------\n\n\u003cp align=\"center\"\u003e\u003cb\u003e\n:heart: Open Source\n\u003c/b\u003e\u003c/p\u003e\n","funding_links":["https://github.com/sponsors/elisemercury"],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felisemercury%2FDuplicate-Image-Finder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felisemercury%2FDuplicate-Image-Finder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felisemercury%2FDuplicate-Image-Finder/lists"}