{"id":13394775,"url":"https://github.com/idealo/imagededup","last_synced_at":"2025-05-14T22:02:30.054Z","repository":{"id":39521357,"uuid":"179674896","full_name":"idealo/imagededup","owner":"idealo","description":"😎 Finding duplicate images made easy!","archived":false,"fork":false,"pushed_at":"2025-05-07T19:37:07.000Z","size":22243,"stargazers_count":5349,"open_issues_count":51,"forks_count":465,"subscribers_count":63,"default_branch":"master","last_synced_at":"2025-05-07T21:15:59.285Z","etag":null,"topics":["computer-vision","e-commerce","hashing","idealo","image-deduplication","image-preprocessing","neural-network","python","pytorch"],"latest_commit_sha":null,"homepage":"https://idealo.github.io/imagededup/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idealo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-04-05T12:10:54.000Z","updated_at":"2025-05-07T20:56:13.000Z","dependencies_parsed_at":"2023-02-10T05:15:55.465Z","dependency_job_id":"801b82fe-735d-40f5-8869-19f657d055e1","html_url":"https://github.com/idealo/imagededup","commit_stats":{"total_commits":444,"total_committers":15,"mean_commits":29.6,"dds":0.3288288288288288,"last_synced_commit":"c4e0c4e8d174c6afc99fe1d1cf1859c7c5f49fdf"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idealo%2Fimagededup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idealo%2Fimagededup/tags","r
eleases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idealo%2Fimagededup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idealo%2Fimagededup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idealo","download_url":"https://codeload.github.com/idealo/imagededup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254010782,"owners_count":21998993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","e-commerce","hashing","idealo","image-deduplication","image-preprocessing","neural-network","python","pytorch"],"created_at":"2024-07-30T17:01:31.138Z","updated_at":"2025-05-14T22:02:25.037Z","avatar_url":"https://github.com/idealo.png","language":"Python","readme":"# Image Deduplicator (imagededup)\n\n[![Build Status](https://github.com/idealo/imagededup/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/idealo/imagededup/actions/workflows/test.yml)\n[![Docs](https://img.shields.io/badge/docs-online-brightgreen)](https://idealo.github.io/imagededup/)\n[![codecov](https://codecov.io/gh/idealo/imagededup/branch/master/graph/badge.svg)](https://codecov.io/gh/idealo/imagededup)\n[![PyPI Version](https://img.shields.io/pypi/v/imagededup)](https://pypi.org/project/imagededup/)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/idealo/imagededup/blob/master/LICENSE)\n\nimagededup is a python package that simplifies the task of finding **exact** and **near duplicates** 
in an image collection.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"readme_figures/mona_lisa.png\" width=\"600\" /\u003e\n\u003c/p\u003e\n\nThis package provides functionality to make use of hashing algorithms that are particularly good at finding exact\nduplicates as well as convolutional neural networks which are also adept at finding near duplicates. An evaluation\nframework is also provided to judge the quality of deduplication for a given dataset.\n\nThe package provides the following functionality:\n\n- Finding duplicates in a directory using one of the following algorithms:\n  - [Convolutional Neural Network](https://arxiv.org/abs/1905.02244#:~:text=MobileNetV3%20is%20tuned%20to%20mobile,improved%20through%20novel%20architecture%20advances.) (CNN) - Select from several prepackaged models or provide your own custom model.\n  - [Perceptual hashing](http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html) (PHash)\n  - [Difference hashing](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html) (DHash)\n  - [Wavelet hashing](https://fullstackml.com/wavelet-image-hash-in-python-3504fdd282b5) (WHash)\n  - [Average hashing](http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html) (AHash)\n- Generation of encodings for images using one of the algorithms listed above.\n- Framework to evaluate the effectiveness of deduplication given a ground truth mapping.\n- Plotting duplicates found for a given image file.\n\nDetailed documentation for the package can be found at: [https://idealo.github.io/imagededup/](https://idealo.github.io/imagededup/)\n\nimagededup is compatible with Python 3.8+ and runs on Linux, macOS and Windows.\nIt is distributed under the Apache 2.0 license.\n\n## 📖 Contents\n\n- [Installation](#%EF%B8%8F-installation)\n- [Quick Start](#-quick-start)\n- [Benchmarks](#-benchmarks)\n- [Contribute](#-contribute)\n- [Citation](#-citation)\n- [Maintainers](#-maintainers)\n- 
[License](#-copyright)\n\n## ⚙️ Installation\n\nThere are two ways to install imagededup:\n\n- Install imagededup from PyPI (recommended):\n\n```\npip install imagededup\n```\n\n- Install imagededup from the GitHub source:\n\n```bash\ngit clone https://github.com/idealo/imagededup.git\ncd imagededup\npip install \"cython\u003e=0.29\"\npython setup.py install\n```  \n\n## 🚀 Quick Start\n\nTo find duplicates in an image directory using perceptual hashing, the following workflow can be used:\n\n- Import perceptual hashing method\n\n```python\nfrom imagededup.methods import PHash\nphasher = PHash()\n```\n\n- Generate encodings for all images in an image directory\n\n```python\nencodings = phasher.encode_images(image_dir='path/to/image/directory')\n```\n\n- Find duplicates using the generated encodings\n\n```python\nduplicates = phasher.find_duplicates(encoding_map=encodings)\n```\n\n- Plot duplicates obtained for a given file (e.g. 'ukbench00120.jpg') using the duplicates dictionary\n\n```python\nfrom imagededup.utils import plot_duplicates\nplot_duplicates(image_dir='path/to/image/directory',\n                duplicate_map=duplicates,\n                filename='ukbench00120.jpg')\n```\n\nThe output looks like this:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"readme_figures/plot_dups.png\" width=\"600\" /\u003e\n\u003c/p\u003e\n\nThe complete code for the workflow is:\n\n```python\nfrom imagededup.methods import PHash\nphasher = PHash()\n\n# Generate encodings for all images in an image directory\nencodings = phasher.encode_images(image_dir='path/to/image/directory')\n\n# Find duplicates using the generated encodings\nduplicates = phasher.find_duplicates(encoding_map=encodings)\n\n# Plot duplicates obtained for a given file using the duplicates dictionary\nfrom imagededup.utils import plot_duplicates\nplot_duplicates(image_dir='path/to/image/directory',\n                duplicate_map=duplicates,\n                filename='ukbench00120.jpg')\n```\nIt is also 
possible to use your own custom models for finding duplicates using the CNN method.\n\nFor examples, refer to [this](https://github.com/idealo/imagededup/tree/master/examples) part of the\nrepository.\n\nFor more detailed usage of the package functionality, refer to: [https://idealo.github.io/imagededup/](https://idealo.github.io/imagededup/)\n\n## ⏳ Benchmarks\n\n**Update**: The provided benchmarks are only valid up to `imagededup v0.2.2`. Later releases have significant changes to all methods, so the current benchmarks may not hold.\n\nDetailed benchmarks on speed and classification metrics for different methods are provided in the [documentation](https://idealo.github.io/imagededup/user_guide/benchmarks/).\nGenerally speaking, the following conclusions can be made:\n\n- CNN works best for near duplicates and datasets containing transformations.\n- All deduplication methods fare well on datasets containing exact duplicates, but Difference hashing is the fastest.\n\n## 🤝 Contribute\n\nWe welcome all kinds of contributions.\nSee the [Contribution](CONTRIBUTING.md) guide for more details.\n\n## 📝 Citation\n\nPlease cite Imagededup in your publications if it is useful for your research. 
Here is an example BibTeX entry:\n\n```BibTeX\n@misc{idealods2019imagededup,\n  title={Imagededup},\n  author={Tanuj Jain and Christopher Lennan and Zubin John and Dat Tran},\n  year={2019},\n  howpublished={\\url{https://github.com/idealo/imagededup}},\n}\n```\n\n## 🏗 Maintainers\n\n- Tanuj Jain, github: [tanujjain](https://github.com/tanujjain)\n- Christopher Lennan, github: [clennan](https://github.com/clennan)\n- Dat Tran, github: [datitran](https://github.com/datitran)\n\n## © Copyright\n\nSee [LICENSE](LICENSE) for details.\n","funding_links":[],"categories":["Python","图像数据与CV","Repos","HarmonyOS"],"sub_categories":["Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidealo%2Fimagededup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidealo%2Fimagededup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidealo%2Fimagededup/lists"}