https://github.com/ybubnov/imagedup
A naive command-line tool to remove duplicated images using OpenCV
- Host: GitHub
- URL: https://github.com/ybubnov/imagedup
- Owner: ybubnov
- Created: 2023-06-24T18:46:29.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-26T08:25:18.000Z (almost 2 years ago)
- Last Synced: 2025-04-03T11:56:21.169Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 247 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Imagedup
A tool to manage image duplicates: the program finds duplicate images in a specified folder
and removes them if necessary.

## Installation
We recommend using `pyenv` to manage the required Python version and `poetry` to manage
dependency installation:
```sh
% brew install pyenv pyenv-virtualenv
```

Then configure the environment as follows:
```sh
% pyenv install 3.9.1
% pyenv virtualenv 3.9.1 imagedup
% pyenv activate imagedup
```

Then install the necessary dependencies:
```sh
% pip install poetry
% poetry config virtualenvs.create false
% poetry install --with-root
```

## Usage
You can run the `imagedup` command right from the repository root as follows:
```sh
% python -m imagedup ./dataset
```

By default the tool does not delete files; it only prints the files it would delete to
standard output. To actually delete the duplicates, call the tool as follows:
```sh
% python -m imagedup.shell ./dataset -q --rm
```

## Analysis
The following image outlines how the `--min-score` and `--min-area` parameters
relate to the number of images removed from the directory.

By default this tool guarantees removal of 50% of the images from a directory.
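
This README does not define how the two parameters are computed, so the sketch below is only a toy model of how such thresholds could interact, not the tool's actual implementation. It assumes that each compared image pair yields a list of changed-region areas, that regions smaller than `min_area` are ignored, and that a pair counts as a duplicate when the remaining total area is below `min_score`. The function names and the sample data are hypothetical.

```python
from typing import Sequence


def change_score(region_areas: Sequence[float], min_area: float) -> float:
    """Sum the areas of changed regions, skipping regions below min_area.

    Assumption: this mirrors one common way to score the difference between
    two images; the real tool may score pairs differently.
    """
    return sum(area for area in region_areas if area >= min_area)


def count_duplicates(
    pairs: Sequence[Sequence[float]], min_score: float, min_area: float
) -> int:
    """Count pairs whose change score falls below min_score."""
    return sum(1 for areas in pairs if change_score(areas, min_area) < min_score)


if __name__ == "__main__":
    # Hypothetical changed-region areas for four image pairs (not measured data).
    sample_pairs = [[], [40, 60], [500, 30], [4000, 2500]]

    # Sweep both thresholds and print how many pairs would be flagged.
    for min_area in (0, 50, 1000):
        for min_score in (100, 1000):
            flagged = count_duplicates(sample_pairs, min_score, min_area)
            print(f"min_area={min_area} min_score={min_score} flagged={flagged}")
```

Under these assumptions, raising either threshold can only keep or increase the number of images flagged for removal, which is the kind of relationship the analysis above describes.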
