https://github.com/jacobmarks/image-deduplication-plugin
Remove exact and approximate duplicates from your dataset in FiftyOne!
- Host: GitHub
- URL: https://github.com/jacobmarks/image-deduplication-plugin
- Owner: jacobmarks
- Created: 2023-09-12T01:05:52.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-04T23:54:55.000Z (almost 2 years ago)
- Last Synced: 2024-04-16T07:21:07.706Z (over 1 year ago)
- Topics: computer-vision, data-cleaning, deduplication, fiftyone, image-processing, plugin, python, similarity
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
## Image Deduplication Plugin
This Python plugin for FiftyOne streamlines image deduplication workflows!
With this plugin, you can:
- Find _exact_ duplicate images using a hash function
- Find _near_ duplicate images using an embedding model and similarity threshold
- View and interact with duplicate images in the App
- Remove all duplicates, or keep a representative image from each duplicate set
## Watch on YouTube
[Watch on YouTube](https://www.youtube.com/watch?v=aingeh0KdPw&list=PLuREAXoPgT0RZrUaT0UpX_HzwKkoB-S9j&index=5)
## Installation
```shell
fiftyone plugins download https://github.com/jacobmarks/image-deduplication-plugin
```
## Operators
### `find_approximate_duplicate_images`

This operator finds near-duplicate images in a dataset using a specified similarity index paired with either a distance threshold or a fraction of samples to mark as duplicates.
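
To illustrate the distance-threshold idea, here is a minimal sketch in plain Python. It is not the plugin's implementation: the function names, the use of cosine distance, the greedy visiting order, and the toy embeddings are all assumptions made for illustration. The "fraction of samples" variant is not shown.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def find_near_duplicates(embeddings, thresh):
    """Map each kept sample ID to the IDs flagged as its near duplicates.

    Hypothetical helper: samples are visited in insertion order, and a
    sample is flagged as a duplicate of the first kept sample whose
    embedding lies within `thresh` of it.
    """
    neighbors = {}
    for sample_id, vec in embeddings.items():
        for kept_id in neighbors:
            if cosine_distance(embeddings[kept_id], vec) <= thresh:
                neighbors[kept_id].append(sample_id)
                break
        else:
            # No kept sample is close enough, so keep this one
            neighbors[sample_id] = []
    # Report only the kept samples that actually have duplicates
    return {k: v for k, v in neighbors.items() if v}


# Toy 2-D embeddings (made up for this example)
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.05],
    "c": [0.0, 1.0],
    "d": [0.02, 1.0],
}
near_dups = find_near_duplicates(embeddings, thresh=0.01)
```

A smaller threshold flags fewer samples, so the right value depends on the embedding model and on how strict the deduplication should be.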
### `find_exact_duplicate_images`

This operator finds exact duplicate images in a dataset using a hash function.
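
As a rough sketch of hash-based detection, the snippet below groups files whose bytes are identical. The README does not say which hash function the plugin uses, so SHA-256 and the helper names `file_hash` and `group_exact_duplicates` are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(filepaths):
    """Return lists of paths whose file contents are byte-identical."""
    by_hash = defaultdict(list)
    for path in filepaths:
        by_hash[file_hash(path)].append(str(path))
    # Only hashes shared by two or more files indicate duplicates
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Hashing the raw bytes only catches files that are identical on disk. The same picture saved with a different encoding or resolution produces a different hash, which is the case near-duplicate detection is meant to cover.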
### `display_approximate_duplicate_groups`

This operator displays the images in a dataset that are near-duplicates of each other, grouped together.
### `display_exact_duplicate_groups`

This operator displays the images in a dataset that are exact duplicates of each other, grouped together.
### `remove_all_approximate_duplicates`

This operator removes all near-duplicate images from a dataset.
### `remove_all_exact_duplicates`

This operator removes all exact duplicate images from a dataset.
### `deduplicate_approximate_duplicates`

This operator removes near-duplicate images from a dataset, _keeping a representative image_ from each duplicate set.
### `deduplicate_exact_duplicates`

This operator removes exact duplicate images from a dataset, _keeping a representative image_ from each duplicate set.
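
The difference between the `remove_all_*` and `deduplicate_*` operators can be sketched as follows. This is an illustrative helper, not the plugin's code: `ids_to_remove` is a hypothetical name, and treating the first ID of each group as the representative is an assumption.

```python
def ids_to_remove(duplicate_groups, keep_representative):
    """Return the sample IDs to delete from a list of duplicate groups.

    With keep_representative=True, the first ID in each group is kept
    (an arbitrary choice for this sketch) and the rest are removed.
    With keep_representative=False, every ID in every group is removed.
    """
    start = 1 if keep_representative else 0
    to_remove = []
    for group in duplicate_groups:
        to_remove.extend(group[start:])
    return to_remove


# Made-up duplicate groups for this example
groups = [["img1", "img2", "img3"], ["img7", "img9"]]

removed_all = ids_to_remove(groups, keep_representative=False)
removed_dedup = ids_to_remove(groups, keep_representative=True)
```

In a FiftyOne workflow, a list of IDs like this could then be passed to `dataset.delete_samples(...)`.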