https://github.com/adriacabeza/go-imagecleaner
📸 Clean your image folder using perceptual hashing and BK-trees using Go!
bk-tree clustering golang image-processing perceptual-hashing
- Host: GitHub
- URL: https://github.com/adriacabeza/go-imagecleaner
- Owner: adriacabeza
- Created: 2021-08-17T18:51:24.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-08-18T08:40:11.000Z (over 3 years ago)
- Last Synced: 2024-07-31T20:42:35.155Z (6 months ago)
- Topics: bk-tree, clustering, golang, image-processing, perceptual-hashing
- Language: Go
- Homepage:
- Size: 973 KB
- Stars: 11
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Image Cleaner
🏞🏞 ➡ 🏞[![made-with-Go](https://img.shields.io/badge/Made%20with-Go-1f425f.svg)](http://golang.org) [![Go](https://github.com/adriacabeza/go-imagecleaner/actions/workflows/go.yml/badge.svg)](https://github.com/adriacabeza/go-imagecleaner/actions/workflows/go.yml)
This tool takes your image gallery and creates a new folder containing one subfolder per cluster of similar-looking images. It uses a perceptual image hashing algorithm and a configurable threshold to cluster them. A possible improvement would be to use deep learning and cluster the images based on learned features. To cluster the images efficiently, it uses a BK-tree, since checking for duplicates by comparing every pair of images is an O(N^2) problem.
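To illustrate why a BK-tree helps here, the following is a minimal sketch of one built over 64-bit hashes with Hamming distance as the metric. It is not the code used by this tool (per the credits, the tool relies on an existing BK-tree library); the names `BKTree`, `Add`, `Search` and `hamming` are chosen for this example only, and the sketch stores each distinct hash once rather than tracking which files share it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming counts the bits that differ between two 64-bit hashes.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// node is one BK-tree entry; children are keyed by their distance to hash.
type node struct {
	hash     uint64
	children map[int]*node
}

// BKTree indexes hashes so that range queries can skip whole subtrees.
type BKTree struct {
	root *node
}

// Add inserts h, following at each node the edge labelled with the
// distance between h and that node's hash.
func (t *BKTree) Add(h uint64) {
	if t.root == nil {
		t.root = &node{hash: h, children: map[int]*node{}}
		return
	}
	cur := t.root
	for {
		d := hamming(h, cur.hash)
		if d == 0 {
			return // identical hash already stored (simplification of this sketch)
		}
		next, ok := cur.children[d]
		if !ok {
			cur.children[d] = &node{hash: h, children: map[int]*node{}}
			return
		}
		cur = next
	}
}

// Search returns every stored hash within threshold bits of query.
func (t *BKTree) Search(query uint64, threshold int) []uint64 {
	var out []uint64
	if t.root == nil {
		return out
	}
	stack := []*node{t.root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		d := hamming(query, n.hash)
		if d <= threshold {
			out = append(out, n.hash)
		}
		// Triangle inequality: matches can only sit behind edges whose
		// label lies in [d-threshold, d+threshold].
		for edge, child := range n.children {
			if edge >= d-threshold && edge <= d+threshold {
				stack = append(stack, child)
			}
		}
	}
	return out
}

func main() {
	tree := &BKTree{}
	for _, h := range []uint64{0x0F0F, 0x0F0E, 0xF0F0, 0x0F1F} {
		tree.Add(h)
	}
	// Two of the other hashes differ from 0x0F0F by a single bit.
	fmt.Println(len(tree.Search(0x0F0F, 1)))
}
```

Because each query only descends into edges that can still contain a match, most of the tree is skipped when the threshold is small, which is what avoids comparing every image against every other one.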
*Before / After (example images)*
Image Cleaner was created at a friend's request. After a trip with friends, he had several pictures that looked alike (taken with different smartphones) and he wanted to select the best ones. He tried to use [fdupes](https://github.com/adrianlopezroche/fdupes) to remove the exact duplicates, but it did not work because some of the images had been sent through Google Photos, WhatsApp, etc. (different compression algorithms and sizes). After a quick search I found some Python examples such as [duplicate images](https://github.com/philipbl/duplicate-images) and [Fast Near Duplicate image search](https://github.com/umbertogriffo/fast-near-duplicate-image-search), but I did not find anything similar written in Go, so here it is.
> As a note, this is my first piece of code written in Go, so it probably isn't as good as I'd like it to be. Any comments or improvements will be gladly received :D
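The difference between exact and perceptual comparison can be sketched with a hand-rolled average hash. This is only an illustration and not the hash this tool uses (the credits point to an existing image hash library); `averageHash`, `gradient` and `distance` are names made up for this example, and the synthetic gradients merely stand in for a photo and a slightly altered copy of it.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math/bits"
)

// averageHash reduces img to an 8x8 grid of mean brightness values and sets
// one bit for every cell brighter than the overall mean.
// Assumes the image is at least 8x8 pixels.
func averageHash(img image.Image) uint64 {
	b := img.Bounds()
	var cells [64]float64
	var total float64
	for cy := 0; cy < 8; cy++ {
		for cx := 0; cx < 8; cx++ {
			x0, x1 := b.Min.X+cx*b.Dx()/8, b.Min.X+(cx+1)*b.Dx()/8
			y0, y1 := b.Min.Y+cy*b.Dy()/8, b.Min.Y+(cy+1)*b.Dy()/8
			var sum float64
			for y := y0; y < y1; y++ {
				for x := x0; x < x1; x++ {
					g := color.GrayModel.Convert(img.At(x, y)).(color.Gray)
					sum += float64(g.Y)
				}
			}
			cells[cy*8+cx] = sum / float64((x1-x0)*(y1-y0))
			total += cells[cy*8+cx]
		}
	}
	mean := total / 64
	var h uint64
	for i, c := range cells {
		if c > mean {
			h |= uint64(1) << uint(i)
		}
	}
	return h
}

// distance is the number of differing bits between two hashes.
func distance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// gradient builds a size x size grayscale ramp; horizontal selects the ramp
// direction and offset brightens every pixel (clamped at 255).
func gradient(size int, horizontal bool, offset int) *image.Gray {
	img := image.NewGray(image.Rect(0, 0, size, size))
	for y := 0; y < size; y++ {
		for x := 0; x < size; x++ {
			p := y
			if horizontal {
				p = x
			}
			v := p*255/(size-1) + offset
			if v > 255 {
				v = 255
			}
			img.SetGray(x, y, color.Gray{Y: uint8(v)})
		}
	}
	return img
}

func main() {
	original := gradient(64, true, 0)
	brighter := gradient(64, true, 3) // stands in for an altered copy
	other := gradient(64, false, 0)   // a genuinely different picture

	fmt.Println("first pixel identical:", original.Pix[0] == brighter.Pix[0])
	h := averageHash(original)
	fmt.Println("distance to altered copy:", distance(h, averageHash(brighter)))
	fmt.Println("distance to other image:", distance(h, averageHash(other)))
}
```

A byte-level comparison treats the altered copy as a different file, while the hash distance stays small for the copy and large for the unrelated image.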
## Installation
To start using Image Cleaner, install Go and run ``go get``:
```shell
go get -u github.com/adriacabeza/go-imagecleaner
```
This will retrieve the library.
Alternatively, you can use the [binary release](https://github.com/adriacabeza/go-imagecleaner/releases/tag/v0.1-alpha).
## Usage
```shell
imagecleaner -imagesPath=IMAGE_PATH -threshold=THRESHOLD
```
If you do not specify a value for *threshold*, the default value of 10 is used. This will create a folder called **clusters** with the images organized into one folder per cluster. **Note that the code only copies images; it does not remove them.**
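As a rough sketch of how the two flags and the default could be handled with Go's standard `flag` package: the flag names and the default of 10 come from the usage line above, while the `options` type, the `parseArgs` helper, the help strings and the check that `-imagesPath` is present are assumptions made for this example and may not match the actual source.

```go
package main

import (
	"flag"
	"fmt"
)

// options mirrors the two flags shown in the usage line.
type options struct {
	imagesPath string
	threshold  int
}

// parseArgs is a hypothetical helper; the real tool may parse its flags differently.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("imagecleaner", flag.ContinueOnError)
	fs.StringVar(&o.imagesPath, "imagesPath", "", "folder containing the images to cluster")
	fs.IntVar(&o.threshold, "threshold", 10, "clustering threshold")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumption of this sketch: a missing path is treated as an error.
	if o.imagesPath == "" {
		return o, fmt.Errorf("-imagesPath is required")
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"-imagesPath=./photos"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("path=%s threshold=%d\n", o.imagesPath, o.threshold)
}
```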
**Example**:
```shell
$ go run main.go ./cluster_utils.go ./image_utils.go -imagesPath=/Users/adria/Downloads/Photos
Starting to cluster your images from /Users/adria/Downloads/Photos
Selected 6 images
6 / 6 [=================================================================================] 100.00% 2s
Images hashed and BK-tree created
Creating clusters
6 / 6 [=================================================================================] 100.00% 1s
Found 3 clusters in 6 images
Clusters created
3 / 3 [=================================================================================] 100.00% 0s
Done
```
> Note that all clusters of size 1 (just one image) are merged into one big folder of unique images.
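
The merging of single-image clusters described in the note can be sketched as follows. This is an assumption about the logic rather than the tool's actual code; `splitClusters` and the file names are hypothetical.

```go
package main

import "fmt"

// splitClusters separates clusters holding several images from single-image
// clusters, whose members are pooled into one list of unique images.
func splitClusters(clusters [][]string) (groups [][]string, unique []string) {
	for _, c := range clusters {
		switch {
		case len(c) == 1:
			unique = append(unique, c[0])
		case len(c) > 1:
			groups = append(groups, c)
		}
	}
	return groups, unique
}

func main() {
	clusters := [][]string{
		{"beach_1.jpg", "beach_2.jpg"},
		{"dinner.jpg"},
		{"sunset_1.jpg", "sunset_2.jpg", "sunset_3.jpg"},
		{"selfie.jpg"},
	}
	groups, unique := splitClusters(clusters)
	fmt.Println("cluster folders:", len(groups))
	fmt.Println("unique images:", unique)
}
```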
## TODO
- [ ] Try other hash functions
- [ ] Add some testing
- [x] Create binary

### Credits
- [Duplicate image detection](https://benhoyt.com/writings/duplicate-image-detection/): the idea of this module is mainly based on this very cool blogpost.
- [Go image Hash Library](https://github.com/corona10/goimagehash): the image hash algorithm was taken from this module.
- [Go BK-trees Library](https://github.com/agatan/bktree): the BK-tree structure implementation was taken from this library.