Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/onroerenderfgoed/janssens
Your friendly neighbourhood duplicate image finder
- Host: GitHub
- URL: https://github.com/onroerenderfgoed/janssens
- Owner: OnroerendErfgoed
- License: mit
- Created: 2024-02-05T17:54:36.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-02-05T17:59:59.000Z (10 months ago)
- Last Synced: 2024-04-16T01:56:05.559Z (7 months ago)
- Language: Python
- Size: 16 MB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Janssens: your friendly neighbourhood duplicate image finder
Janssens is a simple script that searches a file system folder for duplicate images. It does not compare filenames. Instead, it compares the images themselves by calculating the [dhash](https://github.com/benhoyt/dhash) of every image and comparing the resulting hashes.
## Installation
- Clone this repo
- Install the requirements: *pip install -r requirements.txt*
- Edit the *janssens.py* file and set the FOLDER variable to your image folder. A folder of testdata is included; it contains some examples of the kinds of duplication Janssens can detect.
- Janssens calculates the overlap between two dhashes. The higher the overlap, the more likely it is that two images are identical or similar. The THRESHOLD variable controls when a match is reported: any similarity below the threshold is not reported. Experience suggests that an overlap above 90% is a good match, an overlap between 85% and 90% is often a match but yields more false positives, and anything below 85% generally yields too many false positives.
- Run the script: *python janssens.py*
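
To make the overlap and threshold idea concrete, here is a minimal pure-Python sketch. It is not the code in *janssens.py*: the function names `row_dhash`, `overlap` and `find_matches` are hypothetical, the images are tiny hand-written grayscale grids, and overlap is assumed to mean the percentage of hash bits on which two images agree. It also skips image loading and resizing, which the dhash library handles for real images.

```python
from itertools import combinations


def row_dhash(pixels):
    """Return a row-wise difference hash as a list of bits.

    `pixels` is a grid of grayscale values (rows of equal length). Each bit
    records whether a pixel is darker than its right-hand neighbour.
    """
    return [
        1 if left < right else 0
        for row in pixels
        for left, right in zip(row, row[1:])
    ]


def overlap(hash_a, hash_b):
    """Percentage of bit positions on which two equal-length hashes agree."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    same = sum(a == b for a, b in zip(hash_a, hash_b))
    return 100.0 * same / len(hash_a)


def find_matches(hashes, threshold):
    """Return (name_a, name_b, overlap) for every pair at or above threshold.

    Assumption: a pair is reported when its overlap is >= threshold.
    """
    matches = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        score = overlap(hash_a, hash_b)
        if score >= threshold:
            matches.append((name_a, name_b, score))
    return matches


# Toy "images": a gradient, a brighter copy of it, and a mirrored version.
images = {
    "original": [[10, 20, 30, 40, 50], [50, 40, 30, 20, 10]],
    "brighter": [[15, 25, 35, 45, 55], [55, 45, 35, 25, 15]],
    "mirrored": [[50, 40, 30, 20, 10], [10, 20, 30, 40, 50]],
}
THRESHOLD = 90  # percent, in the spirit of the THRESHOLD setting above
hashes = {name: row_dhash(pixels) for name, pixels in images.items()}

for name_a, name_b, score in find_matches(hashes, THRESHOLD):
    print(f"{name_a} ~ {name_b}: {score:.0f}% overlap")
```

Because the hash only records whether brightness rises or falls between neighbouring pixels, the uniformly brighter copy produces the same bits as the original, while the mirrored grid flips every bit and falls well below the threshold.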