https://github.com/aastopher/imgdd
Performance-first perceptual hashing library; perfect for handling large datasets. Designed to quickly process nested folder structures, commonly found in image datasets.
- Host: GitHub
- URL: https://github.com/aastopher/imgdd
- Owner: aastopher
- License: gpl-3.0
- Created: 2024-12-06T03:12:58.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-02-23T20:22:47.000Z (3 months ago)
- Last Synced: 2025-02-23T21:25:24.756Z (3 months ago)
- Topics: discrete-cosine-transform, haar-wavelet-tranforms, hashing, image-hashing, image-hashing-algorithms, mathematics, perceptual-hashing
- Language: Rust
- Homepage:
- Size: 6.57 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[Documentation](https://aastopher.github.io/imgdd/) | [PyPI](https://pypi.org/project/imgdd) | [crates.io: imgdd](https://crates.io/crates/imgdd) | [crates.io: imgddcore](https://crates.io/crates/imgddcore) | [Codecov](https://codecov.io/gh/aastopher/imgdd) | [DeepSource](https://app.deepsource.com/gh/aastopher/imgdd/)

# imgdd: Image DeDuplication
`imgdd` is a performance-first perceptual hashing library that combines Rust's speed with Python's accessibility, making it well suited to large datasets. It is designed to quickly process the nested folder structures commonly found in image datasets.
## Features
- **Multiple Hashing Algorithms**: Supports `aHash`, `dHash`, `mHash`, `pHash`, `wHash`.
- **Multiple Filter Types**: Supports `Nearest`, `Triangle`, `CatmullRom`, `Gaussian`, `Lanczos3`.
- **Identify Duplicates**: Quickly identify duplicate hash pairs.
- **Simplicity**: Simple interface, robust performance.
## Why imgdd?
`imgdd` is inspired by [imagehash](https://github.com/JohannesBuchner/imagehash) and aims to be a lightning-fast replacement with additional features. To verify the performance gain, `imgdd` is benchmarked against `imagehash`. In Python, [**imgdd consistently outperforms imagehash by ~60%–95%**](https://aastopher.github.io/imgdd/latest/benches), measured as a reduction in hashing time per image.
---
# Quick Start
## Installation
```bash
pip install imgdd
```
## Usage Examples
### Hash Images
```python
import imgdd as dd

results = dd.hash(
    path="path/to/images",
    algo="dhash",  # Optional: default = dhash
    filter="triangle",  # Optional: default = triangle
    sort=False  # Optional: default = False
)
print(results)
```
### Find Duplicates
```python
import imgdd as dd

duplicates = dd.dupes(
    path="path/to/images",
    algo="dhash",  # Optional: default = dhash
    filter="triangle",  # Optional: default = triangle
    remove=False  # Optional: default = False
)
print(duplicates)
```
## Supported Algorithms
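Each algorithm in this section reduces an image to a short bit string, so that visually similar images produce similar hashes. As a rough illustration of the idea, the sketch below computes a difference hash over tiny grayscale grids in pure Python and compares the results by Hamming distance. It is not `imgdd`'s Rust implementation: the names `dhash_bits` and `hamming`, the 2x3 pixel grids, and the left-brighter-than-right comparison are all assumptions made for the example (the comparison direction is a convention that varies between libraries).

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values (0-255)


def dhash_bits(pixels: Gray) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair.

    Expects an already-resized grid; each row of n pixels yields n - 1 bits.
    A bit is 1 when the left pixel is brighter than its right neighbor.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 2x3 "images"; a 64-bit dHash commonly uses a 9x8 grid instead.
original = [[10, 20, 15],
            [200, 100, 150]]
brighter = [[value + 5 for value in row] for row in original]  # uniform brightness shift
different = [[30, 20, 25],
             [100, 200, 150]]

h_original = dhash_bits(original)
h_brighter = dhash_bits(brighter)
h_different = dhash_bits(different)

print(f"{h_original:04b}", hamming(h_original, h_brighter), hamming(h_original, h_different))
# prints: 0110 0 4
```

Because only the relative brightness of neighboring pixels is recorded, the uniformly brightened grid hashes identically to the original (distance 0), while the unrelated grid differs in all four bits.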
- **aHash**: Average Hash
- **mHash**: Median Hash
- **dHash**: Difference Hash
- **pHash**: Perceptual Hash
- **wHash**: Wavelet Hash
## Supported Filters
- `Nearest`, `Triangle`, `CatmullRom`, `Gaussian`, `Lanczos3`
## Contributing
Contributions are always welcome! 🚀

Found a bug or have a question? Open a GitHub issue. Pull requests for new features or fixes are encouraged!
## Similar projects
- https://github.com/JohannesBuchner/imagehash
- https://github.com/commonsmachinery/blockhash-python
- https://github.com/acoomans/instagram-filters
- https://pippy360.github.io/transformationInvariantImageSearch/
- https://www.phash.org/
- https://pypi.org/project/dhash/
- https://github.com/thorn-oss/perception (based on imagehash code, depends on opencv)
- https://docs.opencv.org/3.4/d4/d93/group__img__hash.html