https://github.com/jina-ai/executor-image-hasher

An executor to encode images using comparable hashing techniques. Useful for duplicate detection
https://github.com/jina-ai/executor-image-hasher

Last synced: 3 months ago
JSON representation

An executor to encode images using comparable hashing techniques. Useful for duplicate detection

Host: GitHub
URL: https://github.com/jina-ai/executor-image-hasher
Owner: jina-ai
Created: 2021-11-08T09:31:24.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2022-09-12T02:38:35.000Z (about 3 years ago)
Last Synced: 2025-03-07T03:46:28.966Z (7 months ago)
Language: Python
Homepage: https://github.com/jina-ai/executor-image-hasher
Size: 925 KB
Stars: 2
Watchers: 25
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# ImageHasher

An executor to encode the images into embeddings using different comparable hashing technique, such as `perceptual`,
`average`, `wavelet`, and `difference` techniques. Features in the image are used to generate a distinct
(but not unique) fingerprint, and these fingerprints are comparable.

Comparable hashes are a different concept compared to cryptographic hash functions like `MD5` and `SHA1`. With
cryptographic hashes, the hash values are random. The data used to generate the hash acts like a random seed, so the
same data will generate the same result, but different data will create different results. Comparing two SHA1 hash
values really only tells you two things. If the hashes are different, then the data is different. And if the hashes are
the same, then the data is likely the same. In contrast, comparable hashes such as `perceptual` can be compared --
giving you a sense of similarity between the two data sets. All comparable hashes have the same basic properties:
images can be scaled larger or smaller, have different aspect ratios, and even minor coloring differences (contrast,
brightness, etc.) and they will still match similar images. A good usage of this is the detection of duplicate images.

ImageHasher receives the `Documents` with `tensor` attributes. The executor will encode the images into a vector of either
`dtype=np.uint8` or `dtype=np.bool` and store them in the `embedding` attribute of the `Document`.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jina-ai/executor-image-hasher

Awesome Lists containing this project

README