https://github.com/infinisil/soph

Efficiently import pictures while handling duplicates gracefully

blockhash deduplication haskell perceptual-hashing pictures-organizer similarity-search


* Soph

This is a simple utility to import pictures while handling duplicates gracefully.

Note that this is not yet stable. Collections created by current versions will almost certainly not work with future versions, due to differences in hash formatting (until I've added version migration support). This isn't a big problem, however, because it's possible to just reimport all pictures.

** Usage

To import all pictures from directory ~imports~ to ~collection~:
#+BEGIN_SRC bash
$ soph imports collection
#+END_SRC

If similar images are found during processing, a ~feh~ window opens with all of them (the new picture and the similar ones in the collection). Use the arrow keys to scroll through them, delete the one(s) you don't want from within ~feh~, then quit with ~q~.

Importing means copying the file into ~collection~ and giving it a hash-based filename. A simple directory listing of ~collection~ is then enough to recover the hashes of every imported picture, so in a way the filenames act as a database.
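
As a rough illustration of the "filenames as a database" idea, the following Python sketch rebuilds an index from a directory listing alone. It is not soph's actual (Haskell) code. The filename layout is inferred from the example run in this README: a 16-digit hex field, a dash, a 36-digit hex field, and the original extension. Treating the first field as the content hash and the second as the perceptual hash is an assumption, and ~HashedName~, ~parse_hashed_name~ and ~load_collection~ are hypothetical names.

#+BEGIN_SRC python
from pathlib import Path
from typing import Dict, NamedTuple


class HashedName(NamedTuple):
    content_hash: int      # assumed: first hex field of the filename
    perceptual_hash: int   # assumed: second hex field of the filename
    extension: str         # original extension, e.g. ".png"


def parse_hashed_name(path: str) -> HashedName:
    """Decode '<hex>-<hex>.<ext>' into its two hashes and its extension."""
    p = Path(path)
    first, dash, second = p.stem.partition("-")
    if not dash:
        raise ValueError(f"not a hash-based filename: {path}")
    # int(..., 16) raises ValueError if a field is not valid hex.
    return HashedName(int(first, 16), int(second, 16), p.suffix)


def load_collection(directory: str) -> Dict[int, Path]:
    """Map each content hash to its file, using only the directory listing."""
    index: Dict[int, Path] = {}
    for entry in Path(directory).iterdir():
        if entry.is_file():
            index[parse_hashed_name(entry.name).content_hash] = entry
    return index
#+END_SRC

With such an index, checking whether a picture is already in the collection is a dictionary lookup on its content hash, and no image file has to be opened.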

Here is an example run with one new image, one image similar to an existing one, and one exact duplicate:
#+BEGIN_SRC
$ soph imports collection
[Info#init] Reading hashdir, decoding filenames and initializing database
[Info#process] Starting image processing of 3 files in import directory
[Info#process] New image at imports/002.jpg: 1 similar image(s) found
[Info#process] New image at imports/001.png: Already present as collection/b0ec08147fc1b495-0e02fe1b61760fa06703f87e8388780b01ff.png, removing the import file
[Info#process] New image at imports/003.png: New image, importing it
[Info#process] Importing new file imports/003.png into library to path collection/dea63899c760cc0b-f9f81f001c3980f81fc1fc07e0ce0ce0cecc.png
[Info#similars] Processing 1 similar images
[Info#process] New image at imports/002.jpg: 1 similar image(s) found
[Info#similar] File imports/002.jpg to import has 1 similar images:
[Info#similar] collection/3cc098ca1efed3c5-18f307b231870071ff0f20f17f01fb01f00f.png
[Info#similar] Opening them and the one to import in feh, delete the ones you don't want with , then quit feh with
[Info#process] Importing new file imports/002.jpg into library to path collection/6d384ae4863fc970-18f307b233870071df0f20f17f01fb01f00f.jpg
[Info] Finished import of 2 images
#+END_SRC

Note: The output is a bit misleading, since it reports the same picture twice. This is because soph makes two passes. The first pass goes through all images without prompting and simply skips those that have similar pictures in the collection. Once that is done, the second pass goes through the skipped pictures, this time asking the user and performing the appropriate action. This is really desirable on large imports. Output will be made cleaner in future versions.
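
The two-pass behaviour described in the note above can be sketched as follows. This is a minimal Python illustration, not soph's actual code: the function name and the three callbacks (~find_similar~, ~import_file~, ~keep_after_review~) are hypothetical stand-ins, and the handling of exact duplicates is left out for brevity.

#+BEGIN_SRC python
from typing import Callable, Iterable, List


def two_pass_import(
    files: Iterable[str],
    find_similar: Callable[[str], List[str]],
    import_file: Callable[[str], None],
    keep_after_review: Callable[[str, List[str]], bool],
) -> int:
    """Import files in two passes and return how many were imported."""
    imported = 0
    deferred: List[str] = []

    # Pass 1: no user interaction. Files with similar pictures are skipped.
    for path in files:
        if find_similar(path):
            deferred.append(path)
        else:
            import_file(path)
            imported += 1

    # Pass 2: interactive. Similar pictures are looked up again, which is
    # why such a file is reported twice in the log.
    for path in deferred:
        similar = find_similar(path)
        if keep_after_review(path, similar):
            import_file(path)
            imported += 1

    return imported
#+END_SRC

In this sketch, all non-interactive work finishes before the first prompt appears, so a large import can run unattended until only the ambiguous pictures remain.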

** How it works

The above command does the following:
1. List the files in ~collection~ to learn which images were imported previously
2. Process every picture in ~imports~ (recursively)
   - Calculate a content hash over it to detect exact duplicates ([[https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function#FNV-1a_hash][FNV-1a hash]])
   - Calculate a [[https://en.wikipedia.org/wiki/Perceptual_hashing][perceptual hash]] over it to detect similar images ([[http://blockhash.io/][Blockhash]])
3. Import every picture in ~imports~ into ~collection~
   - If the new picture is already present (same content hash), delete the import file
   - If the new picture is not yet present, import it
   - If the new picture has similar images present, open all of them in ~feh~, allowing you to delete the ones you don't want. If the new one wasn't deleted after that, import it.
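
The decision in step 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than soph's implementation: the 64-bit FNV-1a variant is assumed because the first filename field in the example run has 16 hex digits, similarity is assumed to be a Hamming distance between perceptual hashes, and the ~max_distance~ threshold of 10 bits is a made-up value. Computing the Blockhash itself is not shown; perceptual hashes are taken as ready-made integers.

#+BEGIN_SRC python
from typing import Dict, List, Tuple

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep the low 64 bits
    return h


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def classify(
    content_hash: int,
    perceptual_hash: int,
    collection: Dict[int, int],  # content hash -> perceptual hash
    max_distance: int = 10,      # hypothetical similarity threshold
) -> Tuple[str, List[int]]:
    """Return ("duplicate" | "similar" | "new", content hashes of similar images)."""
    if content_hash in collection:
        return ("duplicate", [])
    similar = [
        c for c, p in collection.items()
        if hamming_distance(perceptual_hash, p) <= max_distance
    ]
    if similar:
        return ("similar", similar)
    return ("new", [])
#+END_SRC

For instance, the second filename fields of the two similar pictures in the example run (~18f307b233870071df0f20f17f01fb01f00f~ and ~18f307b231870071ff0f20f17f01fb01f00f~) differ in only 2 bits, so under this sketch they would be classified as similar.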

** Installing

*** Nix (recommended)

The preferred way to install is with [[https://nixos.org/nix/][Nix]]:

#+BEGIN_SRC bash
$ nix-env -if .
#+END_SRC

Most derivations are cached by ~cache.nixos.org~ so it won't take too long to build.

This also automatically makes sure ~feh~ is available.

*** Stack

If you don't have Nix but do have [[https://haskell-lang.org/get-started][Stack]], you can (probably) install soph with:

#+BEGIN_SRC bash
$ stack install
#+END_SRC

With this method you also need to install ~feh~ yourself.