https://github.com/peter-sk/photosdup
Mac Photos Duplicate Finder
- Host: GitHub
- URL: https://github.com/peter-sk/photosdup
- Owner: peter-sk
- License: mit
- Created: 2022-07-02T13:52:10.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-12T04:14:42.000Z (almost 2 years ago)
- Last Synced: 2024-10-03T06:47:41.772Z (about 1 month ago)
- Language: Python
- Size: 27.3 KB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Photos Duplicates (photosdup)
**Tool to scan a (Mac) Photos library for duplicates, thumbnails etc.**
The photosdup tool is available from [PyPI](https://pypi.org/project/photosdup/1.0/):
```
pip install photosdup
```
## NEW in 3.1
This version can also scan arbitrary directories of images that are not in a Mac Photos library. In this case, the --no-thumbs option is assumed and the --tag option is ignored.

The newest version of Photos Duplicates supports all the image formats supported by OpenCV plus the HEIC format produced by iPhones.
## Command line interface
After installation, photosdup can be run by providing it with the directory of the Photos library you would like to scan:
```
python -m photosdup Pictures/Photos.photoslibrary
```
By default, the command line version prints a list of lists of images; each inner list holds an original as its first element and its likely duplicates as the remaining elements.

By default, photosdup uses the thumbnails pre-computed by Photos. If this causes any issues, or you want to compare at larger scales, you can scale the originals instead:
```
python -m photosdup Pictures/Photos.photoslibrary --no-thumbs
```
To tag originals and duplicates with keywords and create albums for them instead, you can add the following parameter:
```
python -m photosdup Pictures/Photos.photoslibrary --tag
```
## Use as a library
You can instantiate a DuplicateFinder object by providing the path of the library to scan and using the scan convenience function:
```
from photosdup import DuplicateFinder
df = DuplicateFinder("Pictures/Photos Library.photoslibrary")
print(df.scan())
```
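The nested result can be post-processed with ordinary Python. The following is a small sketch that assumes the result has the list-of-lists shape described for the command line output (original first, likely duplicates after it). The file names are made up for illustration, and `summarize` is a hypothetical helper, not part of photosdup; the actual entries may be paths or other identifiers.

```python
# Hypothetical result in the documented shape: each inner list starts with
# the original, followed by its likely duplicates. The names are invented.
groups = [
    ["IMG_0001.jpeg", "IMG_0001 copy.jpeg"],
    ["IMG_0042.heic", "IMG_0042 (1).heic", "IMG_0042 (2).heic"],
]


def summarize(groups):
    """Map each original (first element) to its likely duplicates (the rest)."""
    return {group[0]: group[1:] for group in groups if group}


summary = summarize(groups)
for original, duplicates in summary.items():
    print(f"{original}: {len(duplicates)} likely duplicate(s)")
```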
Finer control is available through the functions load, represent, find, and tag. See the implementation of scan for the typical usage.
## Graphical user interface (experimental)
There is an experimental graphical user interface, which attempts to aid in locating the Photos library and setting the parameters (see next section).
```
python -m photosdup --gui
```
The graphical user interface always tags originals and duplicates with keywords and creates albums for them.
## Parameters
Several parameters such as the dimensions of the scaled down image used for comparison can be provided in both the command line and the graphical user interface. For an explanation and overview, just use the help function.
```
python -m photosdup --help
```
If the graphical user interface has stability problems, force single-core execution by using 0 for the cores parameter.
## Result of scan
The result of the scan is stored by photosdup in two ways:
1. Each time a duplicate is found, the higher-quality image (as judged by total file size) is tagged with the keyword photosdup-original, while the lower-quality duplicates are tagged with the keyword photosdup-duplicate.
2. Each set of original and duplicates is tagged with the UUID of the original and put in an album called photosdup-UUID.
## Related work and approach
The approach used for scaling images is inspired by the approach taken in difPy. Unfortunately, difPy could not be used as it does not integrate with the Photos database (minor nuisance regarding updating the database) and uses a quadratic algorithm that compares each image to all other images, i.e., N*(N-1) comparisons for N images. Nevertheless, the approach described here was a great inspiration:
https://towardsdatascience.com/finding-duplicate-images-with-python-71c04ec8051

The search for duplicates and near duplicates uses a radius query on a KD tree. The SciPy implementation was used as it supports parallelization via multiple threaded workers:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point

Last, but not least, iterative deepening in the form of lower-resolution scanning is used to eliminate likely non-duplicates from the more costly higher-resolution scanning. This aspect can be controlled by the --xdims, --ydims, and --radiuses parameters (cf. the output of --help).
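The radius query and the coarse-to-fine filtering described above can be sketched as follows. This is a minimal illustration, not the photosdup implementation: the helper names (`downscale`, `near_pairs`, `find_duplicate_pairs`), the block averaging used for scaling, the synthetic 16x16 images, and the chosen sizes and radiuses are all assumptions made for the example. Only `scipy.spatial.KDTree.query_ball_point` and its `workers` argument are taken from SciPy.

```python
import numpy as np
from scipy.spatial import KDTree


def downscale(image, size):
    """Block-average a square grayscale image to size x size and flatten it."""
    step = image.shape[0] // size
    blocks = image[: step * size, : step * size].reshape(size, step, size, step)
    return blocks.mean(axis=(1, 3)).ravel()


def near_pairs(points, radius, workers=1):
    """Return index pairs (i, j), i < j, whose points lie within radius."""
    tree = KDTree(points)
    # One radius query per point; each hit list also contains the point itself.
    hits = tree.query_ball_point(points, r=radius, workers=workers)
    return {(i, j) for i, js in enumerate(hits) for j in js if i < j}


def find_duplicate_pairs(images, sizes=(2, 8), radiuses=(10.0, 40.0), workers=1):
    """Coarse-to-fine search: only images that matched at a lower
    resolution are compared again at the next, more costly resolution."""
    survivors = list(range(len(images)))
    pairs = set()
    for size, radius in zip(sizes, radiuses):
        if len(survivors) < 2:
            return set()
        points = np.array([downscale(images[i], size) for i in survivors])
        local = near_pairs(points, radius, workers)
        # Translate positions in the survivor list back to image indices.
        pairs = {(survivors[i], survivors[j]) for i, j in local}
        survivors = sorted({k for pair in pairs for k in pair})
    return pairs


# Synthetic test images (assumptions for the example, not real photos).
ramp = 8.0 * np.add.outer(np.arange(16), np.arange(16))  # diagonal gradient
brighter = ramp + 1.0                                    # near duplicate
# Same 2x2 block means as ramp, but no detail inside each quadrant.
flat = np.kron(downscale(ramp, 2).reshape(2, 2), np.ones((8, 8)))
inverted = 255.0 - ramp                                  # clearly different

images = [ramp, brighter, flat, inverted]
pairs = find_duplicate_pairs(images)
print(sorted(pairs))  # [(0, 1)]
```

In this sketch the 2x2 pass already discards the inverted image, while the flat image survives it and is only rejected at 8x8, which is the point of scanning at a lower resolution first. Passing `workers=-1` asks SciPy to use all available processors for the radius queries.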