https://github.com/theohbrothers/get-duplicatephotos
A script to locate duplicate image files between two sets of folders (e.g. Camera Roll folders vs other folders).
https://github.com/theohbrothers/get-duplicatephotos
criteria date-taken duplicate-files duplicate-photos duplicates export file-metadata powershell pwsh script search
Last synced: 7 months ago
JSON representation
A script to locate duplicate image files between two sets of folders (e.g. Camera Roll folders vs other folders).
- Host: GitHub
- URL: https://github.com/theohbrothers/get-duplicatephotos
- Owner: theohbrothers
- License: mit
- Created: 2021-08-16T23:24:33.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-08-27T06:54:37.000Z (about 4 years ago)
- Last Synced: 2025-01-25T14:29:07.781Z (9 months ago)
- Topics: criteria, date-taken, duplicate-files, duplicate-photos, duplicates, export, file-metadata, powershell, pwsh, script, search
- Language: PowerShell
- Homepage:
- Size: 36.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Get-DuplicatePhotos
[](https://github.com/theohbrothers/Get-DuplicatePhotos/actions)
[](https://github.com/theohbrothers/Get-DuplicatePhotos/releases/)A script to locate duplicate image files between two sets of folders (e.g. Camera Roll folders vs other folders).
It is fairly common, for instance, to copy image files from your Camera Roll, and then leave traces of those edited copies around after modifying them. This script helps to locate those duplicates, assuming that they still have their original the image metadata (i.e. `Date Taken`).
## How it works
- The default duplicate criteria is to match only by `Date Taken` (e.g. `2021-01-01T00:11:22+0000`).
- Choose whether the duplicate criteria should also include file size and file hash.
- Searches two groups of folders (i.e. source and other) for all descendent image files with an existing `Date Taken` attribute.
- Compares files of the two groups of folders, identifying duplicates using the criteria you defined
- Finally, exports duplicates into a `duplicates.json` file.## `duplicates.json`
For DateTaken-only criteria, the key is `DateTaken`, where `DateTaken` is in [`ISO 8601`](https://www.iso.org/iso-8601-date-and-time-format.html) format.
```json
{
"2021-01-01T00:11:22+0000": [
"C:\\path\\to\\Camera Roll\\source.jpg", // The first file is the source file.
"C:\\path\\to\\other folder\\duplicate.jpg", // The rest are duplicates.
...
],
...
}
```For DateTaken, length, and file hash criteria, the key is `DateTaken-Length-FileHash`, where `DateTaken` is in [`ISO 8601`](https://www.iso.org/iso-8601-date-and-time-format.html) format, `Length` is a integer in bytes, and `FileHash`is an `SHA256` hash value.
```json
{
"2021-01-01T00:11:22+0000-1234567-XXXXXXXXXX": [
"C:\\path\\to\\Camera Roll\\source.jpg", // The first file is the source file.
"C:\\path\\to\\other folder\\duplicate.jpg", // The rest are duplicates.
...
],
...
}
```