https://github.com/muesli/xray
xray compares media files by their perceptual hash and identifies dupes
https://github.com/muesli/xray
hacktoberfest
Last synced: about 1 month ago
JSON representation
xray compares media files by their perceptual hash and identifies dupes
- Host: GitHub
- URL: https://github.com/muesli/xray
- Owner: muesli
- License: mit
- Created: 2014-01-22T06:29:55.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2014-02-13T07:41:00.000Z (over 11 years ago)
- Last Synced: 2025-03-31T03:11:42.610Z (3 months ago)
- Topics: hacktoberfest
- Language: C++
- Size: 344 KB
- Stars: 16
- Watchers: 4
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog
- License: LICENSE
Awesome Lists containing this project
README
xray
====xray compares media files by their perceptual hash and identifies dupes.
This means files are not compared byte-for-byte, but by their visual content.
It will (or at least it should) find duplicates of videos, even when they are
encoded in different formats, bitrates and/or resolutions.## What it does
xray scans a given directory. Each video file it finds gets analyzed by taking
several snapshots of the video content. Those images are then p-hashed and
compared with all other videos it found. If xray considers two videos similar
enough it will output that it found a dupe and additionally shows you a
similarity score.In case xray thinks it found a perfect match, it will also calculate the sha1sum
of both videos to safely identify exact copies of a file.## Dependencies
xray depends on QtCore (Qt >= 5.2), ffmpeg and phash. You should be able to find
existing packages for your system. On Ubuntu install "libphash0-dev", on Arch
install "phash" from AUR, on OS X "brew install phash".## Build it
qmake xray.pro
make## Try it out
./xray /media/videos## Plans
So far this is just a working proof-of-concept. Plans to enhance & improve xray:
- CLI params for consts, like thresholds, hamming distance etc.
- Center-crop and re-scale frames to eliminate issues with borders and aspect ratios.
- Combine individual frame hashes into one big file hash.
- Compare video frames and when finding a reasonably close one, do a close inspection comparing more of the surrounding frames of both files.
- Store hashes in a file (sqlite/unqlite/textfile) to speed up future comparisons.
- Offer to automatically quarantine dupe files.
- Provide an interactive mode.
- Drop external snapshot process. Directly use libvlc or ffmpeg or such.Enjoy!