https://github.com/razum2um/xxhashdir_comm
🏭 Identifies common or duplicate files across different hosts
- Host: GitHub
- URL: https://github.com/razum2um/xxhashdir_comm
- Owner: razum2um
- Created: 2021-09-29T03:01:59.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-09-29T03:33:57.000Z (about 4 years ago)
- Last Synced: 2025-02-01T19:13:37.601Z (8 months ago)
- Topics: difference-detection, duplicate-detection, xxhash, xxhashdir
- Language: Rust
- Homepage:
- Size: 7.81 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
# xxhashdir_comm
[CI workflow](https://github.com/razum2um/xxhashdir_comm/actions/workflows/rust.yml)
## Problem
- Sometimes you make a backup by "hey, let's just rsync it somewhere"
- Now you struggle to merge such "backups" to save space across hosts and drives?

This helps to identify common or duplicate files across different hosts
using collected [xxhashdir](https://github.com/lunatic-cat/xxhashdir) results (plaintext in the format `\d{0,20} .*`, where the first column is the `xxhash` checksum).

## Howto
### Prepare files with checksums
```sh
# on remote host
xxhashdir . > remote.xxhashdir
# on local host
xxhashdir . > local.xxhashdir
scp remote:remote.xxhashdir remote.xxhashdir
```

### Usage
```sh
# 🚀 to get common files (sources are mostly different)
# you likely want to know this to delete duplicates first, then copy the rest
xxhashdir_comm --common local.xxhashdir remote.xxhashdir

# 🚀 to get different files (sources are mostly equal)
# you likely want to know this to merge unique files from the second into the first, then delete the second entirely
xxhashdir_comm --only-second local.xxhashdir remote.xxhashdir
```

## Why not _
- `rsync` can delete files on the receiver, but it matches files by path and, by default, compares only size and mtime
- `fdupes` works only locally
- `zfs snapshot` + `zfs diff` is perfect, but it is also local only and requires both sides to start from a common dataset
- incremental backups - you don't always bother to have them
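
For comparison, matching by content across hosts needs nothing more than the two checksum listings. Here is a minimal `awk` sketch of the idea, assuming the `<checksum> <path>` line format described above. The function names are hypothetical, and the real `xxhashdir_comm` output may differ:

```sh
# Hypothetical helpers, not part of xxhashdir_comm.
# Both take two listings: FIRST (e.g. local) and SECOND (e.g. remote).

# Print the lines of SECOND whose checksum also appears in FIRST.
common_by_hash() {
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next } ($1 in seen)' "$1" "$2"
}

# Print the lines of SECOND whose checksum does not appear in FIRST.
only_second_by_hash() {
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next } !($1 in seen)' "$1" "$2"
}
```

For example, `common_by_hash local.xxhashdir remote.xxhashdir` would list the remote entries whose content already exists locally, whatever their path.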
## Why use it
- stdout can be reprocessed with sed/grep/whatever again
- unix way
- having fun with rust

## Further plans
- Unify output with the standard `comm` utility (columns & accept `-1/2/3`)
- Consider `xxhashdir` with bytesize input and compare bytesizes
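
The first plan could look roughly like the following sketch, which feeds only the checksum columns to the standard `comm` utility. `hash_comm` is a hypothetical helper, not part of `xxhashdir_comm`. It drops the paths, so it illustrates the column semantics only:

```sh
# Hypothetical helper: hash_comm [comm options] FIRST SECOND
# Compares the checksum columns of two listings with comm(1).
hash_comm() {
  opts=
  # Everything before the last two arguments is passed through to comm.
  while [ $# -gt 2 ]; do opts="$opts $1"; shift; done
  a=$(mktemp) b=$(mktemp)
  # comm needs sorted input; keep only the first column (the checksum).
  cut -d' ' -f1 "$1" | sort -u > "$a"
  cut -d' ' -f1 "$2" | sort -u > "$b"
  # $opts is intentionally unquoted so that several options split into words.
  comm $opts "$a" "$b"
  rm -f "$a" "$b"
}
```

With this, `hash_comm -12 local.xxhashdir remote.xxhashdir` would print the checksums present in both listings, and `-13` those found only in the second.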