https://github.com/dav-m85/hashsnap
Remove files that have duplicates elsewhere.
https://github.com/dav-m85/hashsnap
duplicates hash snapshot
Last synced: 3 months ago
JSON representation
Remove files that have duplicates elsewhere.
- Host: GitHub
- URL: https://github.com/dav-m85/hashsnap
- Owner: dav-m85
- License: mit
- Created: 2021-02-26T22:17:25.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-17T20:08:08.000Z (over 2 years ago)
- Last Synced: 2025-04-09T21:49:34.110Z (9 months ago)
- Topics: duplicates, hash, snapshot
- Language: Go
- Homepage:
- Size: 229 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
> Work in progress
# hsnap
Remove files ("trim") that have duplicates elsewhere.
By using space-savvy hash snapshots, *hsnap* can perform deduplication across
multiple filesystems without having them all accessed at the same time.
In a nutshell:
- Run ```hsnap create``` on some NAS
- Copy the generated ```.hsnap``` file on your localhost, rename it ```nas.hsnap```
- Run ```hsnap trim nas.hsnap -delete``` to remove all local files that have duplicates
on the NAS
Generated .hsnap files are small: 6GB of results in a ~360k hsnap file...
... and fast to generate: on a quadcore+SSD, it takes around 2min to hash said 6GB.
A Synology NAS hashes 100GB within 26min.
Duplication finding uses file size and SHA-1 hashing comparison, so don't expect security,
but fair level of guarantee that only duplicates are found.
## Usage
hsnap help
hsnap -help
Running on a NAS without screen/tmux and forgetting about it:
nohup hsnap... hsnap.log 2>&1 &
Exploring easily an info result:
hsnap trim nas.hsnap | less -R
## TODO
- be able to resume a hashing, by leveraging file existence,
- test extensively,
- dedup without trimming (perhaps, as fdupes does it already quite well),
- cleanup the code.
- Buffer .hsnap while creating in memory...
## Inspiration
- [mathieuancelin/duplicates](https://github.com/mathieuancelin/duplicates/blob/master/duplicates.go)
- [SeeSpotRun gist](https://gist.github.com/SeeSpotRun/456b88424841d7ae735f)
https://goreleaser.com/