Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
fduf: CLI tool for finding duplicate files.
https://github.com/DonnoDK/fduf
- Host: GitHub
- URL: https://github.com/DonnoDK/fduf
- Owner: DonnoDK
- Created: 2015-04-11T00:35:00.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-04-13T08:59:19.000Z (over 9 years ago)
- Last Synced: 2024-08-01T16:34:15.239Z (3 months ago)
- Language: C
- Size: 199 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Find Duplicate Files
====================
Find Duplicate Files, or `fduf` for short, is a command-line tool for finding duplicate files.
`fduf` can either scan only the given path or recurse through its
sub-directories.

So, what's the status, mister?
---------------------------
It works, with the following options available:

- `-v` verbose mode: additional info on the scan (file sizes, total number of
  files considered),
- `-r` recursive scanning, and
- `-m` include each path's md5 hash value in the output
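For instance (the directory paths here are just placeholders), a verbose scan
of the current directory, or a recursive scan that also prints md5 hashes:

    $ fduf -v .
    $ fduf -r -m ~/projects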
The following options are considered for future versions:

- exclude directories with a given name or matching a regular expression, and
- options for formatting, making it easier to redirect output to other CLI
  tools

There is a (currently incomplete) manpage for `fduf`, which should say the
same as the above.
How does it work?
---------------------------
Something along the lines of

    $ fduf [options] [path]

with an example looking like

    $ fduf -r ~/music

The example would search the current user's music stash recursively for
duplicate files and print the result to `stdout`.

Since speed is a priority, `fduf` works as follows:
- collect a set of filenames from the path specified (recursively with option -r),
- prune files which have a unique size,
- compute a relatively small and simple checksum for the remaining files,
- prune again files which have a unique checksum,
- compute an md5 hash for the remaining files
- and finally, prune files with a unique md5 hash

The resulting set of files is sorted by size, and the paths are printed to
`stdout` as mentioned (a sketch of this staged pruning follows the example
below). The paths of files within a _set_ of duplicates are printed separated
by a newline character (`\n`), and each set is separated by an additional
newline character, such as
    $ fduf -r ~/music/rhcp
    ~/music/rhcp/some_greatest_album/walkabout.flac
    ~/music/rhcp/one hot minute/walkabout.flac

    ~/music/rhcp/some_greatest_album/aeroplane.flac
    ~/music/rhcp/one hot minute/aeroplane.flac
    $
_(~~Sony~~ Warner Bros., please don't sue me - it's just an example!)_
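To make the staged pruning concrete, here is a minimal, self-contained C
sketch. It is not fduf's actual source: the 4 KiB byte-sum "cheap checksum"
and the whole-file FNV-1a stand-in for the final md5 stage are assumptions for
illustration, and directory walking, error handling, and the size-sorted,
blank-line-separated output are elided (file paths come from the command line
instead).

    /* dedup_sketch.c - a sketch of the staged pruning described above.
     * Not fduf's actual source; the checksum choices are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <sys/stat.h>

    struct entry {
        const char *path;
        uint64_t key;  /* grouping key, refined stage by stage */
    };

    /* "Relatively small and simple checksum": here, a byte sum of the
     * first 4 KiB (an assumption; the real checksum is unspecified). */
    static uint64_t cheap_sum(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        unsigned char buf[4096];
        size_t n = fread(buf, 1, sizeof buf, f);
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++) s += buf[i];
        fclose(f);
        return s;
    }

    /* Whole-file FNV-1a hash, standing in for the md5 stage. */
    static uint64_t full_hash(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        uint64_t h = 1469598103934665603ULL;
        int c;
        while ((c = fgetc(f)) != EOF) {
            h ^= (uint64_t)c;
            h *= 1099511628211ULL;
        }
        fclose(f);
        return h;
    }

    static int by_key(const void *a, const void *b) {
        uint64_t x = ((const struct entry *)a)->key;
        uint64_t y = ((const struct entry *)b)->key;
        return (x > y) - (x < y);
    }

    /* One "prune" stage: sort by the current key and keep only files
     * whose key is shared by at least one other file. */
    static size_t prune_unique(struct entry *e, size_t n) {
        qsort(e, n, sizeof *e, by_key);
        size_t kept = 0;
        for (size_t i = 0; i < n; ) {
            size_t j = i;
            while (j < n && e[j].key == e[i].key) j++;
            if (j - i > 1)
                for (size_t k = i; k < j; k++) e[kept++] = e[k];
            i = j;
        }
        return kept;
    }

    int main(int argc, char **argv) {
        if (argc < 2) return 0;
        size_t n = (size_t)argc - 1;
        struct entry *e = calloc(n, sizeof *e);
        if (!e) return 1;

        /* Stage 1: collect files; the first grouping key is file size. */
        for (size_t i = 0; i < n; i++) {
            struct stat st;
            e[i].path = argv[i + 1];
            e[i].key = (stat(e[i].path, &st) == 0) ? (uint64_t)st.st_size : 0;
        }
        n = prune_unique(e, n);  /* stage 2: drop files with a unique size */

        /* Stages 3-4: fold the cheap checksum into the key, so groups stay
         * nested inside the equal-size groups, then prune again. */
        for (size_t i = 0; i < n; i++)
            e[i].key ^= cheap_sum(e[i].path) * 0x9E3779B97F4A7C15ULL;
        n = prune_unique(e, n);

        /* Stages 5-6: the same trick with the whole-file hash. */
        for (size_t i = 0; i < n; i++)
            e[i].key ^= full_hash(e[i].path) * 0x9E3779B97F4A7C15ULL;
        n = prune_unique(e, n);

        /* Survivors are duplicates (with high probability); fduf would
         * sort them by size and separate each set with a blank line. */
        for (size_t i = 0; i < n; i++)
            printf("%s\n", e[i].path);
        free(e);
        return 0;
    }

Compile with `cc -o dedup_sketch dedup_sketch.c` and pass it a list of file
paths. The design point is the same as fduf's: each stage is cheap relative to
the next, so most files are discarded before any whole-file hashing happens.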
So what needs work?
---------------------------
- The makefile,
- the manpage,
- formatting options, and
- an option for excluding directories matching some given name(s)

Suggestions, critique, pull requests, and other contributions are welcome.
Similar work
---------------------------
- https://github.com/adrianlopezroche/fdupes - Very extensive with many options. Can also delete duplicates.
- https://github.com/peturingi/einfalda - A Java implementation.
- https://github.com/phiresky/dupegone - The project's README also presents some interesting results with respect to speed and approach.
- https://github.com/sahib/rmlint - Currently the fastest duplicate-file finder that I know of (save for the bash script mentioned in the above project).
- https://github.com/cpbills/tools/blob/master/smart-dupe - Written in Perl.
- http://rdfind.pauldreik.se - A relatively old tool, available in most package managers. Depends on the [Nettle](http://www.lysator.liu.se/%7Enisse/nettle/) library.