{"id":17871737,"url":"https://github.com/chadnetzer/hardlinkable","last_synced_at":"2025-07-07T22:04:06.028Z","repository":{"id":57566218,"uuid":"153365189","full_name":"chadnetzer/hardlinkable","owner":"chadnetzer","description":"A tool to scan directories and report on the space that could be saved by hardlinking identical files.  It can also perform the linking.  Written in Go.","archived":false,"fork":false,"pushed_at":"2018-10-29T05:36:03.000Z","size":458,"stargazers_count":8,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-18T03:51:31.767Z","etag":null,"topics":["cmd","comparison","compression","file","files","filesystem","go","hardlink","hardlinking","posix","progress","tools","unix"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chadnetzer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-16T22:59:59.000Z","updated_at":"2024-07-29T23:21:47.000Z","dependencies_parsed_at":"2022-08-27T18:30:13.825Z","dependency_job_id":null,"html_url":"https://github.com/chadnetzer/hardlinkable","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chadnetzer%2Fhardlinkable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chadnetzer%2Fhardlinkable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chadnetzer%2Fhardlinkable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chadnetzer%2Fhardlinkable/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chadnetzer","download_url":"https://codeload.github.com/chadnetzer/hardlinkable/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244849661,"owners_count":20520760,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cmd","comparison","compression","file","files","filesystem","go","hardlink","hardlinking","posix","progress","tools","unix"],"created_at":"2024-10-28T10:37:41.252Z","updated_at":"2025-03-21T18:33:31.866Z","avatar_url":"https://github.com/chadnetzer.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"## `hardlinkable` - Find and optionally link identical files\n\n`hardlinkable` is a tool to scan directories and report files that could be hardlinked together because they have identical content, and (by default) other matching criteria such as modification time, permissions and ownership.  It can optionally perform the linking as well, saving storage space (but by default, it only reports information).\n\nThis program is faster, and reports results more accurately, than other variants that I have tried.  It works by gathering full inode information before deciding what action (if any) to take.  Full information allows it to produce exact reporting of what will happen before any modifications occur.  
It also uses content information from previous comparisons to drastically reduce search times.\n\n---\n## Help\n```\n$ hardlinkable --help\nA tool to scan directories and report on the space that could be saved\nby hardlinking identical files.  It can also perform the linking.\n\nUsage:\n  hardlinkable [OPTIONS] dir1 [dir2...] [files...]\n\nFlags:\n  -v, --verbose           Increase verbosity level (up to 3 times)\n      --no-progress       Disable progress output while processing\n      --json              Output results as JSON\n      --enable-linking    Perform the actual linking (implies --quiescence)\n  -f, --same-name         Filenames need to be identical\n  -t, --ignore-time       File modification times need not match\n  -p, --ignore-perm       File permission (mode) need not match\n  -o, --ignore-owner      File uid/gid need not match\n  -x, --ignore-xattr      Xattrs need not match\n  -c, --content-only      Only file contents have to match (ie. -potx)\n  -s, --min-size N        Minimum file size (default 1)\n  -S, --max-size N        Maximum file size\n  -i, --include RE        Regex(es) used to include files (overrides excludes)\n  -e, --exclude RE        Regex(es) used to exclude files\n  -E, --exclude-dir RE    Regex(es) used to exclude dirs\n  -d, --debug             Increase debugging level\n      --ignore-walkerr    Continue on file/dir read errs\n      --ignore-linkerr    Continue when linking fails\n      --quiescence        Abort if filesystem is being modified\n      --disable-newest    Disable using newest link mtime/uid/gid\n      --search-thresh N   Ino search length before enabling digests (default 1)\n  -h, --help              help for hardlinkable\n      --version           version for hardlinkable\n```\n\nThe include/exclude options can be given multiple times to support multiple regex matches.\n\n`--debug` outputs additional information about program state in the final stats and the progress information.\n\n`--ignore-walkerr` allows 
the program to skip over unreadable files and directories, and continue gathering information.\n\n`--ignore-linkerr` allows the program to skip any links that cannot be made due to permission problems or other errors, and continue processing.  It is only applicable when linking is enabled, and should be used with caution.\n\n`--quiescence` checks that the files haven't changed between the initial scan and the attempt to link (for example, that file sizes and timestamps are unchanged).  A change suggests the files are being modified, and the program stops when this is detected.  Specifying `--quiescence` during a normal scan, where linking is not enabled, performs these checks anyway at a small performance cost.\n\n`--disable-newest` turns off the default behavior of attempting to set the src inode to the most recent modification time of the linked inodes, and to change the uid/gid to those of the more recent inode.  The default behavior can be useful for backup programs, so that they see inodes as being newer and back them up.  This option is only applicable when linking is enabled.\n\n`--search-thresh` can be set to -1 to disable the use of digests, which may save a small amount of memory (at the cost of possibly many more comparisons).  Otherwise, this controls the length that inode hashes must grow to before enabling the use of digests.  
This option is safe to ignore: it does not affect results, only (possibly) the time required to complete a run.\n\n---\n## Example output\n```\n$ hardlinkable download_dirs\nHard linking statistics\n-----------------------\nDirectories               : 3408\nFiles                     : 89177\nHardlinkable this run     : 2462\nRemovable inodes          : 2462\nCurrently linked bytes    : 23480519   (22.393 MiB)\nAdditional saveable bytes : 245927685  (234.535 MiB)\nTotal saveable bytes      : 269408204  (256.928 MiB)\nTotal run time            : 4.691s\n```\n\nAdditional verbosity levels will provide additional stats, a list of linkable files, and previously linked files:\n\n```\n$ hardlinkable -vvv download_dirs\nCurrently hardlinked files\n--------------------------\nfrom: download_dir/testfont/BlackIt/testfont.otf\n  to: download_dir/testfont/BoldIt/testfont.otf\n  to: download_dir/testfont/ExtraLightIt/testfont.otf\n  to: download_dir/testfont/It/testfont.otf\nFilesize: 4.146 KiB  Total saved: 12.438 KiB\n...\n\nFiles that are hardlinkable\n-----------------------\nfrom: download_dir/bak1/some_image1.png\n  to: download_dir/bak2/some_image1.png\n...\nfrom: download_dir/fonts1/some_font.otf\n  to: download_dir/other_fonts1/some_font.otf\n\nHard linking statistics\n-----------------------\nDirectories                 : 3408\nFiles                       : 89177\nHardlinkable this run       : 2462\nRemovable inodes            : 2462\nCurrently linked bytes      : 23480519   (22.393 MiB)\nAdditional saveable bytes   : 245927685  (234.535 MiB)\nTotal saveable bytes        : 269408204  (256.928 MiB)\nTotal run time              : 4.765s\nComparisons                 : 21479\nInodes                      : 80662\nExisting links              : 8515\nTotal old + new links       : 10977\nTotal too small files       : 71\nTotal bytes compared        : 246099717  (234.699 MiB)\nTotal remaining inodes      : 78200\n```\n\nA more detailed breakdown of the various stats can be found in 
the [Results.md](Results.md).\n\n---\n## Methodology\n\nThis program is named `hardlinkable` to indicate that, by default, it does *not* perform any linking, and the user has to explicitly opt in to having it perform the linking step.  This (to me) is a safer and more sensible default than the alternatives; it's not unusual to want to run it a few times with different options to see what would result, before actually deciding whether to perform the linking.\n\nThe program first gathers all the information from the directory and file walk, and uses this information to execute a linking strategy which minimizes the number of moved links required to reach the final state.\n\nBesides having more accurate statistics, this version can be significantly faster than other versions, due to opportunistically keeping track of simple file content hashes as the inode hash comparison lists grow.  At first, it computes these content hashes only when comparing files (when the file data will be read anyway), to avoid unnecessary I/O.  Using this data and quick set operations, it can drastically reduce the number of file comparisons attempted as the number of walked files grows.\n\n---\n## History\n\nThere are a number of programs that will perform hardlinking of identical files, and Redhat and Debian/Ubuntu each include a `hardlink` program, with different implementations and capabilities.  The Redhat variant is based upon `hardlink.c`, originally written by Jakub Jelinek, which later inspired John Villalovos to write his own version in Python, now known as `hardlinkpy`, with multiple additional contributors (Antti Kaihola, Carl Henrik Lunde, et al.).  The Python version inspired Julian Andres Klode to do yet another re-implementation in C, which also added support for Xattrs.  
There are numerous other variants floating around as well.\n\nThe previous versions that I've encountered do the hardlinking while walking the directory tree, before gathering complete information on all the inodes and pathnames.  This tends to lead to inaccurate statistics reported during a \"dry run\", and can also cause links to be needlessly moved from inode to inode multiple times during a run.  They also don't use \"dry run\" mode as the default, so you have to remember to enable \"dry run\" if you just want to play with different options, or find information on the number of duplicate files that exist.\n\nThis version is written in Go and incorporates ideas from previous versions, as well as its own innovations, to ensure exactly accurate results both in \"dry run\" mode and in actual linking mode.  I expect and intend for it to be the fastest version, due to avoiding unnecessary I/O, minimizing extraneous searches and comparisons, and because it never moves a link more than once during a run.\n\n---\n## Contributing\n\nContributions are welcome, including bug reports, suggestions, and code patches/pull requests/etc.  I'm interested in hearing what you use `hardlinkable` for, and what could make it more useful to you.  If you've used other space-recovery hardlinking programs, I'm also interested to know if `hardlinkable` bests them in speed and report accuracy, or if you've found a regression in performance or capability.\n\n## Build\n\n```\ngo test ./...\ngo test -tags slowtests ./...  # Could take a minute\ngo install ./...  
# installs to GOPATH/bin\n\nor\n\ncd cmd/hardlinkable \u0026\u0026 go build  # builds in cmd/hardlinkable\n```\n\n## Install `hardlinkable` command\n```\ngo get github.com/chadnetzer/hardlinkable/cmd/hardlinkable\n```\n\n## License\n\n`hardlinkable` is released under the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchadnetzer%2Fhardlinkable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchadnetzer%2Fhardlinkable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchadnetzer%2Fhardlinkable/lists"}