{"id":16138012,"url":"https://github.com/glehmann/hld","last_synced_at":"2025-03-16T09:33:15.248Z","repository":{"id":48315094,"uuid":"162129428","full_name":"glehmann/hld","owner":"glehmann","description":"Hard Link Deduplicator","archived":false,"fork":false,"pushed_at":"2025-03-10T11:03:18.000Z","size":324,"stargazers_count":8,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-10T11:34:34.092Z","etag":null,"topics":["dedup","deduplication","hardlinks","reflinks","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/glehmann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-17T12:45:46.000Z","updated_at":"2025-03-10T10:39:06.000Z","dependencies_parsed_at":"2024-07-24T17:00:45.555Z","dependency_job_id":"b9480a2f-4341-4ad5-b907-d36f58b8b2c2","html_url":"https://github.com/glehmann/hld","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glehmann%2Fhld","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glehmann%2Fhld/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glehmann%2Fhld/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glehmann%2Fhld/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/glehmann","download_url":"https://codeload.github.com/glehmann/hld/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243809844,"owners_count":20351406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dedup","deduplication","hardlinks","reflinks","rust"],"created_at":"2024-10-09T23:31:17.673Z","updated_at":"2025-03-16T09:33:14.830Z","avatar_url":"https://github.com/glehmann.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"Hard Link Deduplicator\n======================\n\n`hld` finds duplicated files and hardlinks them together in order to save\ndisk space. And it's made to be fast!\n\nHere is an example session on a modern (2017) laptop:\n\n```fish\n$ du -sh myproject ~/.m2\n896M    myproject\n912M    .m2\n$ time hld -r -c ~/.m2 myproject\n420.23 MB saved in the deduplication of 675 files\nreal 0.47\nuser 1.17\nsys 0.22\n```\n\n420MB (46% of the build directory size) saved in just 0.5 seconds :-)\n\n[![CI Status](https://github.com/glehmann/hld/actions/workflows/ci.yml/badge.svg)](https://github.com/glehmann/hld/actions)\n\nFeatures\n--------\n\nIt uses all the available cores by default and the [BLAKE3](https://blake3.io/)\nhashing function, which is both very fast and has an extremely low\nchance of collision.\n\nBecause of its caching feature, it is an efficient way to deduplicate files\nthat might have been copied by some automated process, for example a Maven\nbuild.\n\nUsage\n-----\n\n#### globs\n\n`hld` takes a set of globs as arguments. The globs are used to find the\ncandidate files for deduplication. 
They support the `**` notation to traverse\nany number of directories. For example:\n\n* `hld \"target/*.jar\"` deduplicates all the `jar` files directly in the `target`\n  directory;\n* `hld \"target/**/*.jar\"` deduplicates all the `jar` files in the `target`\n  directory and its subdirectories.\n\nSeveral globs may be passed on the command line in order to work with\nseveral directories and/or several file name patterns. For example:\n`hld \"target/*.jar\" \"images/**/*.png\"`.\n\nNote: the quotes are important to prevent the shell from expanding the globs.\nFor large directories, the shell may not be able to pass all the\nmatching files on the command line.\n\n#### caching\n\nIn addition to the raw globs of the previous section, some cached globs may\nbe used. They behave just like the raw globs, but the BLAKE3 digests\nof the matched files are saved for later reuse. They must only be used on files that are\nguaranteed to *not* change. Cached globs are passed with the `--cache`\nor `-c` option.\n\nFor example: `hld \"target/*\" --cache \"stable/*\"` will deduplicate\nall the files in both `target` and `stable`, and will also cache the\ndigests of the files in `stable`. The cached digests of `stable` will\nthen be reused by a later `hld` call, in order to speed up the execution.\n\nThe quotes are very important in this case: without them, the globs would\nbe expanded by the shell, and only the first file of the set would be\ncached.\n\nThe cache path may be specified with the `--cache-path` option or `-C`,\nin order to deal with several sets of caches, depending on the execution\ncontext.\n\nThe cache may be cleared with the option `--clear-cache`.\n\n#### recursive\n\nThe `--recursive` or `-r` option simplifies command line usage when working\nwith all the files in some directories. 
For example, the following two\ncommands are strictly equivalent:\n\n```fish\nhld -r -c ~/.m2 myproject\n```\n\n```fish\nhld -c \"$HOME/.m2/**/*\" \"myproject/**/*\"\n```\n\n#### dry run\n\nUsing the option `--dry-run` or `-n` prevents `hld` from modifying anything on\nthe disk, cache included.\n\nFor example: `hld \"target/*\" --cache \"stable/*\" --dry-run` only shows how many\nfiles would be deduplicated and how much space would be saved, but changes\nnothing.\n\n#### log level\n\nThe amount of output displayed by `hld` can be controlled by the `--log-level`\nor `-l` option. It accepts the following values, from the most verbose to\nthe most quiet: `trace`, `debug`, `info` (the default level), `warn`, `error`.\n\n#### parallelism\n\nBy default `hld` uses as many cores as possible, in order to\ncomplete its task as fast as possible. The `--parallel` or `-j` option lets\nyou change the number of threads to run in parallel.\n\nFor example, `hld -j1 \"myproject/*\"` forces `hld` to run single-threaded.\n\n#### shell completion\n\n`hld` can generate the completion code for several shells (fish, zsh, bash, …).\nJust run it with the `--completion` option followed by the shell type, and save\nthe produced code in the appropriate location. For example, for fish:\n\n```fish\nhld --completion fish \u003e ~/.config/fish/completions/hld.fish\n```\n\nThe completion is usually activated in new shell instances, but may also be\nactivated in the current one by sourcing the file. Again for fish:\n\n```fish\nsource ~/.config/fish/completions/hld.fish\n```\n\nInstall\n-------\n\n`hld` is currently only available from source. To install it, you need\na [Rust installation](https://www.rust-lang.org/). `hld` compiles with Rust\nstable or newer. 
In general, `hld` tracks the latest stable release of the\nRust compiler.\n\n```\n$ git clone https://github.com/glehmann/hld\n...\n$ cd hld\n$ cargo install\n...\n$ $HOME/.cargo/bin/hld --version\nhld 0.1.0\n```\n\nBuilding\n--------\n\nYou need a [Rust installation](https://www.rust-lang.org/). `hld` compiles\nwith Rust stable or newer. In general, `hld` tracks the latest stable release\nof the Rust compiler.\n\nTo build `hld`:\n\n```\n$ git clone https://github.com/glehmann/hld\n...\n$ cd hld\n$ cargo build --release\n...\n$ ./target/release/hld --version\nhld 0.1.0\n```\n\nTesting\n-------\n\nTo run the full test suite, run the following from the repository root:\n\n```\n$ cargo test\n...\ntest result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out\n```\n\nReleasing\n---------\n\nIn order to produce a small, easy-to-download executable, just do a release\nbuild followed by:\n\n```\n$ strip target/release/hld\n$ upx --ultra-brute target/release/hld\n```\n\nCode coverage\n-------------\n\nThe code coverage may be computed with [kcov](https://simonkagstrom.github.io/kcov/).\nMake sure the `kcov` executable is in the `PATH`, then run:\n\n```fish\n$ cargo test --features kcov -- --test-threads 1\n```\n\nThe report is available in `target/x86_64-unknown-linux-gnu/debug/coverage/index.html`.\n\nTODO\n----\n\n* factor out the digest computation shared by the cached and non-cached files\n* which duplicate do we keep when linking? The first one? From the caches if possible?\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglehmann%2Fhld","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglehmann%2Fhld","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglehmann%2Fhld/lists"}