{"id":50762628,"url":"https://github.com/curlewlabs-com/apfs-dedupe","last_synced_at":"2026-06-11T11:30:21.892Z","repository":{"id":360569154,"uuid":"1249642563","full_name":"curlewlabs-com/apfs-dedupe","owner":"curlewlabs-com","description":"Safe APFS clone deduplication for macOS — fclones detection + a windowless, ACL-complete apply","archived":false,"fork":false,"pushed_at":"2026-05-27T01:38:28.000Z","size":117,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T02:12:43.894Z","etag":null,"topics":["apfs","cli","clonefile","command-line-tool","copy-on-write","dedupe","deduplication","disk-space","disk-usage","fclones","filesystem","macos","reflink","storage"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/curlewlabs-com.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-25T23:08:52.000Z","updated_at":"2026-05-27T01:38:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/curlewlabs-com/apfs-dedupe","commit_stats":null,"previous_names":["curlewlabs-com/apfs-dedupe"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/curlewlabs-com/apfs-dedupe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curlewlabs-com%2Fapfs-dedupe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/curlewlabs-com%2Fapfs-dedupe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curlewlabs-com%2Fapfs-dedupe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curlewlabs-com%2Fapfs-dedupe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/curlewlabs-com","download_url":"https://codeload.github.com/curlewlabs-com/apfs-dedupe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curlewlabs-com%2Fapfs-dedupe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34197393,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apfs","cli","clonefile","command-line-tool","copy-on-write","dedupe","deduplication","disk-space","disk-usage","fclones","filesystem","macos","reflink","storage"],"created_at":"2026-06-11T11:30:20.757Z","updated_at":"2026-06-11T11:30:21.884Z","avatar_url":"https://github.com/curlewlabs-com.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# apfs-dedupe\n\n[![CI](https://github.com/curlewlabs-com/apfs-dedupe/actions/workflows/ci.yml/badge.svg)](https://github.com/curlewlabs-com/apfs-dedupe/actions/workflows/ci.yml)\n\nReclaim disk 
space on macOS by replacing byte-identical duplicate files with **APFS\nclones** — independent files that share storage copy-on-write until one is modified.\n\nFinding duplicates is a solved problem — [`fclones`](https://github.com/pkolaczk/fclones)\ndoes it well, so this tool delegates detection to it. The hard part is the **apply**,\nand that's the point of `apfs-dedupe`: it is **crash-safe** (no instant where the file\nis missing, even if the process dies mid-swap), **re-verifies the bytes** against a\nfrozen clone immediately before replacing, stays **symlink- and TOCTOU-safe** when run\nas root over user-writable directories, and preserves a file's metadata in full —\n**including its ACLs**, which other reflink dedupers drop.\n\n**Dry-run is the default. It shows what it would reclaim and changes nothing — you\nopt into changes with `--apply`.**\n\n## Quick start\n\n```sh\nbrew install fclones                      # the one dependency to install; python3 comes with the Xcode CLT\ngit clone https://github.com/curlewlabs-com/apfs-dedupe.git\ncd apfs-dedupe\n./apfs-dedupe.sh --scope ~/Projects       # dry-run: shows what it would reclaim\n```\n\n```\nScanning /Users/you/Projects (files \u003e= 1M) with fclones...\nfclones: Found 3 (7.3 MB) redundant files\n\nDRY RUN -- nothing changed. Re-run with --apply to reclaim the space below.\n\nwould clone /Users/you/Projects/webapp/node_modules/ui-kit/bundle.js -\u003e /Users/you/Projects/webapp-backup/node_modules/ui-kit/bundle.js  (4.0 MiB allocated (4.0 MiB logical))\nwould clone /Users/you/Projects/webapp/assets.tar -\u003e /Users/you/Projects/webapp-backup/assets.tar  (2.0 MiB allocated (2.0 MiB logical))\nwould clone /Users/you/Projects/webapp/icon.png -\u003e /Users/you/Projects/webapp-backup/icon.png  (1.0 MiB allocated (1.0 MiB logical))\nscanned: 412 files in 2s\nwould reclaim: 7.0 MiB allocated (7.0 MiB logical) across 3 files\n```\n\nLike the plan? 
Add `--apply` to do it (requires macOS 15+):\n\n```sh\n./apfs-dedupe.sh --apply --scope ~/Projects\n```\n\nTo sweep every account on the machine, run it as root over the default `/Users`\nscope: `sudo ./apfs-dedupe.sh` (still a dry-run), then `sudo ./apfs-dedupe.sh --apply`.\n\n## Why not just `fclones dedupe`?\n\n[`fclones`](https://github.com/pkolaczk/fclones) can do the dedupe apply too\n(`fclones dedupe`); on macOS that path has two gaps:\n\n- **A vanishing-file window.** `fclones dedupe` renames the original aside, *then*\n  clones over it — so the path briefly has no file, and stays that way if the\n  process dies between the two steps.\n- **Dropped ACLs.** `clonefile` copies the *source's* metadata; dedupers restore\n  POSIX bits / owner / times / xattrs but lose the file's ACLs.\n\n`apfs-dedupe` does the apply itself, correctly — every step relative to a file\ndescriptor for the duplicate's parent directory, so no path is re-resolved\nmid-apply (see [Safety](#safety) for why):\n\n```\ndirfd = open(dirname(dup), O_NOFOLLOW_ANY)        # no symlink in any component\nclonefileat(canonical -\u003e dirfd:tmp)               # shared extents; CLONE_NOFOLLOW_ANY\ncompare(tmp, dup)                                  # current bytes still match\nfcopyfile(dup_fd -\u003e tmp_fd, COPYFILE_METADATA)    # dup's mode/owner/times/xattrs/ACLs\nrenameat(dirfd: tmp -\u003e dup)                        # atomic: dup's path is never absent\n```\n\nThat `fcopyfile(COPYFILE_METADATA)` is what carries ACLs across; the\ntemp-then-atomic-rename removes the window and is crash-safe — a crash leaves the\noriginal untouched.\n\n## Deduping all of /Users, across every user\n\nRunning as root over the whole machine is the intended use, and it is safe. 
The\napply restores each *replaced* file's own owner, group, mode, and ACL — the\ncanonical file is never modified — so every file keeps its identity and the only\nthing that ever crosses a user boundary is the shared bytes, which were identical\nto begin with. Because the clone is copy-on-write, a later write by any user\ndiverges that file instead of touching anyone else's, and access stays gated by\neach file's own permissions (sharing storage grants no new read access). It does\nnot even matter which file in a group is chosen as the canonical. `clonefile` is\nsame-volume only, and all of `/Users` lives on one APFS volume.\n\n`/` and the writable data-volume root are refused unless you pass `--allow-root`,\nand the macOS system volume is a sealed read-only snapshot — so Apple's system\nfiles can't be modified in any case (they would simply error and be skipped).\n\n## Safety\n\n- **Dry-run by default.** Nothing changes until you pass `--apply`.\n- **Content re-verify against a frozen clone, right before the swap.** `clonefile`\n  is not content-verified by the kernel (unlike Linux `FIDEDUPERANGE`), so we clone\n  first — a copy-on-write snapshot — then re-compare *that* against the duplicate\n  immediately before replacing. 
It never installs bytes that went stale since the\n  scan (seconds to minutes earlier on a large run), a concurrent write to the\n  original can't affect the result, and this check is always on.\n- **Windowless, crash-safe** apply (above).\n- **Symlink-safe, fd-anchored apply.** Running as root over user-writable\n  directories, the clone, content re-verify, metadata copy, and atomic swap are\n  all done relative to a file descriptor for the duplicate's parent — acquired\n  with `O_NOFOLLOW_ANY` (no symlink in any path component) and then used via\n  `clonefileat`/`openat`/`renameat` — so a local user can't swap a path\n  component to redirect a root clone into an arbitrary location.\n- **Fails safe.** Immutable (`uchg`), deny-ACL, or permission-denied files are\n  skipped with a warning, never forced.\n- **`--one-fs`** so it never crosses volumes into a silent full copy.\n- **Never breaks hard links.**\n- **Reports allocated space separately from logical bytes.** Sparse and\n  APFS-compressed files can be much larger logically than the blocks they occupy,\n  so summaries lead with the estimated allocated bytes reclaimed and keep the\n  logical duplicate byte count in parentheses.\n- **Skips files already cloned on a re-run.** A second sweep compares physical\n  extents to detect duplicates that already share storage, leaves them untouched\n  instead of rebuilding the same clone, and reports the space earlier runs already\n  saved (`already saved by earlier clones: …`). Re-running is cheap and changes\n  nothing once a tree is deduped — see\n  [docs/architecture.md](docs/architecture.md).\n- **Won't re-download cloud files.** iCloud Drive, third-party File Provider, and\n  Photos library roots are excluded from the scan by default: reading an evicted\n  (dataless) file would fault it back down from the cloud — the opposite of\n  reclaiming space. 
Pass `--include-cloud` to scan them when they are fully local.\n- **Stays out of app-private and machine-managed data.** App-private stores (Mail,\n  Messages, Safari, per-app sandbox containers) and OS-managed `~/Library` trees\n  (the Spotlight index, on-device intelligence, daemon containers) are excluded by\n  default, since they are TCC-protected and poor dedup targets; the **Trash** is\n  excluded unconditionally. Pass `--include-app-data` to include those `~/Library`\n  stores. The TCC-protected user folders (Desktop/Documents/Downloads) stay in scope\n  but are reachable only with **Full Disk Access** — grant it to your terminal for an\n  interactive run; a scheduled run via `/bin/sh` cannot get it (see [Usage](#usage)).\n\n## Requirements\n\nRequires macOS (APFS), `fclones`, and `python3` (Xcode Command Line Tools —\nalready present on most dev machines). `--apply` additionally requires **macOS\n15+**: it uses `CLONE_NOFOLLOW_ANY` for symlink-safe path resolution, which\nApple's headers first define in macOS 15, and refuses to run on older systems.\nDry-run uses none of that and works on any macOS version.\n\n## Usage\n\n```\napfs-dedupe.sh [--apply] [--scope PATH] [--min SIZE] [--git] [--exclude GLOB]\n               [--allow-root] [--include-cloud] [--include-app-data] [--verbose]\n```\n\n- `--scope PATH` — narrow the scan if you want (default `/Users`, which covers\n  every user; deduping across users is safe — see above).\n- `--min SIZE` — ignore files smaller than this (default `1M`). A bare number is\n  **bytes** (`--min 100000` ≈ 98 KiB); add a suffix for units — `500K`, `1M`,\n  `2G` (decimal) or `KiB`/`MiB`/`GiB` (binary). 
The default keeps the *work* bounded; it is not a savings threshold:\n  cloning any ordinary allocated duplicate frees at least one 4 KiB block\n  regardless of its size, so on git-/CI-heavy trees — where savings hide in many small\n  files — use a low `--min` or the `--git` preset below.\n- `--git` — preset for git-/CI-heavy machines: lowers `--min` to `1` so the many\n  small content-addressed files (git objects, build caches), where most of the\n  savings on such machines lie, get deduped too. Scans and clones far more files; an\n  explicit `--min` overrides it.\n- `--exclude GLOB` — skip paths matching `GLOB`, e.g. `--exclude '*.iso'`; quote it\n  so your shell doesn't expand it first. Repeatable.\n- `--allow-root` — permit scanning `/` or the data-volume root (refused by default;\n  the tool is meant for `/Users`).\n- `--include-cloud` — also scan cloud-backed roots (iCloud Drive,\n  `~/Library/CloudStorage`, Photos libraries) that are excluded by default.\n  **Warning:** reading an evicted file re-downloads it, so this is only safe when\n  those roots are fully downloaded locally.\n- `--include-app-data` — also scan the app-private and OS-managed `~/Library` data\n  excluded by default (Mail, Messages, Safari, per-app sandbox containers; the\n  Spotlight index, on-device intelligence and daemon-container stores) — all\n  TCC-protected and poor dedup targets. This flag does **not** grant access: the\n  TCC-protected user folders that stay in scope (Desktop, Documents, Downloads) are\n  reachable only with **Full Disk Access**, which can't be granted from a CLI.\n  Grant it to your terminal app in System Settings → Privacy \u0026 Security for an\n  interactive run. 
A scheduled LaunchAgent/LaunchDaemon runs via `/bin/sh`, so the\n  only thing to grant would be the system shell — Full Disk Access for *every*\n  shell script, which this tool won't recommend; the daemon therefore stays out of\n  those folders, and a periodic interactive run covers them.\n- `--verbose` — print a line per cloned file (in `--apply`) and per skipped file.\n  By default an `--apply` run prints just the summary and skips are summarized by\n  reason (see [Output](#output)); `--verbose` restores the per-file `cloned` line on\n  stdout, adds a per-file skip line on stderr, and surfaces the raw `fclones`\n  diagnostics for the folders it couldn't read. A dry-run always prints its full plan.\n\n### Output\n\nThe dry-run plan **and** the savings summary go to **stdout**; progress\n(`Scanning…`, fclones's own logs) goes to **stderr**. The summary leads with the\nfiles scanned and how long the scan took, then the reclaim; files left untouched\nare **summarized by reason** at the end, not streamed one per line — and folders an\nun-granted run couldn't read are folded into a single counted note on stderr with\nFull Disk Access advice, rather than one line each. An `--apply` run prints just\nthat summary by default — the per-file `cloned` lines are **opt-in** under\n`--verbose`, since a nightly `/Users` sweep can clone tens of thousands of files\nand one line each would bury the log. Reclaim figures lead with estimated allocated\nbytes and show logical duplicate bytes in parentheses, because sparse or compressed\nfiles can occupy fewer blocks than their logical size. So a plain redirect saves\njust the report while progress still shows on screen:\n\n```sh\n./apfs-dedupe.sh \u003e plan.txt                      # dry-run plan + summary saved; progress on screen\n./apfs-dedupe.sh --apply --verbose \u003e clones.txt   # full per-file apply record on disk\n./apfs-dedupe.sh --verbose ...                    
# also list every skipped file (else summarized)\n```\n\n## Why didn't free space change? Snapshots\n\nIf `--apply` reports gigabytes of allocated space reclaimed but `df` shows little\nor no change, the space is almost certainly pinned by **APFS snapshots** — most\noften Time Machine's **local** snapshots, which macOS takes hourly. Snapshots are\ncopy-on-write: one taken while a duplicate still held its own blocks keeps\nreferencing those blocks, so the blocks dedup frees stay attached to the snapshot\ninstead of returning to free space. The allocated figure is the best filesystem\nestimate of block reclaim; the *realized* free space lags until the snapshots\nholding the pre-dedup state are gone.\n\nLocal Time Machine snapshots expire on their own (~24 hours, sooner under disk\npressure), so the space comes back by itself. To reclaim it now:\n\n```sh\n# purge up to ~20 GiB of snapshot-pinned space (bytes, then urgency 1-4); raise as needed\nsudo tmutil thinlocalsnapshots /System/Volumes/Data 21474836480 4\ndf -h /System/Volumes/Data        # confirm\n```\n\nThis removes only **local** snapshots — your actual Time Machine backups on the\nexternal/network destination are untouched; the only cost is local hourly rollback\npoints. New snapshots capture the *deduped* state, so they don't re-pin the freed\nblocks: once the old snapshots clear, the reclaim sticks. 
After an `--apply`, the\ntool prints this reminder when local snapshots are present (a note only — it never\ndeletes snapshots; that's your call).\n\n## Schedule a daily run\n\n`install-daily.sh` sets up a scheduled run so duplicates created since the last\nrun are reclaimed automatically.\n\n**Per-user (default)** — a **LaunchAgent** that runs as you, every day at 02:00,\nover your home directory:\n\n```sh\n./install-daily.sh                  # scope defaults to $HOME\n./install-daily.sh --scope ~/code   # or a narrower scope\n./install-daily.sh --min 1M         # override the default --git / --min 1 preset\n./install-daily.sh --print          # preview what would be installed; install nothing\n./install-daily.sh --uninstall      # remove it\n```\n\nIt runs **as you, no root**, with the same safe defaults as the CLI — cloud-backed\nroots, app-private and OS-managed `~/Library` data, and the Trash excluded, and\n`--git` (`--min 1`) so small duplicates are caught too. It runs via `/bin/sh`, so\nit cannot get Full Disk Access and does not reach Desktop/Documents/Downloads — run\nthe CLI by hand from a Full-Disk-Access terminal for those. The first run does the\nreal work; later runs are cheap, because already-cloned files are detected and\nskipped. 
Output is appended to `~/Library/Logs/apfs-dedupe.log`, which the daily\nrun keeps size-capped — gzipping older logs beside it — so it can't grow without\nbound (it self-rotates because a `newsyslog` rule would need root, which this\ninstall doesn't use).\n\nA LaunchAgent runs only while you're logged in, so if the Mac is asleep at 02:00\nthe run happens at the next wake.\n\n**All users (`--system`)** — a root **LaunchDaemon** that runs every day at 02:00\nover all of `/Users`, covering every account:\n\n```sh\nsudo ./install-daily.sh --system                 # scope defaults to /Users, --min 1M\nsudo ./install-daily.sh --system --min 1         # scan every non-empty file\n./install-daily.sh --system --print              # preview what would be installed; install nothing\nsudo ./install-daily.sh --system --uninstall     # remove it\n```\n\nIt runs **as root** whether or not anyone is logged in, writes\n`/Library/LaunchDaemons/com.curlewlabs.apfs-dedupe.system.plist`, and appends to\n`/Library/Logs/apfs-dedupe.log` (created `root:wheel` `0600`, because it can name\npaths under every user's home). A `newsyslog` rule\n(`/etc/newsyslog.d/com.curlewlabs.apfs-dedupe.conf`, removed by `--uninstall`) keeps\nthat log size-capped and gzipped — macOS's own rotator, with archives kept\n`root:wheel 0600` like the log itself. The default system floor is `--min 1M` for\nwhole-machine recurring runs; pass `--min 1` if you want the daemon to scan every\nnon-empty file daily.\n\nKnown limitation: `--system` stores this checkout's script path and the installer\nshell's `fclones`/`python3` search path in a root daemon. 
That is appropriate for a\nself-managed personal machine or trusted CI host; a future hardening pass can\nrequire root-owned, non-group/world-writable tool paths before aiming this at\nadversarial multi-user machines.\n\n## What it does not do (yet)\n\n- **Non-APFS / Linux.** APFS `clonefile` only.\n\n## Development\n\n```sh\nsh test/test.sh                              # integration tests (macOS 15+, real clonefile, fclones)\nnpx pyright@1.1.409                          # strict type check of lib/apply.py (CI's exact pin)\nshellcheck apfs-dedupe.sh install-daily.sh test/test.sh   # CI pins shellcheck 0.11.0\n```\n\n`lib/apply.py` brands path strings as `FullPath` vs `Basename` (distinct\n`NewType`s), so a directory-relative component can't be passed — or logged — where\na resolvable path belongs; `pyright` (strict) enforces it. All three checks run\nin CI on every PR.\n\n## License\n\nMIT. Duplicate detection is performed by [fclones](https://github.com/pkolaczk/fclones) (MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcurlewlabs-com%2Fapfs-dedupe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcurlewlabs-com%2Fapfs-dedupe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcurlewlabs-com%2Fapfs-dedupe/lists"}