{"id":13467271,"url":"https://github.com/gblach/reflicate","last_synced_at":"2025-03-26T01:30:40.580Z","repository":{"id":65189734,"uuid":"586327812","full_name":"gblach/reflicate","owner":"gblach","description":"Deduplicate data by creating reflinks between identical files.","archived":false,"fork":false,"pushed_at":"2025-01-27T04:23:50.000Z","size":130,"stargazers_count":5,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T13:37:36.124Z","etag":null,"topics":["btrfs","deduplicate","deduplication","ocfs2","reflinks","rust","xfs"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gblach.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-07T18:37:42.000Z","updated_at":"2025-01-04T16:44:56.000Z","dependencies_parsed_at":"2023-10-16T19:31:33.094Z","dependency_job_id":"5830ccef-c06e-4d65-913a-7cf63934b469","html_url":"https://github.com/gblach/reflicate","commit_stats":{"total_commits":62,"total_committers":2,"mean_commits":31.0,"dds":"0.16129032258064513","last_synced_commit":"14f8089f52cd23eeb10fec37cd879ed97a9689c7"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gblach%2Freflicate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gblach%2Freflicate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gblach%2Freflicate/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/gblach%2Freflicate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gblach","download_url":"https://codeload.github.com/gblach/reflicate/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245571708,"owners_count":20637379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["btrfs","deduplicate","deduplication","ocfs2","reflinks","rust","xfs"],"created_at":"2024-07-31T15:00:54.635Z","updated_at":"2025-03-26T01:30:40.284Z","avatar_url":"https://github.com/gblach.png","language":"Rust","funding_links":[],"categories":["Applications"],"sub_categories":["System tools"],"readme":"# Reflicate\n\nDeduplicate data by creating reflinks between identical files.\n\n## Install\n\n```\n$ cargo install reflicate\n$ export PATH=$PATH:~/.cargo/bin\n```\n\n## Disclaimer\n\nThis is alpha-quality software.\nFeel free to test this program on your system and report bugs.\nBut remember to make a backup first.\n\n## Synopsis\n\n```\nreflicate [\u003cdirectories...\u003e] [-d] [-h] [-i \u003cindexfile\u003e] [-p] [-q]  \n  \nPositional Arguments:\n  directories       directories to deduplicate\n\nOptions:\n  -d, --dryrun      do not make any filesystem changes\n  -h, --hardlinks   make hardlinks instead of reflinks\n  -i, --indexfile   store computed hashes in indexfile and use them in subsequent runs\n  -p, --paranoid    compute sha256 hashes in addition to blake3 hashes\n                    and do not trust precomputed hashes from indexfile\n  -q, --quiet       
be quiet\n```\n\n## Description\n\n**Reflicate** scans the specified directories for identical files and reflinks them together.\nFiles are considered identical when they have the same size and the same blake3 hash.\nReflinked files share the same disk blocks, so disk space is only occupied once.\nWhen a reflinked file is edited, the changed data is written to different blocks,\nso it's safe to reflink files that currently have the same content but may differ in the future.\n\n### Hardlinks\n\nHardlinks differ from reflinks in two ways:\n- Hardlinks are supported by virtually all POSIX filesystems, while reflinks are only supported by a few, e.g. XFS, Btrfs, and OCFS2.\n- Hardlinks share the same inode, so hardlinked files are always edited together.\n\n### Indexfile\n\n**Reflicate** stores four values in the indexfile: file paths, file sizes, modification times, and blake3 hashes.\nOn subsequent runs, it computes hashes only for files whose size or modification time has changed.\nThis means the program can run faster when an indexfile is used.\n\nInternally, the indexfile is a combination of CDB (constant database) and msgpack.\nThis means that the indexfile will be overwritten on subsequent runs,\nso if you **reflicate** different directories, use a different indexfile for each.\n\n### Paranoid mode\n\nBy default, **reflicate** computes and compares blake3 hashes, but in paranoid mode sha256 hashes are used as well.\nAdditionally, in paranoid mode all hashes are always computed because it is possible to manipulate file modification time.\n\n## Systemd\n\nA systemd timer can be used to run **reflicate** periodically.\nTo do this, you need to run the following commands:\n\n```\n$ mkdir -p ~/.config/systemd/user/\n$ cp systemd/* ~/.config/systemd/user/\n$ systemctl --user daemon-reload\n$ systemctl --user enable reflicate.timer\n```\n\nBy default, the periodic task runs weekly and **reflicates** your home directory.\nYou can adjust this to your needs by editing the `reflicate.service` and `reflicate.timer` files.\n\n## Showcase\n\nAt the beginning 
let's create an XFS file system, mount it, and create a test directory.\n```\n$ dd if=/dev/zero of=test.img bs=1M count=100\n$ mkfs.xfs test.img\n$ sudo mount -o loop test.img /mnt\n$ sudo mkdir /mnt/test\n$ sudo chown `id -u` /mnt/test\n```\n\nThen create two identical files and two different ones.\n```\n$ dd if=/dev/zero of=/mnt/test/file1 bs=1M count=10\n$ dd if=/dev/zero of=/mnt/test/file2 bs=1M count=10\n$ dd if=/dev/zero of=/mnt/test/file3 bs=1M count=12\n$ dd if=/dev/zero of=/mnt/test/file4 bs=1M count=15\n```\n\nNow we see that 53 MiB of disk space is occupied (including metadata).\n```\n$ df -h /mnt\nFilesystem      Size  Used Avail Use% Mounted on\n/dev/loop0       95M   53M   42M  56% /mnt\n```\n\nLet's **reflicate** the test directory.\n```\n$ reflicate /mnt/test/\n/mnt/test/file2 =\u003e /mnt/test/file1 [10 MiB]\n10 MiB saved\n```\n\nNow only 43 MiB of disk space is occupied.\n```\n$ df -h /mnt\nFilesystem      Size  Used Avail Use% Mounted on\n/dev/loop0       95M   43M   52M  46% /mnt\n```\n\nLet's break the reflink by recreating file2 with the same content as file3.\n```\n$ dd if=/dev/zero of=/mnt/test/file2 bs=1M count=12\n\n$ df -h /mnt\nFilesystem      Size  Used Avail Use% Mounted on\n/dev/loop0       95M   55M   40M  59% /mnt\n```\n\nThen **reflicate** the test directory again.\n```\n$ reflicate /mnt/test/\n/mnt/test/file3 =\u003e /mnt/test/file2 [12 MiB]\n12 MiB saved\n\n$ df -h /mnt\nFilesystem      Size  Used Avail Use% Mounted on\n/dev/loop0       95M   43M   52M  46% /mnt\n```\n\nAt the end, let's remove the test filesystem.\n```\n$ sudo umount /mnt\n$ rm test.img\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgblach%2Freflicate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgblach%2Freflicate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgblach%2Freflicate/lists"}