{"id":13439315,"url":"https://github.com/systemd/casync","last_synced_at":"2025-05-15T16:06:20.954Z","repository":{"id":15851922,"uuid":"78859695","full_name":"systemd/casync","owner":"systemd","description":"Content-Addressable Data Synchronization Tool","archived":false,"fork":false,"pushed_at":"2023-12-21T15:57:27.000Z","size":2599,"stargazers_count":1516,"open_issues_count":68,"forks_count":118,"subscribers_count":80,"default_branch":"main","last_synced_at":"2025-04-07T21:14:17.384Z","etag":null,"topics":["archive","chunking","delivery","download","file-system","http","synchronization","tar","upload"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/systemd.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS","contributing":null,"funding":null,"license":"LICENSE.LGPL2.1","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-13T15:09:56.000Z","updated_at":"2025-04-04T18:38:41.000Z","dependencies_parsed_at":"2024-12-14T22:11:19.044Z","dependency_job_id":null,"html_url":"https://github.com/systemd/casync","commit_stats":{"total_commits":638,"total_committers":47,"mean_commits":"13.574468085106384","dds":0.3479623824451411,"last_synced_commit":"e6817a79d89b48e1c6083fb1868a28f1afb32505"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemd%2Fcasync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemd%2Fcasync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/systemd%2Fcasync/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/systemd%2Fcasync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/systemd","download_url":"https://codeload.github.com/systemd/casync/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374470,"owners_count":22060611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","chunking","delivery","download","file-system","http","synchronization","tar","upload"],"created_at":"2024-07-31T03:01:12.879Z","updated_at":"2025-05-15T16:06:20.912Z","avatar_url":"https://github.com/systemd.png","language":"C","readme":"# casync — Content Addressable Data Synchronizer\n\nWhat is this?\n\n1. A combination of the rsync algorithm and content-addressable storage\n\n2. An efficient way to store and retrieve multiple related versions of large file systems or directory trees\n\n3. An efficient way to deliver and update OS, VM, IoT and container images over the Internet in an HTTP and CDN friendly way\n\n4. An efficient backup system\n\nSee the [Announcement Blog\nStory](http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html) for a\ncomprehensive introduction. 
The medium length explanation goes something like\nthis:\n\nEncoding: Let's take a large linear data stream, split it into\nvariable-sized chunks (the size of each being a function of the\nchunk's contents), and store these chunks in individual, compressed\nfiles in some directory, each file named after a strong hash value of\nits contents, so that the hash value may be used as a key for\nretrieving the full chunk data. Let's call this directory a \"chunk\nstore\". At the same time, generate a \"chunk index\" file that lists\nthese chunk hash values plus their respective chunk sizes in a simple\nlinear array. The chunking algorithm is supposed to create variable,\nbut similarly sized chunks from the data stream, and do so in a way\nthat the same data results in the same chunks even if placed at\nvarying offsets. For more information [see this blog\nstory](https://moinakg.wordpress.com/2013/06/22/high-performance-content-defined-chunking/).\n\nDecoding: Let's take the chunk index file, and reassemble the large\nlinear data stream by concatenating the uncompressed chunks retrieved\nfrom the chunk store, keyed by the listed chunk hash values.\n\nAs an extra twist, we introduce a well-defined, reproducible,\nrandom-access serialization format for directory trees (think: a more\nmodern `tar`), to permit efficient, stable storage of complete directory\ntrees in the system, simply by serializing them and then passing them\ninto the encoding step explained above.\n\nFinally, let's put all this on the network: for each image you want to\ndeliver, generate a chunk index file and place it on an HTTP\nserver. Do the same with the chunk store, and share it between the\nvarious index files you intend to deliver.\n\nWhy bother with all of this? Streams with similar contents will result\nin mostly the same chunk files in the chunk store. This means it is\nvery efficient to store many related versions of a data stream in the\nsame chunk store, thus minimizing disk usage. 
Moreover, when\ntransferring linear data streams, chunks already known on the receiving\nside can be reused, thus minimizing network traffic.\n\nWhy is this different from `rsync` or OSTree, or similar tools? Well,\none major difference between `casync` and those tools is that we\nremove file boundaries before chunking things up. This means that\nsmall files are lumped together with their siblings and large files\nare chopped into pieces, which permits us to recognize similarities in\nfiles and directories beyond file boundaries, and makes sure our chunk\nsizes are pretty evenly distributed, without the file boundaries\naffecting them.\n\nThe \"chunking\" algorithm is based on the buzhash rolling hash\nfunction. SHA512/256 is used as a strong hash function to generate digests of the\nchunks (alternatively: SHA256). zstd is used to compress the individual chunks\n(alternatively: xz or gzip).\n\nIs this new? Conceptually, not too much. This uses well-known concepts,\nimplemented in a variety of other projects, and puts them together in a\nmoderately new, nice way. That's all. The primary influences are rsync and git,\nbut there are other systems that use similar algorithms, in particular:\n\n- BorgBackup (http://www.borgbackup.org/)\n- bup (https://bup.github.io/)\n- CAFS (https://github.com/indyjo/cafs)\n- dedupfs (https://github.com/xolox/dedupfs)\n- LBFS (https://pdos.csail.mit.edu/archive/lbfs/)\n- restic (https://restic.github.io/)\n- Tahoe-LAFS (https://tahoe-lafs.org/trac/tahoe-lafs)\n- tarsnap (https://www.tarsnap.com/)\n- Venti (https://en.wikipedia.org/wiki/Venti)\n- zsync (http://zsync.moria.org.uk/)\n\n(ordered alphabetically, not in order of relevance)\n\n## File Suffixes\n\n1. .catar → archive containing a directory tree (like \"tar\")\n2. .caidx → index file referring to a directory tree (i.e. a .catar file)\n3. .caibx → index file referring to a blob (i.e. any other file)\n4. 
.castr → chunk store directory (where we store chunks under their hashes)\n5. .cacnk → a compressed chunk in a chunk store (i.e. one of the files stored below a .castr directory)\n\n## Operations on directory trees\n\n```\n# casync list /home/lennart\n# casync digest /home/lennart\n# casync mtree /home/lennart (BSD mtree(5) compatible manifest)\n```\n\n## Operations on archives\n\n```\n# casync make /home/lennart.catar /home/lennart\n# casync extract /home/lennart.catar /home/lennart\n# casync list /home/lennart.catar\n# casync digest /home/lennart.catar\n# casync mtree /home/lennart.catar\n# casync mount /home/lennart.catar /home/lennart\n# casync verify /home/lennart.catar /home/lennart  (NOT IMPLEMENTED YET)\n# casync diff /home/lennart.catar /home/lennart (NOT IMPLEMENTED YET)\n```\n\n## Operations on archive index files\n\n```\n# casync make --store=/var/lib/backup.castr /home/lennart.caidx /home/lennart\n# casync extract --store=/var/lib/backup.castr /home/lennart.caidx /home/lennart\n# casync list --store=/var/lib/backup.castr /home/lennart.caidx\n# casync digest --store=/var/lib/backup.castr /home/lennart.caidx\n# casync mtree --store=/var/lib/backup.castr /home/lennart.caidx\n# casync mount --store=/var/lib/backup.castr /home/lennart.caidx /home/lennart\n# casync verify --store=/var/lib/backup.castr /home/lennart.caidx /home/lennart (NOT IMPLEMENTED YET)\n# casync diff --store=/var/lib/backup.castr /home/lennart.caidx /home/lennart (NOT IMPLEMENTED YET)\n```\n\n## Operations on blob index files\n\n```\n# casync digest --store=/var/lib/backup.castr fedora25.caibx\n# casync mkdev --store=/var/lib/backup.castr fedora25.caibx\n# casync verify --store=/var/lib/backup.castr fedora25.caibx /home/lennart/Fedora25.raw (NOT IMPLEMENTED YET)\n```\n\n## Operations involving ssh remoting\n\n```\n# casync make foobar:/srv/backup/lennart.caidx /home/lennart\n# casync extract foobar:/srv/backup/lennart.caidx /home/lennart2\n# casync list 
foobar:/srv/backup/lennart.caidx\n# casync digest foobar:/srv/backup/lennart.caidx\n# casync mtree foobar:/srv/backup/lennart.caidx\n# casync mount foobar:/srv/backup/lennart.caidx /home/lennart\n```\n\n## Operations involving the web\n\n```\n# casync extract http://www.foobar.com/lennart.caidx /home/lennart\n# casync list http://www.foobar.com/lennart.caidx\n# casync digest http://www.foobar.com/lennart.caidx\n# casync mtree http://www.foobar.com/lennart.caidx\n# casync extract --seed=/home/lennart http://www.foobar.com/lennart.caidx /home/lennart2\n# casync mount --seed=/home/lennart http://www.foobar.com/lennart.caidx /home/lennart2\n```\n\n## Maintenance\n\n```\n# casync gc /home/lennart-20170101.caidx /home/lennart-20170102.caidx /home/lennart-20170103.caidx\n# casync gc --backup /var/lib/backup/backup.castr /home/lennart-*.caidx\n\n# casync make /home/lennart.catab /home/lennart (NOT IMPLEMENTED)\n```\n\n## Building casync\n\ncasync uses the [Meson](http://mesonbuild.com/) build system. To build casync,\ninstall Meson (at least 0.47), as well as the necessary build dependencies\n(gcc, libzstd-dev liblzma-dev libacl1-dev libfuse-dev libudev-dev python3-sphinx). Then run:\n\n```\n# meson build \u0026\u0026 ninja -C build \u0026\u0026 sudo ninja -C build install\n```\n","funding_links":[],"categories":["C","http"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsystemd%2Fcasync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsystemd%2Fcasync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsystemd%2Fcasync/lists"}