{"id":27337882,"url":"https://github.com/rhizoome/git-fastcdc","last_synced_at":"2025-04-12T15:25:12.097Z","repository":{"id":229539891,"uuid":"776986651","full_name":"rhizoome/git-fastcdc","owner":"rhizoome","description":"FastCDC for large git files","archived":false,"fork":false,"pushed_at":"2024-09-03T21:35:59.000Z","size":136,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-06T02:35:16.159Z","etag":null,"topics":["active","personal"],"latest_commit_sha":null,"homepage":"https://www.rhizoome.ch/repository-status-personal/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rhizoome.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-25T00:49:56.000Z","updated_at":"2024-09-03T21:36:03.000Z","dependencies_parsed_at":"2024-03-31T13:23:16.216Z","dependency_job_id":"0a770e6b-04b2-4ba8-a068-9ec42462bc12","html_url":"https://github.com/rhizoome/git-fastcdc","commit_stats":null,"previous_names":["ganwell/git-fastcdc","rhizoome/git-fastcdc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhizoome%2Fgit-fastcdc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhizoome%2Fgit-fastcdc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhizoome%2Fgit-fastcdc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhizoome%2Fgit-fastcdc/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rhizoome","download_url":"https://codeload.github.com/rhizoome/git-fastcdc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248587786,"owners_count":21129294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active","personal"],"created_at":"2025-04-12T15:25:11.373Z","updated_at":"2025-04-12T15:25:12.069Z","avatar_url":"https://github.com/rhizoome.png","language":"Python","readme":"# git-fastcdc\n\nSplit certain files using content-defined-chunking for faster deduplication. It\nhas a similar use-case to git-lfs, but blobs are in-repository. git-fastcdc\nmitigates some of the speed penalties. For most use-cases you are probably\nbetter off with git-lfs. If you have a focus on archival and deduplication,\ngit-fastcdc might be right for you.\n\n## Enable\n\n```bash\ngit fastcdc install\n```\n\n## Config\n\nEdit `.gitattributes`:\n\n```\n*.wav binary filter=git_fastcdc\n/.gitattributes text -binary -filter\n/.gitignore text -binary -filter\n```\n\nBy default git-fastcdc runs in-memory. Switch to on-disk:\n\n```bash\ngit config --local fastcdc.ondisk true\n```\n\nIf you have a pure git-fastcdc repository, you probably want to disable delta-compression\nto benefit from the speedups through fastcdc.\n\n```bash\ngit fastcdc delta disable\n```\n\nThis sets `core.bigFileThreshold` to `200k`, which isn't an exact science. 
It\nmeans most of the history- and meta-data is delta-compressed while most of the\ncdc-blobs aren't.\n\n## Results\n\nFor my repository, an 800GB music collection:\n\n- Without git-fastcdc, delta-compression took over 5 hours (actually it took all\n  night)\n- With git-fastcdc, delta-compression takes about 2 minutes\n- With git-fastcdc, the repository got slightly smaller: about 1%\n\nSo the repack is much faster, with the same delta-compression.\n\nMethodology: I took one state of my repository from 2 years ago and one state\nfrom today. A lot of meta-data has changed between those two states, because I am\nconstantly fixing it using beaTunes. In both tests I created two commits\nand did `git repack -a -d -f` at the end.\n\n## How\n\nFiles are split by the filter when you add them. The split files go into\nthe `git-fastcdc` branch. You need to push this branch to remotes too!\n\nYou will see the actual data in the files in the working copy, in `*.wav` in the\nexample above. But actually the blobs of these files are just a list of chunks.\n\n## Repository Status: Personal\n\nThis repository hosts a project that is actively maintained but primarily\nintended for my personal use. It is public for transparency, sharing ideas, and\nas a resource for others who might find the methodologies or implementations\nuseful. Please consider the following:\n\n- **Status change**: Should there be significant interest in this project, I am\n  open to changing its status to accommodate broader collaboration and\n  development.\n- **Personal Project**: This is a personal project, and while it is actively\n  maintained, it is tailored to my specific needs and use cases.\n- **Limited Support**: Given the personal nature of this project, support and\n  responses to issues or pull requests might be limited. 
I encourage open\n  collaboration but may prioritize changes that align with my personal use.\n- **Viewing and Forking Encouraged**: You are welcome to view, fork, or use the\n  code in your own projects. However, this project is provided as-is, with no\n  guarantees of regular updates or adaptations for broader use.\n- **Contribution Guidelines**: While contributions are appreciated, they should\n  be relevant and beneficial to the project’s ongoing development. Please review\n  any provided contribution guidelines before making pull requests.\n\nFeel free to explore the code, and utilize it under the terms of the license\nattached to this repository!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhizoome%2Fgit-fastcdc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frhizoome%2Fgit-fastcdc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhizoome%2Fgit-fastcdc/lists"}