{"id":16688522,"url":"https://github.com/hoijui/rezipdoc","last_synced_at":"2025-03-23T14:31:39.083Z","repository":{"id":48433957,"uuid":"177261641","full_name":"hoijui/ReZipDoc","owner":"hoijui","description":"Repack uncompressed \u0026 diff visualizer for ZIP based files stored in git repos","archived":false,"fork":false,"pushed_at":"2021-07-26T16:51:20.000Z","size":350,"stargazers_count":29,"open_issues_count":3,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-02T01:33:07.209Z","etag":null,"topics":["cli","free-software","git-filter","git-textconv","java","oseg"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hoijui.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-23T08:05:15.000Z","updated_at":"2024-12-22T19:45:42.000Z","dependencies_parsed_at":"2022-09-26T22:11:24.840Z","dependency_job_id":null,"html_url":"https://github.com/hoijui/ReZipDoc","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoijui%2FReZipDoc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoijui%2FReZipDoc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoijui%2FReZipDoc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoijui%2FReZipDoc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hoijui","download_url":"https://codeload.github.com/hoijui/ReZipDoc/tar.gz/refs/heads/master","host":{"name":"GitHub"
,"url":"https://github.com","kind":"github","repositories_count":244306139,"owners_count":20431747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","free-software","git-filter","git-textconv","java","oseg"],"created_at":"2024-10-12T15:44:00.811Z","updated_at":"2025-03-23T14:31:38.650Z","avatar_url":"https://github.com/hoijui.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ReZipDoc\n\nA _repack uncompressed_ \u0026 _diff visualizer_ for ZIP based files stored in git repos.\n\nMost\n[\u003cimg alt=\"git\" src=\"https://upload.wikimedia.org/wikipedia/commons/e/e0/Git-logo.svg\" height=\"20\" align=\"center\" /\u003e](\nhttps://git-scm.com/)\nrepos hosting\n[\u003cimg alt=\"Open Source Hardware\" src=\"https://upload.wikimedia.org/wikipedia/commons/f/fd/Open-source-hardware-logo.svg\" height=\"80\" align=\"center\" /\u003e](\nhttps://en.wikipedia.org/wiki/Open-source_hardware)\nshould use [__ReZipDoc__](https://github.com/hoijui/ReZipDoc).\n\n## What is this?\n\n[git](https://git-scm.com/) does not like binary files.\nThey make the repo grow fast in size in MB (see [delta compression](https://en.wikipedia.org/wiki/Delta_encoding)),\nand when you try to see what changed in a commit, you only get this:\n\n\u003e Binary files _A_ and _B_ differ!\n\n... 
not very useful!\n\n__ReZipDoc__ solves both of these issues, though only for ZIP based files,\nwhich include, for example, FreeCAD and LibreOffice files.\n\n\u003e **NOTE** It does not work for all binary files!\n\n\u003e **HINT** If you are unsure whether a file format is ZIP based,\n\u003e just try to look at it with software that can peek into ZIP files.\\\n\u003e  On Linux or OSX: `unzip -l someFile.xyz`\n\nSo if you are storing ZIP based files in your `git` repo,\nyou probably want to use __ReZipDoc__.\n\n## Index\n\n* [Project state](#project-state)\n* [How to use](#how-to-use)\n* [Installation](#installation)\n\t* [Install helper scripts](#install-helper-scripts)\n\t* [Install diff viewer or filter](#install-diff-viewer-or-filter)\n\t* [Install filter manually](#install-filter-manually)\n* [Filter repo history](#filter-repo-history)\n\t* [Filtering example](#filtering-example)\n* [Culprits](#culprits)\n* [Motivation](#motivation)\n* [How it works](#how-it-works)\n* [Benefits](#benefits)\n* [Observations](#observations)\n* [Based on](#based-on)\n\n## Project state\n\nThis repo contains a heavily revised, refined version of ReZip (and ZipDoc),\nplus [unit tests](src/test/java/io/github/hoijui/rezipdoc)\nand [helper scripts](scripts),\nwhich were not available in the original.\n\n[![License](https://img.shields.io/badge/license-GPL%203-orange.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)\n[![GitHub last commit](https://img.shields.io/github/last-commit/hoijui/ReZipDoc.svg)](https://github.com/hoijui/ReZipDoc)\n[![Issues](https://img.shields.io/badge/issues-GitHub-57f.svg)](https://github.com/hoijui/ReZipDoc/issues)\n\n`master`:\n[![Build Status](https://travis-ci.org/hoijui/ReZipDoc.svg?branch=master)](https://travis-ci.org/hoijui/ReZipDoc)\n[![Open Hub project report](https://www.openhub.net/p/ReZipDoc/widgets/project_thin_badge.gif)](https://www.openhub.net/p/ReZipDoc?ref=sample)\n\n[![SonarCloud 
Status](https://sonarcloud.io/api/project_badges/measure?project=hoijui_ReZipDoc\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=hoijui_ReZipDoc)\n[![SonarCloud Coverage](https://sonarcloud.io/api/project_badges/measure?project=hoijui_ReZipDoc\u0026metric=coverage)](https://sonarcloud.io/component_measures/metric/coverage/list?id=hoijui_ReZipDoc)\n[![SonarCloud Bugs](https://sonarcloud.io/api/project_badges/measure?project=hoijui_ReZipDoc\u0026metric=bugs)](https://sonarcloud.io/component_measures/metric/reliability_rating/list?id=hoijui_ReZipDoc)\n[![SonarCloud Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=hoijui_ReZipDoc\u0026metric=vulnerabilities)](https://sonarcloud.io/component_measures/metric/security_rating/list?id=hoijui_ReZipDoc)\n\n## How to use\n\nIf your git repo makes heavy use of ZIP based files,\nthen you probably want to use ReZipDoc in one of these three ways:\n\n* install __ZipDoc diff viewer__ -\n  This lets you see changes within your ZIP based files\n  in a human-readable way when looking at git history.\n  It changes neither your past nor your future git history.\n\n  To use this, [install](#install-diff-viewer-or-filter) with `--diff` only.\n* install __ReZip filter__ -\n  This changes the future history of your git repo,\n  storing ZIP based files without compression.\n\n  To use this, [install](#install-diff-viewer-or-filter) with `--commit --diff --renormalize`.\n* install __ReZip filter \u0026 filter repo__ -\n  This changes both the past (\u003c- ___Caution!___)\n  and future history of your repo.\n\n  To use this, [create a copy of the repo with filtered history](#filter-repo-history).\n\n## Installation\n\nThe filter and diff tool require Java 8 or newer.\n\nThe helper scripts - which are mostly used for installing the filter -\nrequire a POSIX (~= Unix) environment.\nThis is the case on OSX, Linux, BSD, Unix and even Windows, if git is installed.\n\nThe recommended procedure is to\n[install the 
helper scripts](#install-helper-scripts) once,\nand then use them to comfortably install the filter into local git repos.\n\n\u003e __NOTE__\\\nThis downloads and executes an online script onto your machine,\nwhich is a potential security risk.\nYou may want to check-out the script before running it.\n\n### Install helper scripts\n\n\u003e __NOTE__\\\nThis has to be done once per developer machine.\n\nThey get installed into `~/bin/`,\nand if the directory did not exist before,\nit will get added to `PATH`.\n\nTo install:\n```bash\ncurl --silent --location \\\n  https://raw.githubusercontent.com/hoijui/ReZipDoc/master/scripts/rezipdoc-scripts-tool.sh \\\n  | sh -s install --path\n```\n\nTo update (to latest development version):\n```bash\ncurl --silent --location \\\n  https://raw.githubusercontent.com/hoijui/ReZipDoc/master/scripts/rezipdoc-scripts-tool.sh \\\n  | sh -s update --dev\n```\n\nTo remove:\n```bash\ncurl --silent --location \\\n  https://raw.githubusercontent.com/hoijui/ReZipDoc/master/scripts/rezipdoc-scripts-tool.sh \\\n  | sh -s remove\n```\n\n### Install diff viewer or filter\n\n\u003e __NOTE__\\\nThis has to be done once per repo.\n\nThis installs the latest release of ReZipDoc into your local git repo.\n\nMake sure you already have [installed the helper scripts](#install-helper-scripts)\non your machine.\n\nSwitch to the local git repo you want to install this filter to,\nfor example:\n\n```bash\ncd ~/src/myRepo/\n```\n\nAs explained in [How to use](#how-to-use),\nyou now want to use one of the following:\n\n1. Install the diff viewer\n\n\t```bash\n\trezipdoc-repo-tool.sh install --diff\n\t```\n2. Install the filter\n\n\t```bash\n\trezipdoc-repo-tool.sh install --commit --renormalize\n\t```\n3. 
Filter the history \u0026 install the filter\n\n\tIf you [filter the repo history](#filter-repo-history),\n\tthe freshly created, filtered repo will already have the filter installed as above.\n\nTo uninstall the diff viewer and/or filter, run:\n\n```bash\nrezipdoc-repo-tool.sh remove\n```\n\n#### Install filter manually\n\nOnly use this if, for some reason, you cannot use [the above](#install-diff-viewer-or-filter).\n\n1. Build the JAR\n\n\tRun this in bash:\n\n\t```bash\n\tcd\n\tmkdir -p src\n\tcd src\n\tgit clone git@github.com:hoijui/ReZipDoc.git\n\tcd ReZipDoc\n\tmvn package\n\techo \"Created ReZipDoc binary:\"\n\tls -1 \"$PWD\"/target/rezipdoc-*.jar\n\t```\n\n2. Install the JAR\n\n\tStore _rezipdoc-\\*.jar_ somewhere locally, either:\n\n\t * (global) in your home directory, for example under _~/bin/_\n\t * (repo - tracked) in your repository, tracked, for example under _\u003crepo-root\u003e/tools/_\n\t * (repo - local) __recommended__ in your repository, locally only, under _\u003crepo-root\u003e/.git/_\n\n3. Install the filter(s)\n\n\tExecute these lines:\n\n\t```bash\n\t# Install the add/commit filter\n\tgit config --replace-all filter.reZip.clean \"java -cp .git/rezipdoc-*.jar io.github.hoijui.rezipdoc.ReZip --uncompressed\"\n\n\t# (optionally) Install the checkout filter\n\tgit config --replace-all filter.reZip.smudge \"java -cp .git/rezipdoc-*.jar io.github.hoijui.rezipdoc.ReZip --compressed\"\n\n\t# (optionally) Install the diff filter\n\tgit config --replace-all diff.zipDoc.textconv \"java -cp .git/rezipdoc-*.jar io.github.hoijui.rezipdoc.ZipDoc\"\n\t```\n\n4. 
Enable the filters\n\n\tIn one of these files:\n\n\t* (global) _${HOME}/.gitattributes_\n\t* (repo - tracked) _\u003crepo-root\u003e/.gitattributes_\n\t* (repo - local) __recommended__ _\u003crepo-root\u003e/.git/info/attributes_\n\n\tAssign attributes to paths:\n\n\t```bash\n\t# This forces git to treat files as if they were text-based (for example in diffs)\n\t[attr]textual     diff merge text\n\t# This makes git re-zip ZIP files uncompressed on commit\n\t# NOTE See the ReZipDoc README for how to install the required git filter\n\t[attr]reZip       textual filter=reZip\n\t# This makes git visualize ZIP files as uncompressed text with some meta info\n\t# NOTE See the ReZipDoc README for how to install the required git filter\n\t[attr]zipDoc      textual diff=zipDoc\n\t# This combines in-history decompression and uncompressed view of ZIP files\n\t[attr]reZipDoc    reZip zipDoc\n\n\t# MS Office\n\t*.docx   reZipDoc\n\t*.xlsx   reZipDoc\n\t*.pptx   reZipDoc\n\t# OpenOffice\n\t*.odt    reZipDoc\n\t*.ods    reZipDoc\n\t*.odp    reZipDoc\n\t# Misc\n\t*.mcdx   reZipDoc\n\t*.slx    reZipDoc\n\t# Archives\n\t*.zip    reZipDoc\n\t# Java archives\n\t*.jar    reZipDoc\n\t# FreeCAD files\n\t*.fcstd  reZipDoc\n\t```\n\n## Filter repo history\n\nThis always creates a new copy of the repository.\n\n\u003e__NOTE__\\\nThis only filters a single branch.\n\nMake sure you have the [helper scripts installed](#install-helper-scripts) and in your `PATH`.\n\nThis filters the `master` branch of the repo at `~/src/myRepo`\ninto a new local repo `~/src/myRepo_filtered`,\nusing the original commit messages, authors and dates:\n\n```bash\nrezipdoc-history-filter.sh \\\n\t--source ~/src/myRepo \\\n\t--branch master \\\n\t--orig \\\n\t--target ~/src/myRepo_filtered\n```\n\nIt also works with an online source:\n\n```bash\nrezipdoc-history-filter.sh \\\n\t--source \"https://github.com/case06/ZACplus.git\" \\\n\t--branch master \\\n\t--orig \\\n\t--target /tmp/ZACplus_filtered\n```\n\nAfter doing 
this, the new, filtered repo will already have the filter installed,\nso future commits will be filtered.\n\n### Filtering example\n\nWe are going to run\n[a script that filters the Zinc-Oxide Open Hardware battery (ZAC+) project repo](\nhttps://github.com/hoijui/ReZipDoc/blob/master/scripts/rezipdoc-filter-ZACplus.sh),\nwhich has a header comment explaining what it does in detail.\n\nIn short, it downloads ReZipDoc helper scripts to `~/bin`,\nadds that dir to `PATH` if it is not there yet,\ncreates temporary git repos in `/tmp/`,\nand generates some command-line output.\n\nRun it like this:\n\n```bash\ncurl --silent --location \\\n  https://raw.githubusercontent.com/hoijui/ReZipDoc/master/scripts/rezipdoc-sample-filter-session.sh \\\n  | sh\n```\n\n## Culprits\n\nAs described in [gitattributes](http://git-scm.com/docs/gitattributes),\nyou may see unnecessary merge conflicts when you add attributes to a file that\ncause the repository format for that file to change.\nTo prevent this, Git can be told to run a virtual check-out and check-in of all\nthree stages of a file when resolving a three-way merge:\n\n```bash\ngit config --add --bool merge.renormalize true\n```\n\n## Motivation\n\nMany popular applications, such as\n[Microsoft Office](http://en.wikipedia.org/wiki/Office_Open_XML) and\n[Libre/Open Office](http://en.wikipedia.org/wiki/OpenDocument),\nsave their documents as XML in compressed zip containers.\nSmall changes to these documents' contents may result in big changes to their\ncompressed binary container files.\nWhen compressed files are stored in a Git repository,\nthese big differences make delta compression inefficient or impossible,\nand the repository size is roughly the sum of the sizes of all revisions.\n\nThis small program acts as a Git clean filter driver.\nIt reads a ZIP file from stdin and outputs the same ZIP content to stdout,\nbut without compression.\n\n##### pros\n\n+ human readable/plain-text diffs of (ZIP based) archives\n  (if they contain 
plain-text files)\n+ smaller overall repository size if the archive contents change frequently\n\n##### cons\n\n- slower `git add`/`git commit` process\n- slower checkout process, if the smudge filter is used\n\n## How it works\n\nWhen adding/committing a ZIP based file,\nReZip unpacks it and repacks it without compression,\nbefore adding it to the index/commit.\nIn an uncompressed ZIP file,\nthe archived files appear _as-is_ in its content\n(together with some binary meta-info before each file).\nIf those archived files are plain-text files,\nthis method will play nicely with git.\n\n## Benefits\n\nThe main benefit of ReZip over Zippey\nis that the actual file stored in the repository is still a ZIP file.\nThus, in many cases, it will still work _as-is_\nwith the respective application (for example Open Office),\neven if it is obtained without going through\nthe re-packing-with-compression smudge filter,\nfor example when downloading the file through a web interface\ninstead of checking it out with git.\n\n## Observations\n\nThe following are based on my experience in real-world cases.\nUse at your own risk.\nYour mileage may vary.\n\n### Simulink\n\n* One packed repository with ReZip was 54% of the size of the packed repository\n  storing compressed ZIPs.\n* Another repository with 280 _\\*.slx_ files and over 3000 commits was originally 281 MB\n  and was reduced to 156 MB using this technique (55% of baseline).\n\n### MS PowerPoint\n\nI found that the loose objects stored without this filter were about 5% smaller\nthan the original file size (zlib on top of zip compression).\nWhen using the ReZip filter, the loose objects were about 10% smaller than the\noriginal files, since zlib could work more efficiently on uncompressed data.\nThe packed repository with ReZip was only 10% smaller than the packed repository\nstoring compressed zips.\nI think this unremarkable efficiency improvement is due to a large number of\n_\\*.png_ files in the presentation which 
were already stored without compression\nin the original _\\*.pptx_.\n\n## Based on\n\n* [__ReZip__](https://github.com/costerwi/rezip)\n  For more efficient Git packing of ZIP based files\n* [__ZipDoc__](https://github.com/costerwi/zipdoc)\n  A Git `textconv` program to show text-based diffs of ZIP files\n\n\n## Similar Projects\n\n* [__png-inflate__](https://github.com/rayrobdod/png-inflate)\n  Does the same uncompressed repack for PNG image files\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoijui%2Frezipdoc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhoijui%2Frezipdoc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoijui%2Frezipdoc/lists"}