{"id":21651663,"url":"https://github.com/hoytech/vcdiff","last_synced_at":"2025-04-11T20:30:35.504Z","repository":{"id":8433863,"uuid":"10023158","full_name":"hoytech/Vcdiff","owner":"hoytech","description":"Vcdiff - diff and patch for binary data","archived":false,"fork":false,"pushed_at":"2013-10-15T23:37:23.000Z","size":220,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T16:22:41.330Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hoytech.png","metadata":{"files":{"readme":"README.pod","changelog":"Changes","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-05-13T02:37:28.000Z","updated_at":"2022-09-21T15:51:21.000Z","dependencies_parsed_at":"2022-09-13T09:51:41.273Z","dependency_job_id":null,"html_url":"https://github.com/hoytech/Vcdiff","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoytech%2FVcdiff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoytech%2FVcdiff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoytech%2FVcdiff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hoytech%2FVcdiff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hoytech","download_url":"https://codeload.github.com/hoytech/Vcdiff/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248476078,"owners_count":21110208,"icon_url":"https:
//github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T07:49:08.881Z","updated_at":"2025-04-11T20:30:35.482Z","avatar_url":"https://github.com/hoytech.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"=head1 NAME\n\nVcdiff - diff and patch for binary data\n\n=head1 SYNOPSIS\n\nB\u003cIn order to use this module you must install one or more backend modules (see below)\u003e\n\n    use Vcdiff;\n\n    my $delta = Vcdiff::diff($source, $target);\n\n    my $target2 = Vcdiff::patch($source, $delta);\n\n    ## $target2 eq $target\n\n\n\n=head1 DESCRIPTION\n\nGiven source and target data, the C\u003cVcdiff::diff\u003e function computes a \"delta\" that encodes the difference information needed to turn source into target. Anyone who has source and delta can derive target with the C\u003cVcdiff::patch\u003e function.\n\nThe point of this module is that if the source and target inputs are related then delta can be small relative to target, meaning it may be more efficient to send delta updates to clients over the network instead of re-sending the whole target every time.\n\nEven though source and target don't necessarily have to be binary data (regular data is fine too), the delta will contain binary data including NUL bytes so if your transport protocols don't support this you will have to encode or escape the delta in some way (ie base64). 
Compressing the delta before you do this might be worthwhile depending on the size of your changes and the entropy of your data.\n\nThe delta format is described by L\u003cRFC 3284|http://www.faqs.org/rfcs/rfc3284.html\u003e, \"The VCDIFF Generic Differencing and Compression Data Format\".\n\n\n\n\n=head1 BACKENDS\n\nL\u003cVcdiff\u003e is \"the DBI\" of VCDIFF implementations.\n\nThis module doesn't itself implement delta compression. Instead, it provides a consistent interface to various open-source VCDIFF (RFC 3284) implementations. The implementation libraries it interfaces to are called \"backends\". You must install at least one backend.\n\nThe currently supported backends are described below. See the POD documentation in the backend module distributions for more details on the pros and cons of each backend.\n\nIn order to choose which backend to use, L\u003cVcdiff\u003e will first check to see if the C\u003c$Vcdiff::backend\u003e variable is populated. If so, it will attempt to load that backend. 
This variable can be used to force a particular backend:\n\n    {\n        local $Vcdiff::backend = 'Vcdiff::OpenVcdiff';\n        $delta = Vcdiff::diff($source, $target);\n    }\n\nThe above will croak if L\u003cVcdiff::OpenVcdiff\u003e can't be loaded.\n\nIn the normal case, L\u003cVcdiff\u003e will check to see if any backends have been loaded already in the following order: B\u003cXdelta3, OpenVcdiff\u003e (which can be modified via the C\u003c@Vcdiff::known_backends\u003e variable):\n\n    use Vcdiff::Xdelta3;\n    $delta = Vcdiff::diff($source, $target);\n\nIf it doesn't find any loaded backends, it will try to load them in the same order.\n\nFinally, if no backends can be loaded, an exception is thrown.\n\nThe backend that will be used can be determined by calling C\u003cVcdiff::which_backend()\u003e.\n\n\n=head2 BACKEND: Xdelta3\n\nThe L\u003cVcdiff::Xdelta3\u003e backend module bundles Joshua MacDonald's L\u003cXdelta3|http://xdelta.org/\u003e library.\n\n\n=head2 BACKEND: open-vcdiff\n\nThe L\u003cVcdiff::OpenVcdiff\u003e backend module depends on L\u003cAlien::OpenVcdiff\u003e which configures, builds, and installs Google's L\u003copen-vcdiff|http://code.google.com/p/open-vcdiff/\u003e library.\n\n\n=head2 Future Backends\n\nAnother possible candidate would be Kiem-Phong Vo's L\u003cVcodex|http://www2.research.att.com/~gsf/download/ref/vcodex/vcodex.html\u003e utility which contains a VCDIFF implementation.\n\nA really cool project would be a pure-perl VCDIFF implementation that could be used in environments that are unable to compile XS modules.\n\nIn the future I plan to build a L\u003cVcdiff::DumbDiffer\u003e module (name undecided) that will completely ignore the source and create a delta that embeds the entire target. Obviously this defeats the purpose of delta compression but will allow deltas to be generated really fast. 
This will be useful because protocols that frequently replace the entire content won't need a special case for this.\n\n\n\n\n=head1 BACKEND-AGNOSTIC CODE\n\nUnless you are relying on features supported only by a specific backend, it's recommended that code that uses L\u003cVcdiff\u003e be backend-agnostic like this:\n\n    use Vcdiff;\n    print Vcdiff::diff(\"hello\", \"hello world\");\n\nInstead of:\n\n    use Vcdiff::Xdelta3;\n    print Vcdiff::Xdelta3::diff(\"hello\", \"hello world\");\n\nThat way the selection of which backend to use is as dynamic as possible.\n\nIf you're writing a module that depends on L\u003cVcdiff\u003e, pick a backend and add that backend's package (ie C\u003cVcdiff::Xdelta3\u003e) to your module's dependency list. This way a (sophisticated) user can force a different backend at install-time if the one you chose doesn't work for whatever reason.\n\nEven more importantly, writing backend-agnostic code allows users of your module to choose which backend to use by setting C\u003c$Vcdiff::backend\u003e before calling your module's routines. Backend-agnostic code also permits the flexibility of using one backend for diffing and another for patching by localising C\u003c$Vcdiff::backend\u003e for specific operations.\n\n\n\n\n\n=head1 STREAMING API\n\nThe streaming API is sometimes more convenient than the in-memory API. 
It can also be more efficient since it uses less memory and because you can start processing output before Vcdiff has finished.\n\nSometimes you have to use the streaming API in order to handle files that are too large to fit into your virtual address space (though note some backends have size limitations apart from this).\n\nIn order to send output to a stream, a file handle should be passed in as the 3rd argument to C\u003cdiff\u003e or C\u003cpatch\u003e:\n\n    Vcdiff::diff(\"hello\", \"hello world\", \\*STDOUT);\n\nIn order to fully take advantage of streaming, either or both of the source and target parameters can also be file handles instead of strings. Here is the full-streaming mode where all parameters are file handles:\n\n    open(my $source_fh, '\u003c', 'source.dat') || die $!;\n    open(my $target_fh, '\u003c', 'target.dat') || die $!;\n    open(my $delta_fh, '\u003e', 'delta.dat') || die $!;\n\n    Vcdiff::diff($source_fh, $target_fh, $delta_fh);\n\nNote that in all current backends if the source parameter is a file handle it must be backed by an C\u003clseek(2)\u003eable and/or C\u003cmmap(2)\u003eable file descriptor (in other words it must be a real file, not a pipe or socket). Vcdiff will throw an exception if the source file handle is unsuitable.\n\n\n\n\n\n=head1 MEMORY MAPPED INPUTS\n\nIf the source and/or target/delta are in files, an alternative to the streaming API is to map the files into memory with C\u003cmmap(2)\u003e and then pass the mappings in to C\u003cdiff\u003e/C\u003cpatch\u003e as strings.\n\nDoing so is more efficient than the streaming API for large files because fewer system calls are made and a kernel-space to user-space copy is avoided. 
However, as mentioned above, files that are too large to fit in your virtual address space must be diffed with the streaming API (this will only come up when working with multi-gigabyte files on 32-bit systems).\n\nHere is an example using L\u003cSys::Mmap\u003e (this example doesn't handle resource leaks in the case of exceptions):\n\n    use Sys::Mmap;\n\n    open(my $source_fh, '\u003c', 'source.dat') || die $!;\n    open(my $target_fh, '\u003c', 'target.dat') || die $!;\n    open(my $delta_fh, '\u003e', 'delta.dat') || die $!;\n\n    my ($source_str, $target_str);\n\n    mmap($source_str, 0, PROT_READ, MAP_SHARED, $source_fh) || die $!;\n    mmap($target_str, 0, PROT_READ, MAP_SHARED, $target_fh) || die $!;\n\n    Vcdiff::diff($source_str, $target_str, $delta_fh);\n\n    munmap($source_str);\n    munmap($target_str);\n\nNote that this is essentially what the L\u003cVcdiff::OpenVcdiff\u003e backend does for source file handles.\n\n\n\n\n=head1 TESTING\n\nThe L\u003cVcdiff\u003e distribution includes a test suite that is shared by all the backends. Backends contain stub test files that invoke L\u003cVcdiff::Test\u003e functions.\n\nEach backend also bundles backend-specific tests that relate to exception handling.\n\n=head2 $Vcdiff::Test::testcases\n\nThis is a reference to an array that contains testcases. Each testcase is an array of 3 values. The first is the source, the second the target, and the third a test description.\n\nEvery time a test-case is verified, source will be diffed with target, source will then be patched with the resulting delta, and the output compared with target.\n\nThe tests currently verify a few basic cases up to a megabyte or so in length. I'd like to go through the various backend test-suites and copy any interesting corner cases so they can be re-applied to all other backends.\n\n\n\n=head2 Vcdiff::Test::streaming()\n\nThe C\u003cVcdiff::Test::streaming()\u003e test is somewhat mis-named. 
It loops through all test-cases described above and for each of them it tests every streaming/in-memory API combination. You will see this in the test output like so:\n\n    ok 1 - [SSM]\n    ok 2 - [MSM]\n    ok 3 - [SMM]\n    ok 4 - [MMM]\n    ok 5 - [SSS]\n    ok 6 - [MSS]\n    ok 7 - [SMS]\n    ok 8 - [MMS]\n\nThe S/M indicators show which API combination is being used in the order of source, target/delta, and output arguments. For example, C\u003cSMS\u003e means source is streamed in from a file, the target/delta is in memory, and the output is being streamed to a file.\n\n=head2 extra-tests/cross-compat.t\n\nThe point of this test is to verify that the deltas produced by each backend are compatible with all other backends. For each combination of backends, all the C\u003cstreaming()\u003e tests above are run.\n\nSince the VCDIFF standard defines a data format, even though backends may use very different encoding algorithms their outputs should still be compatible. By default L\u003cVcdiff\u003e tries to create RFC 3284 compatible output so no backend-specific extensions like checksums or interleaving are enabled.\n\nThis test has to be run manually because it needs to have all C\u003c@Vcdiff::known_backends\u003e installed.\n\n\n\n\n=head1 SEE ALSO\n\nL\u003cVcdiff github repo|https://github.com/hoytech/Vcdiff\u003e\n\nL\u003cRFC 3284|http://www.faqs.org/rfcs/rfc3284.html\u003e, \"The VCDIFF Generic Differencing and Compression Data Format\"\n\n\n=head1 AUTHOR\n\nDoug Hoyte, C\u003c\u003c \u003cdoug@hcsw.org\u003e \u003e\u003e\n\n\n=head1 COPYRIGHT \u0026 LICENSE\n\nCopyright 2013 Doug Hoyte.\n\nThis module is licensed under the same terms as perl itself.\n\n\n=cut\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoytech%2Fvcdiff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhoytech%2Fvcdiff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhoytech%2Fvcdiff/lists"}