{"id":33264866,"url":"https://github.com/WojciechMigda/zfex","last_synced_at":"2025-11-21T23:02:36.158Z","repository":{"id":48112610,"uuid":"516510158","full_name":"WojciechMigda/zfex","owner":"WojciechMigda","description":"zfex — an efficient, portable erasure coding tool","archived":false,"fork":true,"pushed_at":"2023-10-02T12:55:02.000Z","size":2944,"stargazers_count":7,"open_issues_count":6,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-06T10:32:56.707Z","etag":null,"topics":["fec","forward-error-correction","reed-solomon-codes"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"tahoe-lafs/zfec","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WojciechMigda.png","metadata":{"files":{"readme":"README.rst","changelog":"changelog","contributing":null,"funding":null,"license":"COPYING.GPL","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-21T20:17:01.000Z","updated_at":"2025-08-25T15:19:33.000Z","dependencies_parsed_at":"2023-02-10T22:16:06.647Z","dependency_job_id":null,"html_url":"https://github.com/WojciechMigda/zfex","commit_stats":null,"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"purl":"pkg:github/WojciechMigda/zfex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WojciechMigda%2Fzfex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WojciechMigda%2Fzfex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WojciechMigda%2Fzfex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WojciechMigda%2Fzfex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WojciechMigda","download_url":"https://codeload.github.com/Wojci
echMigda/zfex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WojciechMigda%2Fzfex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285704833,"owners_count":27217837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-21T02:00:06.175Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fec","forward-error-correction","reed-solomon-codes"],"created_at":"2025-11-17T06:00:33.002Z","updated_at":"2025-11-21T23:02:36.145Z","avatar_url":"https://github.com/WojciechMigda.png","language":"Python","funding_links":[],"categories":["Forward Error Correction (FEC) / Erasure Coding"],"sub_categories":["Implementations"],"readme":"\nzfex — efficient, portable erasure coding tool\n================================================\n\nGenerate redundant blocks of information such that if some of the blocks are\nlost then the original data can be recovered from the remaining blocks. This\npackage includes command-line tools, C API, Python API, and Haskell API.\n\n|build| |test-intel| |test-arm| |haskell-api| |unit-tests| |tools| |pypi|\n\n|intel-benchmark|\n\nIntro and Licence\n-----------------\n\nThis package implements an \"erasure code\", or \"forward error correction\ncode\".\n\nYou may use this package under the GNU General Public License, version 2 or,\nat your option, any later version.  
You may use this package under the\nTransitive Grace Period Public Licence, version 1.0 or, at your option, any\nlater version.  (You may choose to use this package under the terms of either\nlicence, at your option.)  See the file COPYING.GPL for the terms of the GNU\nGeneral Public License, version 2.  See the file COPYING.TGPPL.rst for the\nterms of the Transitive Grace Period Public Licence, version 1.0.\n\nThe most widely known example of an erasure code is the RAID-5 algorithm,\nwhich ensures that in the event of the loss of any one hard drive, the\nstored data can be completely recovered.  The algorithm in the zfec package\nhas a similar effect, but instead of recovering from the loss of only a\nsingle element, it can be parameterized to choose in advance the number of\nelements whose loss it can tolerate.\n\nThis package is a fork of the ``zfec`` library, which is largely based on\nthe old \"fec\" library by Luigi Rizzo et al.,\nwhich is a mature and optimized implementation of erasure coding.  The ``zfex``\npackage makes several changes from the original ``zfec`` package, including\na new C-based benchmark tool and a new SIMD-friendly API.\n\n\nInstallation\n------------\n\nPython\n......\n\n``pip install zfex``\n\nTo run the self-tests, execute ``tox`` from an unpacked source tree or git checkout.\n\nTo install ``zfex`` built with custom compilation flags, execute:\n\n``CFLAGS=\"-O3\" pip install git+https://github.com/WojciechMigda/zfex.git``\n\nIf ``zfex`` is already cloned locally, then custom compiler flags can be passed to ``setup.py`` to install ``zfex`` as follows:\n\n``CFLAGS=\"-O3\" python setup.py install``\n\nIn a similar manner, one can override the compiler being used. Simply issue:\n\n``CC=arm-linux-gnueabihf-gcc-7 pip install git+https://github.com/WojciechMigda/zfex.git``\n\nHaskell\n.......\n\nBuilding the Haskell wrapper relies on ``cabal``. 
The most basic build command is as follows:\n\n``cabal new-build all``\n\nand it will use the default C compiler settings. A few flags are available to control the build process:\n\n* ``speed`` will pass the highest-level optimization flag to the compiler,\n* ``ssse3`` will enable SSSE3 optimizations on Intel platforms,\n* ``neon`` will enable NEON optimizations on Arm platforms.\n\nAn example build command that uses these flags:\n\n``cabal new-build all --flags \"speed ssse3\"``\n\nFor more details, including installing dependencies and running tests, please inspect the Haskell GitHub Actions workflow file.\n\nCommunity\n---------\n\nThe source is currently available via git on the web with the command:\n\n``git clone https://github.com/WojciechMigda/zfex``\n\nIf you find a bug in ``zfex``, please open an issue on GitHub:\n\n\u003chttps://github.com/WojciechMigda/zfex/issues\u003e\n\nOverview\n--------\n\nThis package performs two operations, encoding and decoding.  Encoding takes\nsome input data and expands its size by producing extra \"check blocks\", also\ncalled \"secondary blocks\".  Decoding takes some data -- any combination of\nblocks of the original data (called \"primary blocks\") and \"secondary blocks\" --\nand produces the original data.\n\nThe encoding is parameterized by two integers, *k* and *m*.  *m* is the total\nnumber of blocks produced, and *k* is how many of those blocks are necessary to\nreconstruct the original data.  *m* is required to be at least 1 and at most\n256, and *k* is required to be at least 1 and at most *m*.\n\n(Note that when *k* == *m* then there is no point in doing erasure coding -- it\ndegenerates to the equivalent of the Unix \"split\" utility which simply splits\nthe input into successive segments.  
Similarly, when *k* == 1 it degenerates to\nthe equivalent of the Unix \"cp\" utility -- each block is a complete copy of\nthe input data.)\n\nNote that each \"primary block\" is a segment of the original data, so its size\nis 1/*k*'th of the size of the original data, and each \"secondary block\" is of the\nsame size, so the total space used by all the blocks is *m*/*k* times the size of\nthe original data (plus some padding to fill out the last primary block to be\nthe same size as all the others).  In addition to the data contained in the\nblocks themselves there are also a few pieces of metadata which are necessary\nfor later reconstruction.  Those pieces are: 1.  the value of *k*, 2.  the\nvalue of *m*, 3.  the sharenum of each block, 4.  the number of bytes of\npadding that were used.  The \"zfex\" command-line tool compresses these pieces\nof data and prepends them to the beginning of each share, so each\nsharefile produced by the \"zfex\" command-line tool is between one and four\nbytes larger than the share data alone.\n\nThe decoding step requires as input *k* of the blocks which were produced by\nthe encoding step.  The decoding step produces as output the data that was\nearlier input to the encoding step.\n\n\nCommand-Line Tool\n-----------------\n\nThe bin/ directory contains two Unix-style command-line tools, ``zfex`` and\n``zunfex``.  Execute ``zfex --help`` or ``zunfex --help`` for usage\ninstructions.\n\n\nPerformance\n-----------\n\n**TODO: update with new results**\n\nTo run the benchmarks, execute the included bench/bench_zfec.py script with\noptional --k= and --m= arguments.\n\nOn my Athlon 64 2.4 GHz workstation (running Linux), the \"zfec\" command-line\ntool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9\nseconds, where the \"par2\" tool encoded the file with about 6% redundancy in\n27 seconds.  
zfec encoded the same file with m=12, k=6 (100% redundancy) in\n4.1 seconds, where par2 encoded it with about 100% redundancy in 7 minutes\nand 56 seconds.\n\nThe underlying C library in benchmark mode encoded from a file at about 4.9\nmillion bytes per second and decoded at about 5.8 million bytes per second.\n\nOn Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file\nat about 6.2 million bytes per second.\n\nOn my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a\nfile at about 6.8 million bytes per second.\n\nOn my old PowerPC G4 867 MHz Mac laptop, it encoded from a file at about 1.3\nmillion bytes per second.\n\nHere is a paper analyzing the performance of various erasure codes and their\nimplementations, including zfec:\n\nhttp://www.usenix.org/events/fast09/tech/full_papers/plank/plank.pdf\n\nZfec shows good performance on different machines and with different values\nof *k* and *m*. It also has a nice small memory footprint.\n\n\nAPI\n---\n\nEach block is associated with a \"blocknum\".  The blocknum of each primary block\nis its index (starting from zero), so the 0th block is the first primary\nblock, which is the first few bytes of the file, the 1st block is the next\nprimary block, which is the next few bytes of the file, and so on.  The last\nprimary block has blocknum *k*-1.  The blocknum of each secondary block is an\narbitrary integer between *k* and 255 inclusive.  (When using the Python API,\nif you don't specify which secondary blocks you want when invoking encode(),\nthen it will by default provide the blocks with ids from *k* to *m*-1 inclusive.)\n\n- C API\n\n  ``fec_encode()`` takes as input an array of *k* pointers, where each pointer\n  points to a memory buffer containing the input data (i.e., the *i*'th buffer\n  contains the *i*'th primary block).  There is also a second parameter which\n  is an array of the blocknums of the secondary blocks which are to be\n  produced.  
(Each element in that array is required to be the blocknum of a\n  secondary block, i.e. it is required to be \u003e= *k* and \u003c *m*.)\n\n  The output from ``fec_encode()`` is the requested set of secondary blocks, which\n  are written into output buffers provided by the caller.\n\n  There is another encoding API provided, ``fec_encode_simd()``, which imposes\n  additional requirements on the memory blocks passed to it, both those that contain input blocks\n  of data and those where output blocks will be written. These blocks are expected\n  to be aligned to ``ZFEX_SIMD_ALIGNMENT``. ``fec_encode_simd()`` checks the pointers\n  to these blocks and returns a status code, which equals ``EXIT_SUCCESS`` when\n  validation passed and encoding completed, or ``EXIT_FAILURE`` when the input\n  and output requirements were not met.\n\n  Note that ``fec_encode()`` and ``fec_encode_simd()`` are \"low-level\" APIs\n  in that they require the\n  input data to be provided in a set of memory buffers of exactly the right\n  sizes.  If you are starting instead with a single buffer containing all of\n  the data then please see easyfec.py's \"class Encoder\" as an example of how\n  to split a single large buffer into the appropriate set of input buffers\n  for ``fec_encode()``.  If you are starting with a file on disk, then please see\n  filefec.py's encode_file_stringy_easyfec() for an example of how to read\n  the data from a file and pass it to \"class Encoder\".  The Python interface\n  provides these higher-level operations, as does the Haskell interface.  If\n  you implement functions to do these higher-level tasks in other languages,\n  please send a patch so that your API can be included in future releases of zfex.\n\n  ``fec_decode()`` takes as input an array of *k* pointers, where each pointer\n  points to a buffer containing a block.  
There is also a separate input\n  parameter which is an array of blocknums, indicating the blocknum of each\n  of the blocks being passed in.\n\n  The output from ``fec_decode()`` is the set of primary blocks which were\n  missing from the input and had to be reconstructed.  These reconstructed\n  blocks are written into output buffers provided by the caller.\n\n\n- Python API\n\n  ``encode()`` and ``decode()`` take as input a sequence of *k* buffers, where a\n  \"sequence\" is any object that implements the Python sequence protocol (such\n  as a list or tuple) and a \"buffer\" is any object that implements the Python\n  buffer protocol (such as a string or array).  The contents that are\n  required to be present in these buffers are the same as for the C API.\n\n  ``encode()`` also takes a list of desired blocknums.  Unlike the C API, the\n  Python API accepts blocknums of primary blocks as well as secondary blocks\n  in its list of desired blocknums.  ``encode()`` returns a list of buffer\n  objects which contain the blocks requested.  For each requested block which\n  is a primary block, the resulting list contains a reference to the\n  appropriate primary block from the input list.  For each requested block\n  which is a secondary block, the list contains a newly created string object\n  containing that block.\n\n  ``decode()`` also takes a list of integers indicating the blocknums of the\n  blocks being passed in.  ``decode()`` returns a list of buffer objects which\n  contain all of the primary blocks of the original data (in order).  For\n  each primary block which was present in the input list, the result\n  list simply contains a reference to the object that was passed in the input\n  list.  
For each primary block which was not present in the input, the\n  result list contains a newly created string object containing that primary\n  block.\n\n  Beware of a \"gotcha\" that can result from the combination of mutable data\n  and the fact that the Python API returns references to inputs when\n  possible.\n\n  Returning references to its inputs is efficient since it avoids making an\n  unnecessary copy of the data, but if the object which was passed as input\n  is mutable and if that object is mutated after the call to zfex returns,\n  then the result from zfex -- which is just a reference to that same object\n  -- will also be mutated.  This subtlety is the price you pay for avoiding\n  data copying.  If you don't want to have to worry about this then you can\n  simply use immutable objects (e.g. Python strings) to hold the data that\n  you pass to ``zfex``.\n\n  Currently, the ``fec_encode_simd()`` C API does not have a Python wrapper.\n\n- Haskell API\n\n  The Haskell code is fully Haddocked; to generate the documentation, run\n  ``runhaskell Setup.lhs haddock``.\n\n\nUtilities\n---------\n\nThe ``filefec.py`` module has a utility function for efficiently reading a file\nand encoding it piece by piece.  This module is used by the \"zfex\" and\n\"zunfex\" command-line tools from the bin/ directory.\n\n\nDependencies\n------------\n\nA C compiler is required.  To use the Python API or the command-line tools a\nPython interpreter is also required.  We have tested it with Python v2.7,\nv3.5 and v3.6.  For the Haskell interface, GHC \u003e= 6.8.1 is required.\n\n\nAcknowledgements\n----------------\n\nThanks to the author of the original fec lib, Luigi Rizzo, and the folks that\ncontributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and\nDan Rubenstein.  Thanks to the Mnet hackers who wrote an earlier Python\nwrapper, especially Myers Carpenter and Hauke Johannknecht.  
Thanks to Brian\nWarner and Amber O'Whielacronx for help with the API, documentation,\ndebugging, compression, and unit tests.  Thanks to Adam Langley for improving\nthe C API and contributing the Haskell API.  Thanks to the creators of GCC\n(starting with Richard M. Stallman) and Valgrind (starting with Julian\nSeward) for a pair of excellent tools.  Thanks to employees at Allmydata\n-- http://allmydata.com -- Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian\nWarner, Zandr Milewski, Justin Boreta, Mark Meras for sponsoring part of this work (original ``zfec``)\nand releasing it under a Free Software licence. Thanks to Jack Lloyd, Samuel\nNeves, and David-Sarah Hopwood.\nLast, but not least, thanks to the authors of the original ``zfec`` library, from which\nthis one was forked.\nThanks to Gabs Ricalde for contributing ARM SIMD-optimized code to ``zfec``, which then\ninspired the Intel SIMD optimizations introduced here.\n\n\nRelated Works\n-------------\n\nNote: a Unix-style tool like \"zfex\" does only one thing -- in this case\nerasure coding -- and leaves other tasks to other tools.  Other Unix-style\ntools that go well with zfex include `GNU tar`_ for archiving multiple files\nand directories into one file, `lzip`_ for compression, and `GNU Privacy\nGuard`_ for encryption or `b2sum`_ for integrity.  It is important to do\nthings in order: first archive, then compress, then either encrypt or\nintegrity-check, then erasure code.  Note that if GNU Privacy Guard is used\nfor privacy, then it will also ensure integrity, so the use of b2sum is\nunnecessary in that case. Note also that you need to do integrity\nchecking (such as with b2sum) on the blocks that result from the erasure\ncoding in *addition* to doing it on the file contents! (There are two\ndifferent subtle failure modes -- see \"more than one file can match an\nimmutable file cap\" on the `Hack Tahoe-LAFS!`_ Hall of Fame.)\n\n`fecpp`_ is an alternative to zfex. 
It implements a bitwise-compatible\nalgorithm to zfex and is BSD-licensed.\n\n.. _GNU tar: http://directory.fsf.org/project/tar/\n.. _lzip: http://www.nongnu.org/lzip/lzip.html\n.. _GNU Privacy Guard: http://gnupg.org/\n.. _b2sum: https://blake2.net/\n.. _Hack Tahoe-LAFS!: https://tahoe-lafs.org/hacktahoelafs/\n.. _fecpp: http://www.randombit.net/code/fecpp/\n\n\nEnjoy!\n\n\n----\n\n.. |pypi| image:: http://img.shields.io/pypi/v/zfex.svg\n   :alt: PyPI release status\n   :target: https://pypi.python.org/pypi/zfex\n\n.. |build| image:: https://github.com/WojciechMigda/zfex/actions/workflows/build.yml/badge.svg\n   :alt: Package Build\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/build.yml\n\n.. |test-intel| image:: https://github.com/WojciechMigda/zfex/actions/workflows/test.yml/badge.svg\n   :alt: Tests on Intel hardware\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/test.yml\n\n.. |test-arm| image:: https://github.com/WojciechMigda/zfex/actions/workflows/test-qemu.yml/badge.svg\n   :alt: Tests on ARM qemu-emulated environment\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/test-qemu.yml\n\n.. |haskell-api| image:: https://github.com/WojciechMigda/zfex/actions/workflows/haskell-api.yml/badge.svg\n   :alt: Haskell API\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/haskell-api.yml\n\n.. |tools| image:: https://github.com/WojciechMigda/zfex/actions/workflows/tools.yml/badge.svg\n   :alt: Tools\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/tools.yml\n\n.. |intel-benchmark| image:: bench/images/bench_intel_k7_m10_1M.png\n   :alt: Intel benchmark chart\n   :target: bench/Results.rst\n\n.. 
|unit-tests| image:: https://github.com/WojciechMigda/zfex/actions/workflows/utests.yml/badge.svg\n   :alt: Unit tests\n   :target: https://github.com/WojciechMigda/zfex/actions/workflows/utests.yml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWojciechMigda%2Fzfex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FWojciechMigda%2Fzfex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWojciechMigda%2Fzfex/lists"}