{"id":34970553,"url":"https://github.com/howerj/lzp","last_synced_at":"2026-05-19T12:39:03.633Z","repository":{"id":245329110,"uuid":"817430788","full_name":"howerj/lzp","owner":"howerj","description":"LZP Data compression CODEC","archived":false,"fork":false,"pushed_at":"2024-06-30T13:36:44.000Z","size":13,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-07-03T01:04:44.383Z","etag":null,"topics":["c","codec","compression","library","lz77","lz78","lzp","lzss","lzw-compression"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/howerj.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-19T17:32:54.000Z","updated_at":"2024-06-30T13:36:47.000Z","dependencies_parsed_at":"2024-06-21T12:47:21.902Z","dependency_job_id":"f5b48938-4129-4003-a969-1add36b29a4d","html_url":"https://github.com/howerj/lzp","commit_stats":null,"previous_names":["howerj/lzp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/howerj/lzp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howerj%2Flzp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howerj%2Flzp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howerj%2Flzp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howerj%2Flzp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/howerj","download_url":"
https://codeload.github.com/howerj/lzp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howerj%2Flzp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33216897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-19T07:54:09.561Z","status":"ssl_error","status_checked_at":"2026-05-19T07:54:08.508Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","codec","compression","library","lz77","lz78","lzp","lzss","lzw-compression"],"created_at":"2025-12-26T23:44:32.722Z","updated_at":"2026-05-19T12:39:03.628Z","avatar_url":"https://github.com/howerj.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LZP Compression Code\n\n* Author: Richard James Howe\n* License: The Unlicense / Public Domain\n* Email: \u003cmailto:howe.r.j.89@gmail.com\u003e\n* Repo: \u003chttps://github.com/howerj/lzp\u003e\n\nThis repo contains an implementation of the LZP lossless compression \nroutine, this routine is incredibly simple. More complex than Run Length\nEncoding but simpler than LZSS (another simple CODEC with a better compression\nratio than LZP). 
The virtues of this CODEC are its simplicity and speed,\ncompression ratio is not one of them.\n\nThe library is presented as a [Header-Only Library](https://en.wikipedia.org/wiki/Header-only).\n\nThe way this CODEC works is that it either outputs a literal byte or it \noutputs a byte from a model based off of previously seen characters in a\ndictionary. \n\nThe format is:\n\n* An 8 bit control character\n* 0-8 literals.\n\nWhich is repeated until the end of input.\n\nIf a bit in the control character is zero it means we need to output an encoded\nliteral, otherwise we will output a byte from the predictor's model. The model\nis incredibly simple: we keep a running hash of the data and use it to index\nthe model. If the model entry selected by that running hash is the same as the\nnext literal we want to output we place a 1 for that byte in the control\ncharacter, otherwise we have to output the literal.\n\nThis scheme limits the expansion of incompressible data to one extra byte for\nevery eight (112.5% of the input size), and limits the best case to one output\nbyte for every eight input bytes (12.5% of the input size).\n\nThe End Of File condition is not contained within the format (the format is not\nself terminating and relies on out of band signalling to indicate the input\nstream is finished).\n\nTo encode:\n\n1. Initialize the model and running hash to zero (once only).\n2. Get 8 bytes from an input source and store them in an input buffer `buf`.\n3. Set the control byte to zero.\n4. For each byte `b` in `buf`, if `b` equals `model[hash]` then bitwise or a\n`1` into the control byte at the bit position that represents that byte in the\ninput buffer. If it does not, leave that bit as `0`, add the byte `b` to an\noutput buffer, and add `b` to the model with `model[hash] = b`. In either case\nupdate the hash with `hash = hash_function(hash, b)`.\n5. Output the control byte and then output all bytes (0-8 bytes) in the\noutput buffer.\n6. If there is more input go to step 2, otherwise terminate.\n\nTo decode:\n\n1. 
Initialize the model and running hash to zero (once only).\n2. Read in a single control byte.\n3. For each bit `bit` in the control byte if the bit is zero read in\nanother byte `b` and set `model[hash] = b`. Output byte `b`. If the\nbit `bit` was one then output the byte `model[hash]`. In either case the hash\nis updated with the new output byte, `c`, as in \n`hash = hash_function(hash, c)`.\n4. If there is more input go to step 2, otherwise terminate.\n\nBoth routines can be described in under thirty lines of C code (at the\ntime of writing both are 27 lines, this may change).\n\nThe hash used is often a weak one and can be experimented with. The hash\n`hash = (hash \u003c\u003c 4) ^ next_byte` is commonly used, and it mixes in new\ndata with the old. 4 bits are discarded, 4 bits are exclusively old, 4 bits\nexclusively new, and 4 bits are a mixture of both old and new bytes.\n\n## API\n\nThe library is structured as a header only library, as mentioned, it should\ncompile cleanly as C++. There are three exported functions and one structure.\n\nThe functions are `lzp_encode`, `lzp_decode` and `lzp_hash`. \n\n\ttypedef struct {\n\t\tunsigned char model[LZP_MODEL_SIZE]; /* predictor model */\n\t\tint (*get)(void *in);           /* like getchar */\n\t\tint (*put)(void *out, int ch);  /* like putchar */\n\t\tunsigned short (*hash)(unsigned short hash, unsigned char b); /* predictor */\n\t\tvoid *in, *out; /* passed to `get` and `put` respectively */\n\t\tunsigned long icnt, ocnt; /* input and output byte count respectively */\n\t} lzp_t;\n\nThe structure requires more explanation than the functions, once the structure\nhas been set up it is trivial to call `lzp_encode` or `lzp_decode`. 
The\nfunctions are:\n\n\tunsigned short lzp_hash(unsigned short h, unsigned char b);\n\tint lzp_encode(lzp_t *l);\n\tint lzp_decode(lzp_t *l);\n\n`lzp_hash` is the default hash function, it can be used to populate the\n`hash` field in `lzp_t`.\n\nThe function pointers in `lzp_t` called `get` and `put` are used to read and\nwrite a single character respectively, `in` and `out` (which will usually be\nFILE pointers) are passed to `get` and `put`. They are analogues of `fgetc` and\n`fputc`. Custom functions can be written to read and write to arbitrary\nlocations including memory.\n\nThere are also some macros, which can be defined by the user (they are\nsurrounded by `#ifndef` clauses).\n\n\t#define LZP_EXTERN extern /* applied to API function prototypes */\n\t#define LZP_API /* applied to all exported API functions */\n\t#define LZP_MODEL (0)\n\t#define LZP_MODEL_BITS (16)\n\nAnd a derived macro which you should not change:\n\n\t#define LZP_MODEL_SIZE (1 \u003c\u003c LZP_MODEL_BITS)\n\nIf you use an N-bit hash (up to 16-bits) then you can and should reduce the \nsize of the model by setting `LZP_MODEL_BITS`. This is done automatically for\nthe built in models, if they are used.\n\nThere are comments about LZP that hint that using a \"better\" hash function \n(one that is better in the sense that it mixes its input better) will produce\nbetter compression, from (minimal) testing this has been shown to not be the\ncase. It is often better to use a weak hash or even an identity function.\n\n## DICTIONARY PRELOAD\n\nIt is possible to preload the dictionary with a model, this may improve\nthe compression ratio, especially if the workload statistics are known in\nadvance. 
This can be done by populating the model table, the same values\nshould be used both for compression and decompression.\n\n## BUGS AND LIMITATIONS\n\nThe input and output byte length counts are `unsigned long` values, which may\nbe 32-bit or 64-bit depending on your platform and compiler, if reading more\nthan 4GiB of data on a platform with 32-bit `long` types then this will\noverflow.\n\n## RETURN VALUE\n\nThe (example) program returns zero on success and non-zero on failure.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowerj%2Flzp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhowerj%2Flzp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowerj%2Flzp/lists"}