{"id":20201512,"url":"https://github.com/niravcodes/huffman_compression","last_synced_at":"2025-04-10T11:26:56.923Z","repository":{"id":62336282,"uuid":"170150311","full_name":"niravcodes/huffman_compression","owner":"niravcodes","description":"The implementation of the Huffman algorithm as a command line utility.","archived":false,"fork":false,"pushed_at":"2020-09-23T09:11:49.000Z","size":82,"stargazers_count":11,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T10:12:24.061Z","etag":null,"topics":["compression","cpp","huffman","huffman-coding","huffman-compression","information-theory"],"latest_commit_sha":null,"homepage":"https://nirav.com.np","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/niravcodes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-11T15:15:03.000Z","updated_at":"2022-10-31T00:55:36.000Z","dependencies_parsed_at":"2022-10-31T02:02:04.289Z","dependency_job_id":null,"html_url":"https://github.com/niravcodes/huffman_compression","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niravcodes%2Fhuffman_compression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niravcodes%2Fhuffman_compression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niravcodes%2Fhuffman_compression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niravcodes%2Fhuffman_compression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/niravcodes","download_url":"https://codeload.github.com/niravcodes/huffman_compression/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248208666,"owners_count":21065203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","cpp","huffman","huffman-coding","huffman-compression","information-theory"],"created_at":"2024-11-14T04:51:34.813Z","updated_at":"2025-04-10T11:26:56.889Z","avatar_url":"https://github.com/niravcodes.png","language":"C","readme":"# Huffman Compression Program\n\n[![Build Status](https://travis-ci.org/niravcodes/huffman_compression.svg?branch=master)](https://travis-ci.org/niravcodes/huffman_compression)\n\nAn implementation of the Huffman algorithm as a command-line utility. Huffman coding is an entropy coding algorithm that formalizes the basic idea that higher-probability symbols should be encoded with shorter codes. Huffman codes are variable-length prefix codes, i.e. no code is a prefix of another code. {10, 101, 10110} would be an invalid set of codes, while {0, 10, 110, 1110} would be valid.\n\nRead the detailed blog post at [**the corresponding post**](https://nirav.com.np/2019/02/14/writing-huffman-compression-in-cpp.html) on my blog.\n\n## Usage\nUsage: huff [options] input_file  \n\nOptions:  \n\n    --output output_file\n    -o output_file          : Set output file name. 
By default a.huff  \n\n    --decompress\n    -d                      : Decompress an already compressed file  \n\n    --table\n    -t                      : Output a table with details on compression\n\n    --verbose\n    -v                      : Print progress bar and other information\n\n    --code_table\n    -c                      : When encoding, prints the Huffman code table \n                                to stdout instead of into the file\n                            : When decoding, accepts a Huffman code table from \n                                stdin and decodes a headerless binary\n\n## Examples\n\n### Compression:\n\n        ./huff filename.mkv                      (Compressed file stored as a.huff)        \n        ./huff -o compressed.huff filename.pdf   (Compressed file stored as compressed.huff)\n        ./huff -v -o compressed.huff file.pdf    (Compresses with progress report)\n        ./huff -t -o comp.huff a.flv             (Outputs Huffman table and other info after compression)\n        ./huff -c somefile.tar \u003e table.txt       (Compresses but doesn't prepend header to compressed file.\n                                                    Outputs header file separately to table.txt)\n\n### Decompression:\n\n        ./huff -d a.huff                         (Decodes a.huff to a.dhuff)\n        ./huff -d -c a.huff \u003c table.txt          (Decodes tableless a.huff to a.dhuff based on table.txt)\n        ./huff -v -d a.huff                      (Displays progress bar. You might want to turn it on because\n                                                    decompression currently takes very long.)\n\n## Source Directory Structure\nThe main directory contains code for the command-line interface. The important files are:\n\n1. entrance.C       :   The entrance to the program\n2. param_parser.C   :   Code to parse the command-line options. Could have used\n                        a POSIX/GNU compliant parser but didn't.\n3. 
help.C           :   The code for outputting help and usage details. Kept separate\n                        to avoid clutter.\n\nThe libraries are kept in the `corelib` folder. It contains the implementations \nof the data structures and algorithms required for Huffman coding. The libraries are:\n\n    1. huffman*         :   Implementation of Huffman's algorithm.\n    2. priority_queue   :   Naive implementation of a priority queue, and\n                            of a queue as a specialization of the priority queue.\n    3. bitstream        :   A class that accepts bit sequences of arbitrary size and packs them\n                            together.\n    4. tree             :   A very specific implementation of a binary tree.\n\nThe `debughelpers` folder contains code I wrote to help me debug the program. It\ncontains simple functions to present data in memory in a way I can understand.\n\nFinally, the `tests` folder contains some code I wrote to unit test the implementations of the\ndata structures and algorithms.\n\nYou might notice that a lot of the code in `corelib` is already implemented in the STL. \nI rewrote it anyway because I wanted to try my hand at memory management, implementing\nclasses, designing abstractions and so on. \n\n## Building\nDownload the codebase with the git clone command.\n\u003e git clone https://github.com/niravcodes/huffman_compression.git \n\nBuild with the make command.\n\n\u003e cd huffman_compression\n\n\u003e make\n\nThe program \"huff\" should appear in the main directory. Try\n\u003e ./huff \n\nIt should output usage details.\n\nTo compile the test programs for the core libraries, run \n\n    make test\n\nThe test programs compile into the `tests/` directory.\n\nTo clean the source directory, type \n    \n    make clean\n\n## File format \n1. The first 12 bytes of the output file are the file signature/magic number. By default it's the character sequence \"nirav.com.np\" (because, why not? :3).\n2. 
Next, a Huffman table is stored in the format [`alphabet(1 byte)`|`huffman code length(1 byte)`|`huffman code(4 byte)`], sorted in ascending order with respect to `alphabet`. Alphabets with zero frequency are not stored in the table. The table ends with the occurrence of `0x00` in the `alphabet` column.\n3. The `0x00` is followed by 4 bytes giving the decompressed file size in bits. (This should be widened to support more than 512MB of data.)\n4. Finally, everything following the data from entries 1, 2 and 3 is the compressed data. The final byte is padded with zero bits.\n\n## Caveats\n**EXTREMELY SLOW DECOMPRESSION** - I intend to address this issue soon. But for now, decompression is very slow (because a linear search over up to 256 items has to be performed for **every bit in the file**). I call this naive decompression. My next attempt will be to reconstruct the Huffman tree from the table in the file header and traverse the tree for every bit. Then, I might look into other methods. I've skimmed through some papers on Huffman decoding and they are pretty interesting.\n\n**Not optimised for anything** - This implementation is neither fast, nor memory-efficient, nor better in any way than other implementations. It likes to repeat to itself \"if only the best birds sang how silent the woods would be\" when it can't sleep at night.\n\n**Not ready for serious use** - Of course, given that the Huffman compression utility `pack` has been deprecated for years, there is hardly any reason to use bare Huffman coding over other compression methods (which also rely heavily on Huffman coding but process the data in multiple stages with predictive compression techniques).\n\n**The codebase could use some refactoring** - As it currently stands, the codebase has some over-engineered parts and some lousily written portions. 
There are minor issues: code duplication, some redundancies, some architectural issues, some lazily written code and some algorithms that could be replaced with better ones without any loss in readability.\n\n## \u003cs\u003eVague plans\u003c/s\u003e\n1. \u003cs\u003eCompress folders.\u003c/s\u003e (tar before or after compression, no biggie)\n2. Try mixing in some other simple compression algorithms like RLE and compare efficiency.\n3. Try various approaches to decoding Huffman codes.\n\n## Progress\n\u003cs\u003eReleased version 1.0.0 with basic compression and decompression. Plans to optimise, refactor and benchmark.\u003c/s\u003e All plans are cancelled. As it stands, the program will do basic Huffman encoding and decoding for you. But that's about it.\n\nMore on [nirav.com.np](https://nirav.com.np)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniravcodes%2Fhuffman_compression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniravcodes%2Fhuffman_compression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniravcodes%2Fhuffman_compression/lists"}