# Huff
Huffman compression Maven package for Java.

Huff is designed to provide a simple starting point for applications seeking to use Huffman coding to compress data. Given a byte array, an instance of the `HuffmanCompressor` class will:

+ Calculate byte occurrence frequencies for the data set.
+ Create a Huffman tree using this set of frequencies.
+ Build prefix codes for each byte based on its relative occurrence frequency.
+ Use the [backspin](https://github.com/lambdacasserole/backspin) bit manipulation package to write a compressed representation of the data to a `BitOutputStream`.
+ Return the contents of this `BitOutputStream` as a compressed byte array, along with its length in bits (bit count) and a `PrefixCodeTable` for decompressing the data again.

The `HuffmanCompressor` class is also capable of decompressing the data using a prefix code table, a bit count and a compressed byte array.
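
For orientation, here is a minimal round-trip sketch. The result type and accessor names (`HuffmanCompressionResult`, `getData()`, `getBitCount()`, `getTable()`) and the exact `compress`/`decompress` signatures are assumptions made for illustration; consult the Huff source for the actual API.

```
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Round-trip sketch: compress a byte array, then decompress it again using
// the prefix code table and bit count returned alongside the packed bytes.
// NOTE: the result type and method names below are assumptions, not the
// verified Huff API.
public class HuffExample {
    public static void main(String[] args) {
        byte[] input = "the quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);

        HuffmanCompressor compressor = new HuffmanCompressor();

        // Compress: yields the packed bytes, their length in bits, and the
        // prefix code table needed to reverse the process.
        HuffmanCompressionResult result = compressor.compress(input);
        byte[] packed = result.getData();
        int bitCount = result.getBitCount();
        PrefixCodeTable table = result.getTable();

        // Decompress from the table, bit count, and compressed bytes.
        byte[] restored = compressor.decompress(table, bitCount, packed);

        System.out.println(Arrays.equals(input, restored)); // true
    }
}
```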

## Installation
You can pull this package into your Maven project straight from here using JitPack. Add JitPack as a repository first:

```
<repositories>
	<repository>
		<id>jitpack.io</id>
		<url>https://jitpack.io</url>
	</repository>
</repositories>
```

Then add a dependency on Huff:

```
<dependency>
	<groupId>com.github.lambdacasserole</groupId>
	<artifactId>huff</artifactId>
	<version>v1.0</version>
</dependency>
```

## Limitations
Huff is not a package that will produce archive files out of the box, nor is it an implementation optimised for speed. If you do use it for file compression, serializing the `PrefixCodeTable` and bit count for storage alongside your compressed data is up to you (one possible approach is sketched below).
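
For example, you could write the bit count and table as a header ahead of the compressed bytes. The sketch below uses plain Java serialization and assumes `PrefixCodeTable` implements `Serializable`; if it doesn't, you would need to walk the table and write its entries out manually instead.

```
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// One way to store everything needed for decompression in a single file:
// a small header (bit count, prefix code table, payload length) followed
// by the Huffman-coded bytes. Assumes PrefixCodeTable is Serializable,
// which you should verify before relying on this.
public class HuffFileWriter {
    public static void write(String path, PrefixCodeTable table,
                             int bitCount, byte[] compressed) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeInt(bitCount);         // header: length in bits
            out.writeObject(table);         // header: prefix code table
            out.writeInt(compressed.length);
            out.write(compressed);          // payload: Huffman-coded bytes
        }
    }
}
```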

That said, when you need to transparently Huffman-code a byte array (or even just do a basic frequency analysis on it), Huff is a good choice.

## Benchmarks
Once again, this package isn't designed for straight-up file compression out of the box. If you do use it for that, however, here are some benchmarks run against the [Canterbury Corpus](http://corpus.canterbury.ac.nz/). Ratio is uncompressed size divided by compressed size, and space saving is the corresponding percentage reduction. Note that the compressed sizes do not include any space for the serialized prefix code tables and bit counts needed to decompress the files again. Times are for a 1.58 GHz x64 CPU with 4GB of available RAM.

| File | Uncompressed (Bytes) | Compressed (Bytes) | Ratio | Space Saving | Time (ms) |
|--------------|----------------------|--------------------|-------|--------------|-----------|
| alice29.txt | 152089 | 87688 | 1.73 | 42% | 321.2 |
| asyoulik.txt | 125179 | 75807 | 1.65 | 39% | 128.8 |
| cp.html | 24603 | 16199 | 1.52 | 34% | 108.6 |
| fields.c | 11150 | 7026 | 1.59 | 37% | 45.5 |
| grammar.lsp | 3721 | 2170 | 1.71 | 42% | 43.4 |
| kennedy.xls | 1029744 | 462532 | 2.23 | 55% | 524.2 |
| lcet10.txt | 426754 | 250565 | 1.70 | 41% | 287.5 |
| plrabn12.txt | 481861 | 275585 | 1.75 | 43% | 198.6 |
| ptt5 | 513216 | 106551 | 4.82 | 79% | 142.1 |
| sum | 38240 | 25646 | 1.49 | 33% | 22.5 |
| xargs.1 | 4227 | 2602 | 1.62 | 38% | 13.3 |
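
If you want to reproduce figures like these, a harness along the following lines would do. The `compress` call and result accessors are the same assumptions hedged above, and timings will vary with hardware and JIT warm-up.

```
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Rough benchmark harness: read a corpus file, time compression, and
// report the ratio and space saving as computed in the table above.
// The compress call and result accessors are assumptions, as noted earlier.
public class HuffBenchmark {
    public static void main(String[] args) throws IOException {
        byte[] input = Files.readAllBytes(Paths.get("alice29.txt"));

        HuffmanCompressor compressor = new HuffmanCompressor();
        long start = System.nanoTime();
        HuffmanCompressionResult result = compressor.compress(input);
        double millis = (System.nanoTime() - start) / 1e6;

        int compressed = result.getData().length;
        double ratio = (double) input.length / compressed;         // e.g. 1.73
        double saving = 1.0 - (double) compressed / input.length;  // e.g. 42%
        System.out.printf("%d -> %d bytes, ratio %.2f, saving %.0f%%, %.1f ms%n",
                input.length, compressed, ratio, saving * 100, millis);
    }
}
```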

## Contributing
For most intents and purposes, Huff is considered to fulfil its original use case. Bug fixes and suggestions from any member of the community are welcome, however.