https://github.com/leonardpepa/huffman-compression
Huffman Compression written in go
https://github.com/leonardpepa/huffman-compression
coding-challenges compression go golang huffman huffman-compression-algorithm
Last synced: about 2 months ago
JSON representation
Huffman Compression written in go
- Host: GitHub
- URL: https://github.com/leonardpepa/huffman-compression
- Owner: Leonardpepa
- Created: 2023-12-11T02:52:30.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-01-10T01:32:42.000Z (over 1 year ago)
- Last Synced: 2025-01-17T08:45:45.221Z (4 months ago)
- Topics: coding-challenges, compression, go, golang, huffman, huffman-compression-algorithm
- Language: Go
- Homepage: https://github.com/Leonardpepa/Huffman-Compression
- Size: 22.4 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Huffman Compression written in go
# Purpose
This project is a solution for [Write Your Own Compression Tool](https://codingchallenges.fyi/challenges/challenge-huffman)
build for my personal educational purposes# Description
This is an implementation of the huffman compression algorithm. Read how it works here [Huffman Coding Trees](https://opendsa-server.cs.vt.edu/ODSA/Books/CS3/html/Huffman.html)# Features
* Compress file
* Decompress fie# Implementation Details
## header - Huffman Topology
In order to decode the file we need the original tree that was used to encode it. This implementation
writes the tree structure in the file header using preorder traversal, writing 0 for internal nodes
and 1 followed by the character for each leaf node. The size of the header is stored first followed by the actual information needed to create the tree## Pseudo EOF
Because files operate in byte chunks and this algorithm writes the encoded content bit by bit
we need a way to know when the file contents end. To avoid reading any junk bits added to fill the last byte this implementation uses the technique of the Pseudo EOF. The fixed-length code for the Pseudo EOF character is stored at the end of the file.
The program stops reading when it encounters the Pseudo EOF. The Pseudo EOF is a character that doesn't occur in the original file.## Bit Stream
To achieve compression the bit strings of the fixed-length codes for each character need to be stored bit by bit in the file.
Because go doesn't provide such functionality, bitstream package was implemented from scratch to be able to read and write bits, bytes and runes bit by bit.# Usage
```terminal
Usage: ./huffmanCompression.exe [OPTION] [FILE]
[OPTIONS]:
-c FILE ~file to compress
-d FILE ~file to decompress
[FILE]:
the file specified is the file to compress or decompress,
when compressed the file will be saved with a .hf extension,
when decompressed the file will be saved as a copy of the original file
```# Example
## Compress
```terminal
.\huffmanCompression.exe -c .\tests\gutenberg.txt
2023/12/25 02:52:09 Opening file... .\tests\gutenberg.txt
2023/12/25 02:52:09 Creating huffman tree...
2023/12/25 02:52:09 Calculating frequencies...
2023/12/25 02:52:09 Create priority queue...
2023/12/25 02:52:09 Build the tree...
2023/12/25 02:52:09 Calculating variable length codes...
2023/12/25 02:52:09 Encoding the tree in the header...
2023/12/25 02:52:09 Encoding the data bit by bit...
2023/12/25 02:52:09 Encoding finished. File: gutenberg.txt.hf,
Original: (3369045 bytes) compressed: (1919959 bytes)
```
## Decompress
```terminal
.\huffmanCompression.exe -d .\gutenberg.txt.hf
2023/12/25 02:53:52 Opening file... .\gutenberg.txt.hf
2023/12/25 02:53:52 Reading encoded data...
2023/12/25 02:53:52 Constructing the huffman tree from the header...
2023/12/25 02:53:52 Decoding the data...
2023/12/25 02:53:52 Decoding finished. File: gutenberg.txt, Bytes: 3369045
```# How to run
1. Clone the repo ```git clone https://github.com/Leonardpepa/Huffman-Compression```
2. Build ```go build```
3. run on windows```huffmanCompression.exe [OPTION] [FILE]```
4. run on linux ```.\huffmanCompression [OPTION] [FILE]```
5. run all tests ```go test ./...```