https://github.com/qfcy/cxx-huffman-simple
A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。
https://github.com/qfcy/cxx-huffman-simple
c-plus-plus compression huffman-coding huffman-compression-algorithm huffman-compressor huffman-decoder huffman-encoder
Last synced: about 1 month ago
JSON representation
A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。
- Host: GitHub
- URL: https://github.com/qfcy/cxx-huffman-simple
- Owner: qfcy
- License: mit
- Created: 2024-12-07T12:11:41.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-29T07:50:35.000Z (4 months ago)
- Last Synced: 2025-02-05T10:12:24.327Z (3 months ago)
- Topics: c-plus-plus, compression, huffman-coding, huffman-compression-algorithm, huffman-compressor, huffman-decoder, huffman-encoder
- Language: C++
- Homepage:
- Size: 70.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Awesome Lists containing this project
README
**The English introduction is placed below the Chinese version.**
一个简单的C++霍夫曼编码实现,将输入二进制文件数据压缩成`.compressed`格式,并支持解压。
#### 命令行
编译:`g++ huffman.cpp -o bin\huffman -O2 -s -Wall`。
压缩:`huffman file.txt`,程序会生成`file.txt.compressed`作为压缩的结果。
解压:`huffman file.txt.compressed`,程序会生成`file.txt`。如果`file.txt`已存在,则会提示替换。#### 文件结构与格式
文件头部分:
- 解压后数据大小`n` (8字节无符号整数)
- 霍夫曼树位图占用的字节数`m` (2字节`ushort`)
- 霍夫曼树位图,用二进制位表示 (`m`字节,0表示内部节点,1表示叶子结点,详见下文)
- 霍夫曼树每个节点代表的字符信息,越前面的字符位于树的越浅层,出现的频率越高 (`m`字节,例如:`<空格>eonsa\nritpd)(;clfug*<>+bz/,_m ...`)文件体部分:
- 霍夫曼编码的原始文件数据 (任意大小,但解压后的数据需要达到n字节,否则被视为文件截断)#### 霍夫曼树位图的存储
这个项目优化了霍夫曼树位图的存储空间,也就是遇到叶子结点之后不再往下存储。
传统的存储(a[2\*n+1]和a[2\*n+2]表示a[n]的子节点):
```
0
10
0011
```
如果霍夫曼树有十几层深,霍夫曼树的大小就会达到KB级别,这是不可接受的。
由于叶子结点1下面不可能存在新的叶子结点,这里的实现省略了多余的底部内部节点:
```
0
10
11
```
第三层的11连接到第二层的0。这里省略了两个00,优化了空间。A lightweight C++ Huffman encoding implementation that compresses input binary file data into the `.compressed` format and supports decompression.
#### Command Line
- **Compiling**: `g++ huffman.cpp -o huffman -O2 -s -Wall`.
- **Compression**: `huffman file.txt`
The program generates `file.txt.compressed` as the compressed result.
- **Decompression**: `huffman file.txt.compressed`
The program generates `file.txt`. If `file.txt` already exists, the program will prompt for replacement.#### File Structure and Format
**File Header**:
- Size of decompressed data `n` (8-byte unsigned long long)
- Size of the Huffman tree bitmap in bytes `m` (2-byte unsigned short)
- Huffman tree bitmap represented in binary bits (`m` bytes, where `0` represents an internal node and `1` represents a leaf node, see details below)
- Character information for each node in the Huffman tree. Characters appearing earlier in the list are closer to the root of the tree and have higher frequencies (`m` bytes, e.g., `eonsa\nritpd)(;clfug*<>+bz/,_m ...`)**File Body**:
- Huffman-encoded original file data (arbitrary size, but the decompressed data must match `n` bytes; otherwise, the file is considered truncated)#### Storage of the Huffman Tree Bitmap
This project optimizes the storage space for the Huffman tree bitmap by omitting unnecessary internal nodes after encountering leaf nodes.
**Traditional Storage** (where `a[2*n+1]` and `a[2*n+2]` represent the child node of `a[n]`):
```
0
10
0011
```
If the Huffman tree has a depth of more than 12 levels, its size could reach kilobytes, which is unacceptable.**Optimized Storage**:
Since no new leaf nodes can exist below a leaf node (`1`), the code omits redundant internal nodes at the bottom:
```
0
10
11
```
In this example, the third level's `11` connects to the second level's `0`. Two `00` nodes are omitted, optimizing space usage.