https://github.com/qfcy/cxx-huffman-simple

A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。
https://github.com/qfcy/cxx-huffman-simple

c-plus-plus compression huffman-coding huffman-compression-algorithm huffman-compressor huffman-decoder huffman-encoder

Last synced: 8 months ago
JSON representation

A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。

Host: GitHub
URL: https://github.com/qfcy/cxx-huffman-simple
Owner: qfcy
License: mit
Created: 2024-12-07T12:11:41.000Z (12 months ago)
Default Branch: main
Last Pushed: 2024-12-29T07:50:35.000Z (11 months ago)
Last Synced: 2025-02-05T10:12:24.327Z (10 months ago)
Topics: c-plus-plus, compression, huffman-coding, huffman-compression-algorithm, huffman-compressor, huffman-decoder, huffman-encoder
Language: C++
Homepage:
Size: 70.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0

Awesome Lists containing this project

README

**The English introduction is placed below the Chinese version.**

一个简单的C++霍夫曼编码实现，将输入二进制文件数据压缩成`.compressed`格式，并支持解压。

#### 命令行
编译：`g++ huffman.cpp -o bin\huffman -O2 -s -Wall`。
压缩：`huffman file.txt`，程序会生成`file.txt.compressed`作为压缩的结果。
解压：`huffman file.txt.compressed`，程序会生成`file.txt`。如果`file.txt`已存在，则会提示替换。

#### 文件结构与格式

文件头部分：
- 解压后数据大小`n` (8字节无符号整数)
- 霍夫曼树位图占用的字节数`m` (2字节`ushort`)
- 霍夫曼树位图，用二进制位表示 (`m`字节，0表示内部节点，1表示叶子结点，详见下文)
- 霍夫曼树每个节点代表的字符信息，越前面的字符位于树的越浅层，出现的频率越高（`m`字节，例如：`<空格>eonsa\nritpd)(;clfug*<>+bz/,_m ...`）

文件体部分：
- 霍夫曼编码的原始文件数据 (任意大小，但解压后的数据需要达到n字节，否则被视为文件截断)

#### 霍夫曼树位图的存储

这个项目优化了霍夫曼树位图的存储空间，也就是遇到叶子结点之后不再往下存储。
传统的存储（a[2\*n+1]和a[2\*n+2]表示a[n]的子节点）：
```
0
10
0011
```
如果霍夫曼树有十几层深，霍夫曼树的大小就会达到KB级别，这是不可接受的。
由于叶子结点1下面不可能存在新的叶子结点，这里的实现省略了多余的底部内部节点：
```
0
10
11
```
第三层的11连接到第二层的0。这里省略了两个00，优化了空间。

A lightweight C++ Huffman encoding implementation that compresses input binary file data into the `.compressed` format and supports decompression.

#### Command Line
- **Compiling**: `g++ huffman.cpp -o huffman -O2 -s -Wall`.
- **Compression**: `huffman file.txt`
The program generates `file.txt.compressed` as the compressed result.
- **Decompression**: `huffman file.txt.compressed`
The program generates `file.txt`. If `file.txt` already exists, the program will prompt for replacement.

#### File Structure and Format

**File Header**:
- Size of decompressed data `n` (8-byte unsigned long long)
- Size of the Huffman tree bitmap in bytes `m` (2-byte unsigned short)
- Huffman tree bitmap represented in binary bits (`m` bytes, where `0` represents an internal node and `1` represents a leaf node, see details below)
- Character information for each node in the Huffman tree. Characters appearing earlier in the list are closer to the root of the tree and have higher frequencies (`m` bytes, e.g., `eonsa\nritpd)(;clfug*<>+bz/,_m ...`)

**File Body**:
- Huffman-encoded original file data (arbitrary size, but the decompressed data must match `n` bytes; otherwise, the file is considered truncated)

#### Storage of the Huffman Tree Bitmap

This project optimizes the storage space for the Huffman tree bitmap by omitting unnecessary internal nodes after encountering leaf nodes.

**Traditional Storage** (where `a[2*n+1]` and `a[2*n+2]` represent the child node of `a[n]`):
```
0
10
0011
```
If the Huffman tree has a depth of more than 12 levels, its size could reach kilobytes, which is unacceptable.

**Optimized Storage**:
Since no new leaf nodes can exist below a leaf node (`1`), the code omits redundant internal nodes at the bottom:
```
0
10
11
```
In this example, the third level's `11` connects to the second level's `0`. Two `00` nodes are omitted, optimizing space usage.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/qfcy/cxx-huffman-simple

Awesome Lists containing this project

README