https://github.com/owez/rle
A simple run-length-encoding implementation for light compression
compression compression-algorithm rle run-length-encoding rust
- Host: GitHub
- URL: https://github.com/owez/rle
- Owner: Owez
- License: mit
- Created: 2020-11-30T19:54:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-11-30T20:16:47.000Z (over 5 years ago)
- Last Synced: 2025-01-15T06:13:01.352Z (over 1 year ago)
- Topics: compression, compression-algorithm, rle, run-length-encoding, rust
- Language: Rust
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Run-length-encoding: `rle`
A simple [run-length-encoding](https://en.wikipedia.org/wiki/Run-length_encoding) implementation for light compression
## Examples
Encoding:
```rust
use rle;

fn main() {
    let data = &[44, 43, 6, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 3];
    let compressed = rle::compress(data);
    println!("normal: {}, compressed: {}", data.len(), compressed.len());
    // will show "normal: 16, compressed: 10"
}
```
## Under the hood
This `rle` library uses a custom [run-length-encoding](https://en.wikipedia.org/wiki/Run-length_encoding) technique in order to get the most efficient compression results. It achieves this by passing any character repeated fewer than 6 times through unencoded, e.g.:
```none
helllllo!
```
But it encodes any run of 6 or more repeats, e.g.:
```none
hellllllllllllllllllllllllllllllllllllllllllllo!
```
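The threshold rule can be sketched roughly as follows. This is only an illustration and not the crate's actual code: the `Chunk` enum, the `split_runs` function, and the `MIN_RUN` constant are hypothetical names, and the value 6 is taken from the description above.

```rust
/// Hypothetical threshold: runs of 6 or more repeats get encoded.
const MIN_RUN: usize = 6;

/// Hypothetical classification of a run of identical bytes.
#[derive(Debug, PartialEq)]
enum Chunk {
    /// Run shorter than `MIN_RUN`: passed through unchanged.
    Literal(u8, usize),
    /// Run of `MIN_RUN` or more: replaced by an encoded block.
    Encoded(u8, usize),
}

/// Splits `data` into runs of identical bytes and classifies each one.
fn split_runs(data: &[u8]) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        // Count how many times `byte` repeats from position `i`.
        let mut len = 1;
        while i + len < data.len() && data[i + len] == byte {
            len += 1;
        }
        chunks.push(if len >= MIN_RUN {
            Chunk::Encoded(byte, len)
        } else {
            Chunk::Literal(byte, len)
        });
        i += len;
    }
    chunks
}

fn main() {
    // Five repeats stay below the threshold, so nothing is encoded.
    let short = format!("he{}o!", "l".repeat(5));
    let chunks = split_runs(short.as_bytes());
    assert!(chunks.iter().all(|c| matches!(c, Chunk::Literal(..))));

    // Six repeats reach the threshold.
    let long = format!("he{}o!", "l".repeat(6));
    assert!(split_runs(long.as_bytes()).contains(&Chunk::Encoded(b'l', 6)));
}
```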
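The threshold of 6 comes from the size of an encoded block: a marker byte, a `u32` length, and the repeated byte itself take 6 bytes in total, so encoding a shorter run would not save any space. Storing the length as a `u32` means a single block can represent up to ~4 billion repeating characters before overflowing. A run-length-encoded block would look like the following for the previous example: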
```none
[h, e, 4, 0, 0, 0, 44, l, o, !]
```
You may assume whatever binary encoding you'd like for these letters to properly expand this block, but in essence it uses an [End-of-Transmission character](https://en.wikipedia.org/wiki/End-of-Transmission_character) (byte value `4`) to mark the start of a run-length-encoded block, followed by a `[u8; 4]` that holds the previously mentioned `u32` in big-endian form.
After that comes a single `u8` for the byte being repeated. The rest of the input is handled the same way, with compression and decompression looping until the end of the input bytes.
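The block layout described above can be sketched as a small encoder and decoder. This is a minimal illustration under stated assumptions, not the crate's actual implementation: the `compress` and `decompress` functions below are written from the description only, `MARKER` and `MIN_RUN` are hypothetical names, and the handling of input bytes that happen to equal the marker is an assumption, since the text does not say how the library treats them.

```rust
/// End-of-Transmission (ASCII 4), marking the start of an encoded block.
const MARKER: u8 = 0x04;
/// An encoded block is 6 bytes, so shorter runs are left as they are.
const MIN_RUN: usize = 6;

/// Hypothetical encoder following the block layout described above.
fn compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        // Count the run, capped so the length always fits in a `u32`.
        let mut len = 1;
        while i + len < data.len() && data[i + len] == byte && len < u32::MAX as usize {
            len += 1;
        }
        // Assumption: literal marker bytes are always wrapped in a block,
        // otherwise the decoder could not tell them from a real marker.
        if len >= MIN_RUN || byte == MARKER {
            out.push(MARKER);
            out.extend_from_slice(&(len as u32).to_be_bytes());
            out.push(byte);
        } else {
            out.extend(std::iter::repeat(byte).take(len));
        }
        i += len;
    }
    out
}

/// Hypothetical decoder; returns `None` if a block is cut short.
fn decompress(data: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == MARKER {
            // Four big-endian length bytes followed by the repeated byte.
            let block = data.get(i + 1..i + 6)?;
            let len = u32::from_be_bytes([block[0], block[1], block[2], block[3]]);
            out.extend(std::iter::repeat(block[4]).take(len as usize));
            i += 6;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    Some(out)
}

fn main() {
    // Same shape as the README example: three bytes, twelve 10s, one byte.
    let mut data = vec![44u8, 43, 6];
    data.extend_from_slice(&[10; 12]);
    data.push(3);

    let packed = compress(&data);
    assert_eq!(packed, [44, 43, 6, MARKER, 0, 0, 0, 12, 10, 3]);
    assert_eq!(decompress(&packed), Some(data));
}
```

With this layout the 16-byte input from the Examples section shrinks to 10 bytes, which matches the sizes quoted there.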