https://github.com/ucyo/rscompress
A compression library in Rust with focus on scientific data.
approximation checksums coding compression decorrelation encoding floating-point transformation
- Host: GitHub
- URL: https://github.com/ucyo/rscompress
- Owner: ucyo
- License: mit
- Created: 2021-01-30T18:25:49.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-05-15T11:17:32.000Z (over 4 years ago)
- Last Synced: 2025-03-28T02:48:14.267Z (6 months ago)
- Topics: approximation, checksums, coding, compression, decorrelation, encoding, floating-point, transformation
- Language: Rust
- Homepage:
- Size: 20.8 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 17
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# rscompress
A compression library in Rust with focus on scientific data.
## Disclaimer
This is a rewrite and merge of several compression algorithms developed during my time as a PhD student:

- https://github.com/ucyo/huffman
- https://github.com/ucyo/pzip-cli
- https://github.com/ucyo/pzip-bwt
- https://github.com/ucyo/pzip-redux
- https://github.com/ucyo/pzip-huffman
- https://github.com/ucyo/pzip
- https://github.com/ucyo/rust-compress
- https://github.com/ucyo/adaptive-lossy-compression
- https://github.com/ucyo/information-spaces
- https://github.com/ucyo/cframework
- https://github.com/ucyo/xor-and-residual-calculation
- https://github.com/ucyo/climate-data-analysis

The dissertation can be downloaded from https://doi.org/10.5445/IR/1000105055
## Architecture
The library is split into one base and four supporting libraries.
The base library orchestrates the supporting libraries.
All compression algorithms follow the same basic structure:

1. Decorrelate data using transformations
2. Approximate the data, if lossy compression is needed
3. Code the data

Additionally, check if each step executed as expected.
```
                +----------------+    lossless     +----------+
                |                |                 |          |
Start +------>  | Transformation | +------------>  |  Coding  | +------> End
                |                |                 |          |
                +----------------+                 +----------+
                        +                               ^
                        |                               |
                        |  lossy                        |
                        |                               |
                        v                               |
                                                        |
                +---------------+                       |
                |               |                       |
                | Approximation | +---------------------+
                |               |
                +---------------+
```
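The structure above can be sketched as a small set of traits with an optional lossy step. This is only an illustration under assumptions: the names `Transformation`, `Approximation`, `Coding`, `Identity`, `RoundToStep`, `RawBytes` and `compress` are hypothetical and are not taken from the rscompress API, and the checksum step is left out for brevity.

```rust
/// Hypothetical stage traits (illustrative names, not the rscompress API).
trait Transformation {
    fn forward(&self, data: &[f64]) -> Vec<f64>;
}

trait Approximation {
    fn approximate(&self, data: &[f64]) -> Vec<f64>;
}

trait Coding {
    fn encode(&self, data: &[f64]) -> Vec<u8>;
}

/// Placeholder transformation that leaves the data unchanged.
struct Identity;

impl Transformation for Identity {
    fn forward(&self, data: &[f64]) -> Vec<f64> {
        data.to_vec()
    }
}

/// Placeholder approximation: round every value to a multiple of the step.
struct RoundToStep(f64);

impl Approximation for RoundToStep {
    fn approximate(&self, data: &[f64]) -> Vec<f64> {
        data.iter().map(|x| (x / self.0).round() * self.0).collect()
    }
}

/// Placeholder coder: writes the raw little-endian bytes (no real compression).
struct RawBytes;

impl Coding for RawBytes {
    fn encode(&self, data: &[f64]) -> Vec<u8> {
        let mut out = Vec::with_capacity(data.len() * 8);
        for x in data {
            out.extend_from_slice(&x.to_le_bytes());
        }
        out
    }
}

/// Transformation -> (optional) Approximation -> Coding.
/// Passing `None` for the approximation takes the lossless path.
fn compress(
    data: &[f64],
    transformation: &dyn Transformation,
    approximation: Option<&dyn Approximation>,
    coding: &dyn Coding,
) -> Vec<u8> {
    let transformed = transformation.forward(data);
    let prepared = match approximation {
        Some(approx) => approx.approximate(&transformed),
        None => transformed,
    };
    coding.encode(&prepared)
}
```

Calling `compress(&data, &Identity, None, &RawBytes)` follows the lossless arrow in the diagram, while passing `Some(&approximation)` routes the data through the approximation step first.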
This library will follow the same principles.

### Transformations
Transformations are algorithms which represent the same information using a different alphabet.
Good transformation algorithms eliminate redundant information in the data.
A mathematical function can be seen as a transformation of a series of data.
The series `1 1 2 3 5 8 13 21 ..` can be expressed as `f(x) = f(x-1) + f(x-2)` with `f(1) = f(2) = 1`.
We mapped the information represented in alphabet A (integers) to an alphabet B (letters + integers) which is more compact.
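As a concrete illustration, the sketch below applies delta encoding to the series from the example: each value is replaced by its difference from the previous one, and a running sum restores the original. Delta encoding is chosen here only as a simple, well-known transformation. The function names `delta_encode` and `delta_decode` are hypothetical, and nothing here is meant to describe how rscompress implements its transformations.

```rust
/// Forward transformation: replace each value by its difference to the
/// previous value (the first value is stored relative to 0).
fn delta_encode(data: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    data.iter()
        .map(|&x| {
            // wrapping_sub keeps the mapping reversible even on overflow
            let delta = x.wrapping_sub(prev);
            prev = x;
            delta
        })
        .collect()
}

/// Inverse transformation: a running sum restores the original values.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

For the series `1 1 2 3 5 8 13 21` the encoded form is `1 0 1 1 2 3 5 8`: the values are smaller, and decoding returns exactly the input.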
It is important to note that all transformations must have two properties:

- Applying a transformation algorithm to data does not lose information.
- All transformation algorithms are reversible, such that the original representation can be reconstructed from the new alphabet.

### Approximations
Approximations are algorithms which lose information for the sake of better compression.
Given a threshold `theta` (this can be absolute or relative), the algorithm maps the data from alphabet A to B with an information loss within the expected threshold.
An example of an approximation is the `~=` operator known from primary school, e.g. `1/3 ~= 0.3`.
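A minimal sketch of an approximation with an absolute threshold, assuming uniform quantization with a step of `2 * theta`: every value is mapped to the index of the nearest grid point, so the reconstructed value differs from the original by at most `theta` (up to floating-point rounding). The function names `quantize` and `reconstruct` are hypothetical and not part of rscompress.

```rust
/// Map each value to the index of the nearest multiple of `2 * theta`.
/// Assumes `theta > 0` and inputs small enough for the index to fit in i64.
fn quantize(data: &[f64], theta: f64) -> Vec<i64> {
    assert!(theta > 0.0, "theta must be positive");
    let step = 2.0 * theta;
    data.iter().map(|&x| (x / step).round() as i64).collect()
}

/// Turn the indices back into values. The result is the nearest grid point,
/// not the original value, so the information loss is permanent.
fn reconstruct(indices: &[i64], theta: f64) -> Vec<f64> {
    let step = 2.0 * theta;
    indices.iter().map(|&q| q as f64 * step).collect()
}
```

With `theta = 0.05`, the value `1/3` is quantized to index `3` and reconstructed as roughly `0.3`, which matches the `1/3 ~= 0.3` example.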
Approximations have the following properties:

- Applying an approximation algorithm to data results in information loss.
- Approximation algorithms are not reversible.
- The information loss is guaranteed to be within the threshold `theta`.

### Codings
Codings are algorithms where the actual compression happens.
The information is saved to disk as compactly as possible.
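To illustrate how a coder makes data more compact, the sketch below computes Huffman code lengths for a byte sequence: frequent symbols get short codes, rare symbols get long ones. It is a simplified illustration, not the rscompress implementation. The function name `huffman_code_lengths` is hypothetical, and only the code lengths are derived, not the bit patterns or the encoded stream.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the Huffman code length in bits for each of the 256 byte values.
/// Symbols that do not occur in `data` get length 0.
fn huffman_code_lengths(data: &[u8]) -> [u8; 256] {
    // Count how often each byte value occurs.
    let mut freq = [0u64; 256];
    for &byte in data {
        freq[byte as usize] += 1;
    }

    // Min-heap of (weight, symbols in this subtree); `Reverse` flips the
    // max-heap ordering of `BinaryHeap`.
    let mut heap = BinaryHeap::new();
    for (symbol, &count) in freq.iter().enumerate() {
        if count > 0 {
            heap.push(Reverse((count, vec![symbol as u8])));
        }
    }

    let mut lengths = [0u8; 256];

    // A single distinct symbol still needs one bit per occurrence.
    if heap.len() == 1 {
        let Reverse((_, symbols)) = heap.pop().unwrap();
        lengths[symbols[0] as usize] = 1;
        return lengths;
    }

    // Repeatedly merge the two lightest subtrees. Every symbol inside a
    // merged subtree moves one level deeper, so its code grows by one bit.
    // Keeping symbol lists per subtree is simple but not the fastest option.
    while heap.len() > 1 {
        let Reverse((w1, mut merged)) = heap.pop().unwrap();
        let Reverse((w2, other)) = heap.pop().unwrap();
        merged.extend(other);
        for &symbol in &merged {
            lengths[symbol as usize] += 1;
        }
        heap.push(Reverse((w1 + w2, merged)));
    }
    lengths
}
```

For the input `aaaabbc` this yields 1 bit for `a` and 2 bits each for `b` and `c`, i.e. 10 bits in total instead of the 56 bits of the raw bytes.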
Examples are [Huffman](https://en.wikipedia.org/wiki/Huffman_coding) or [Arithmetic](https://en.wikipedia.org/wiki/Arithmetic_coding) coding.

### Checksums
Checksums are algorithms that check the integrity of the data at each step, e.g. [Adler-32](https://en.wikipedia.org/wiki/Adler-32).
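A minimal sketch of Adler-32 follows, written directly from the algorithm's definition (two running sums modulo 65521). The function name `adler32` is a hypothetical helper, not a function exported by rscompress. Comparing the checksum of the input with the checksum of the data restored after a reversible step is one possible way to verify that step.

```rust
/// Adler-32: `a` is 1 plus the sum of all bytes, `b` is the sum of all
/// intermediate values of `a`, both modulo 65521 (the largest prime < 2^16).
fn adler32(data: &[u8]) -> u32 {
    const MOD_ADLER: u32 = 65_521;
    let mut a: u32 = 1;
    let mut b: u32 = 0;
    for &byte in data {
        // Reducing after every byte keeps both sums far below u32::MAX.
        a = (a + byte as u32) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    (b << 16) | a
}
```

For the ASCII string `Wikipedia` this returns `0x11E60398`, the value given in the linked Wikipedia article.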