An open API service indexing awesome lists of open source software.

https://github.com/ucyo/rscompress

A compression library in Rust with focus on scientific data.
https://github.com/ucyo/rscompress

approximation checksums coding compression decorrelation encoding floating-point transformation

Last synced: 6 months ago
JSON representation

A compression library in Rust with focus on scientific data.

Awesome Lists containing this project

README

          

rscompress


A compression library in Rust with focus on scientific data.



## Disclaimer
This is a rewrite and merge of several compression algorithms developed during my time as a phd student:

- https://github.com/ucyo/huffman
- https://github.com/ucyo/pzip-cli
- https://github.com/ucyo/pzip-bwt
- https://github.com/ucyo/pzip-redux
- https://github.com/ucyo/pzip-huffman
- https://github.com/ucyo/pzip
- https://github.com/ucyo/rust-compress
- https://github.com/ucyo/adaptive-lossy-compression
- https://github.com/ucyo/information-spaces
- https://github.com/ucyo/cframework
- https://github.com/ucyo/xor-and-residual-calculation
- https://github.com/ucyo/climate-data-analysis

The dissertation can be downloaded from https://doi.org/10.5445/IR/1000105055

## Architecture

The library is split into one base and four supporting libraries.
The base library orchestrates the supporting libraries.
All compression algorithms follow the same basic structure:

1. Decorrelate data using transformations
2. Approximate the data, if lossy compression is needed
3. Code the data

Additionally, check if each step executed as expected.

```
+----------------+ lossless +----------+
| | | |
Start +------> | Transformation | +------------> | Coding | +------> End
| | | |
+----------------+ +----------+

+ ^
| |
| lossy |
| |
v |
|
+---------------+ |
| | |
| Approximation | +------------------------+
| |
+---------------+
```
This library will follow the same principles.

### Transformations
Transformations are algorithms which represent the same information using a different alphabet.
Good transformation algorithms eliminate redundant information in the data.
A mathematical function can be seen as a transformation of a series of data.
The series `1 1 2 3 5 8 13 21 ..` can expressed as `f(x) = f(x-1) + f(x-2)`.
We mapped the information represented in alphabet A (integers) to an alphabet B (letters + integers) which is more compact.
It is important to note that all transformations must have two properties:

- Applying a transformation algorithm to data, does not loose information.
- All transformation algorithms are reversible, such that the original representation can be reconstructed from the new alphabet.

### Approximations
Approximations are algorithms which loose information for the sake of better compression.
Given a threshold `theta` (this can be absolute or relative), the algorithm maps the data from alphabet A to B with an information lose within the expected threshold.
An example for an approximation is the `~=` operator known from primary school e.g. `1/3 ~= 0.3`.
Approximations have the following properties:

- Applying an approximation algorithm to data, results in information loss.
- Approximation algorithms are not reversible.
- The information loss is guaranteed to be within the threshold `theta`

### Codings
Codings are algorithms where the actual compression happens.
The information is being saved on disk as compact as possible.
Examples are [Huffman](https://en.wikipedia.org/wiki/Huffman_coding) or
[Arithmetic](https://en.wikipedia.org/wiki/Arithmetic_coding) coding.

### Checksums
Checksums are algorithms to check the integrity of the data at each step e.g. [Adler-32](https://en.wikipedia.org/wiki/Adler-32).