https://github.com/kosarev/escapeless

Efficient binary encoding for large alphabets
https://github.com/kosarev/escapeless

ascii85 base122 base16 base32 base64 binary-encoding uuencoding yenc z85

Last synced: 7 months ago
JSON representation

Efficient binary encoding for large alphabets

Host: GitHub
URL: https://github.com/kosarev/escapeless
Owner: kosarev
License: mit
Created: 2019-06-01T10:43:39.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-10-17T09:01:55.000Z (over 4 years ago)
Last Synced: 2024-10-16T09:13:02.548Z (8 months ago)
Topics: ascii85, base122, base16, base32, base64, binary-encoding, uuencoding, yenc, z85
Language: C
Homepage:
Size: 27.3 KB
Stars: 25
Watchers: 3
Forks: 3
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # escapeless

Efficient binary encoding for large alphabets.

[![Build Status](https://travis-ci.org/kosarev/escapeless.svg?branch=master)](https://travis-ci.org/kosarev/escapeless)

### Features

* Low fixed-size overhead.

* Compression-friendly output.

* Arbitrary alphabets.

* Fast and simple algorithm.

* Does not involve heavy-weight arithmetic.

### Comparison chart

| Encoding         | Alphabet Size | Overhead |

| ---------------- | ------------- | -------- |

| escapeless255    | 255           |     0.4% |

| escapeless254    | 254           |     0.8% |

| escapeless253    | 253           |     1.2% |

| [yEnc](http://www.yenc.org/yenc-draft.1.3.txt)          | 252 | 1.6%*, 0-100% |

| escapeless252    | 252           |     1.6% |

| escapeless251    | 251           |     2.0% |

| escapeless250    | 250           |     2.4% |

| [B-News](http://b-news.sourceforge.net/)                | 224 | 2.5% |

| escapeless240    | 240           |     6.7% |

| escapeless230    | 230           |    11.4% |

| escapeless225    | 225           |    13.8% |

| [Base122](http://blog.kevinalbs.com/base122)            | 122 | 14.3% |

| [basE91](http://base91.sourceforge.net/)                |  91 | 22%*, 14-23% |

| [Base94](https://gist.github.com/iso2022jp/4054241)     |  94 | 22.2% |

| [Ascii85](https://en.wikipedia.org/wiki/Ascii85)        |  85 | 25.0% |

| [Z85](https://rfc.zeromq.org/spec:32/Z85/)              |  85 | 25.0% |

| [Base64](https://en.wikipedia.org/wiki/Base64)          |  64 | 33.3% |

| [uuencode](https://en.wikipedia.org/wiki/Uuencoding)    |  64 | 33.3% |

| [Base58](https://en.wikipedia.org/wiki/Base58)          |  58 | 36.6% |

| [Base36 / 64-bit](https://en.wikipedia.org/wiki/Base36) |  36 | 59.2%*, 0-62.5% |

| [Base32](https://en.wikipedia.org/wiki/Base32)          |  32 | 60.0% |

| [Base36 / 32-bit](https://en.wikipedia.org/wiki/Base36) |  36 | 62.0%*, 0-75% |

| [Base16](https://en.wikipedia.org/wiki/Base16)          |  16 | 100.0% |

(*) On uniform distribution of input octets.

### Building and testing

```shell

$ git clone [email protected]:kosarev/escapeless.git

$ cd c

$ make

$ make test

```

### Basic idea

Given a source alphabet of size S and a target alphabet of size

N < S, break the sequence of input characters into blocks so that

the number of characters in each block does not exceed N − 1.

Since a block can contain at most N − 1 different characters and

the target alphabet contains N characters, it is known that all

those used characters can be mapped to the target alphabet and at

least one extra character of the target alphabet will remain

unmapped.

For example:

```

 A B C D E F G H I J K L    12  Characters of the source alphabet (S)

 A   C D E     H I   K L     8  Characters of the target alphabet (N)

   x       x x     x         4  Characters missing in the target alphabet (takeouts)

   | | | |     | | |         7  Characters used in the block

 .         . .       . .     5  Characters not used in the block

```

Here, one possible mapping is:

```

 B −> A

 J −> K

```

with `L` left unmapped and all other characters of the target

alphabet mapped to themselves.

What that unmapped character is for, is to make it possible to

map unused takeouts, like `F` and `G` in the example, to a

character of the target alphabet that does not represent any

characters of the source alphabet for that block.

Taking that into account, here's how a complete mapping would

look:

```

 B −> A

 F -> L

 G -> L

 J −> K

```

Once the mapping is determined, we can output the encoded block

with takeout characters in it replaced with members of the target

alphabet.

To let a decoder know the mapping, we also have to prepend each

of the encoded blocks with a series of characters the takeouts

are mapped to and assume that the decoder will be given the same

set of takeout characters specified in the same order.

### Overhead formula

For a source alphabet of size S, a target alphabet of size N and

a block of N − 1 characters, the size of the encoded block is:

```

 encoded_block_size = takeouts_map_size + block_size =

                      (S − N) + (N - 1) =

                      S - 1

```

The overhead is thus:

```

 overhead = (encoded_block_size - block_size) / block_size =

            ((S - 1) - (N - 1)) / (N - 1) =

            (S - 1 - N + 1) / (N - 1) =

            (S - N) / (N - 1)

```

### Encoding algorithm

1. Break the input message into blocks so that no block contains

   more than N - 1 characters, where N is the size of the target

   alphabet.

   Process every block separately as specified below.

2. Map every takeout character to a character of the target

   alphabet that is not used in the block and is not a takeout

   character.

   All takeouts not used in the block shall map to the same

   character.

3. Replace takeout characters of the block using that map.

4. Output the map followed by the rewritten block.

### Decoding algorithm

1. Read the takeouts map and the encoded block.

3. Using the map, restore the takeouts in the block.

3. Output decoded block.

### The idea explained in greater detail

[Escapeless, Restartable, Binary Encoding](http://chilliant.blogspot.com/2020/01/escapeless-restartable-binary-encoding.html)

Thanks, Ian!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kosarev/escapeless

Awesome Lists containing this project

README