Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jamesparrott/sub_byte
Encodes and decodes sequences of integers with known widths, and sequences of symbols equivalent to integers under some mapping.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/jamesparrott/sub_byte
- Owner: JamesParrott
- License: mit
- Created: 2024-10-16T16:13:28.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T08:38:09.000Z (3 months ago)
- Last Synced: 2024-10-24T08:57:30.310Z (3 months ago)
- Language: Python
- Homepage:
- Size: 144 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Sub_Byte
Encodes and decodes sequences of integers with known widths (and sequences of symbols equivalent to integers under some mapping).
## Overview
Sub_byte stores data efficiently while preserving its structure, without requiring compression or decompression. It uses simple bit packing: fields of 7 bits or fewer take less than a byte, and fields cross byte
boundaries if necessary. Each symbol has a known, fixed bit width, which avoids continuation bits. The bit width sequence and the
total number of symbols must be associated with the encoded data as metadata.
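The packing described above can be illustrated with a minimal sketch. This is not Sub_byte's own API: the names `pack` and `unpack`, the big-endian bit order, and the zero padding of the final byte are all assumptions made for illustration.

```python
from itertools import cycle


def pack(values, widths):
    """Pack integers into bytes, applying the bit widths cyclically (hypothetical helper)."""
    buffer = 0   # accumulated bits, earliest value in the most significant position
    n_bits = 0   # number of bits accumulated so far
    for value, width in zip(values, cycle(widths)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        buffer = (buffer << width) | value
        n_bits += width
    padding = -n_bits % 8        # zero bits needed to fill the final byte
    buffer <<= padding
    return buffer.to_bytes((n_bits + padding) // 8, "big")


def unpack(data, num_symbols, widths):
    """Recover num_symbols integers from data, using the same width cycle."""
    total_bits = len(data) * 8
    buffer = int.from_bytes(data, "big")
    values = []
    position = 0                 # bits consumed so far
    for _, width in zip(range(num_symbols), cycle(widths)):
        position += width
        shift = total_bits - position
        values.append((buffer >> shift) & ((1 << width) - 1))
    return values


widths = [3, 2]                  # the bit width cycle (metadata)
values = [5, 1, 3, 0, 7, 2]      # 15 bits of data in total
encoded = pack(values, widths)   # stored in 2 bytes
decoded = unpack(encoded, len(values), widths)
```

The decoder needs both `widths` and `len(values)`: without the symbol count, the padding bits in the last byte could be mistaken for an extra symbol. Sequences of symbols can be handled the same way by mapping each symbol to an integer first, for example with `"ACGT".index(symbol)` and a width of 2.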
Data validation (e.g. checksums or hashes) must be done by the user, but can be appended to a bit width cycle.
## Implementations
### Python
Calculate a cache of data in Python.
### JavaScript
Decode a cache of data in JavaScript, even in the browser.
## Alternatives
### A bespoke protocol using custom width integer types
Up to 8 u1s (bits), up to 4 u2s, or up to 2 u3s or u4s per byte.
Each developer must create their own implementation and tests.
Interoperability between different private implementations is untestable.
### Protocol buffers
Encodes at most one symbol per byte. Uses a variable-length byte encoding (varints), which relies on continuation bits.
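To make the size difference concrete, the sketch below encodes small integers as base-128 varints, the scheme Protocol Buffers uses for integer fields, and compares the total with fixed 2-bit fields. `encode_varint` is a hypothetical helper written for this comparison, not part of any library, and it ignores the field tags that a real Protocol Buffers message would add.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # continuation bit set: more bytes follow
        else:
            out.append(low7)          # continuation bit clear: last byte
            return bytes(out)


symbols = [3, 0, 2, 1, 1, 3, 0, 2]    # eight values that each fit in 2 bits

# Every varint occupies at least one whole byte, however small the value.
varint_size = sum(len(encode_varint(s)) for s in symbols)

# With a fixed width of 2 bits, four symbols share each byte.
packed_size = (len(symbols) * 2 + 7) // 8
```

Here the varints take 8 bytes and the fixed-width fields take 2, at the cost of having to store the bit width separately.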
### Zipping (data compression)
- Exploits statistical distributions (e.g. "E" being more common in English text than "Q") and patterns.
- Unstructured until the end user unzips the archive.
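As a rough illustration of the first point, the sketch below compresses two sequences over the same four symbols with Python's standard `zlib` module. The symbol set, the weights, and the sequence length are arbitrary choices made for this example.

```python
import random
import zlib

rng = random.Random(0)   # fixed seed so the sequences are reproducible

# Same four symbols, different distributions.
skewed = bytes(rng.choices(b"ETAQ", weights=[70, 20, 9, 1], k=4000))
uniform = bytes(rng.choices(b"ETAQ", k=4000))

skewed_size = len(zlib.compress(skewed))
uniform_size = len(zlib.compress(uniform))

# Fixed-width packing at 2 bits per symbol ignores the distribution entirely.
packed_size = (4000 * 2 + 7) // 8

# The compressed bytes must be decompressed before any symbol can be read.
restored = zlib.decompress(zlib.compress(skewed))
```

The skewed sequence compresses to fewer bytes than the uniform one, while the fixed-width size is the same 1000 bytes for both and is known before encoding.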