https://github.com/dito97/alphacodings

encodings natural-language-processing tokenization vocabulary

# alphacodings

base26 ([A-Z]) and base52 ([A-Za-z]) encodings

## 🌟 overview

transform any string into alphabetic-only text with lossless base26 ([A-Z]) and base52 ([A-Za-z]) encodings; useful for transmitting textual data over restrictive channels or for training AI models and tokenizers on simpler vocabularies.

**alphacodings** is a fast and lightweight C++ library; Python bindings are available via pybind11.
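
to give an intuition for what a lossless alphabetic-only encoding can look like, here is a minimal pure-Python sketch. it is an illustration under stated assumptions, not the library's actual algorithm: the names `sketch_encode` and `sketch_decode`, the digit order of the alphabets, and the sentinel-byte trick are all hypothetical, and the output will most likely differ from what **alphacodings** produces.

```python
import string

# Assumed digit orders; the real library may order its alphabets differently.
BASE26_ALPHABET = string.ascii_uppercase                           # A-Z
BASE52_ALPHABET = string.ascii_uppercase + string.ascii_lowercase  # A-Za-z


def sketch_encode(text: str, alphabet: str) -> str:
    """Encode text as a big integer written in base len(alphabet)."""
    # A leading 0x01 sentinel byte keeps leading zero bytes from being lost
    # when the byte string is interpreted as an integer.
    number = int.from_bytes(b"\x01" + text.encode("utf-8"), "big")
    base = len(alphabet)
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def sketch_decode(encoded: str, alphabet: str) -> str:
    """Invert sketch_encode for the same alphabet."""
    base = len(alphabet)
    index = {char: position for position, char in enumerate(alphabet)}
    number = 0
    for char in encoded:
        number = number * base + index[char]
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the sentinel byte


if __name__ == "__main__":
    sample = "hello, world!"
    for alphabet in (BASE26_ALPHABET, BASE52_ALPHABET):
        encoded = sketch_encode(sample, alphabet)
        assert encoded.isalpha()  # only letters survive the encoding
        assert sketch_decode(encoded, alphabet) == sample
        print(encoded)
```

the sketch treats the whole input as one large number, which is simple but slow for long inputs; a production implementation would likely work on fixed-size chunks instead.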

## ⚙️ installation

```shell
python -m pip install alphacodings
```

## 🚀 usage

```python
from alphacodings import base26_encode, base26_decode, base52_encode, base52_decode

string = """\

sample page

welcome!


you are reading a sample HTML string.

"""

if __name__ == "__main__":
    # encode with the 26-letter uppercase alphabet
    encoding_base26 = base26_encode(string)
    print(encoding_base26)
    # >>> YBPNLKVNQWZQCMDHMLNDTVQCCRKQLNCFGMQPNGQCIXHUUPHFUNKUFEPDLKIGARFOKTDEZKQHXGCPYHDZKKVIUDNFOAYYAUOQFBJFFGSTKAXNWGDPVUJNBARPNXBASHZBXIBSSEFTAIQRPEADSOVVNXUMQXVDWTAIVCIVWQZAHAGYAVZYKGMETJOOUQNOEXMSOOGSKVMFBYZIBZDAITICYVXMJTTCCHPMSCABLYUMFDUNLVSLNKHSBPKCGASXJSFYDHZFAOEQTUACEBIFKQGYC

    # encode with the 52-letter mixed-case alphabet
    encoding_base52 = base52_encode(string)
    print(encoding_base52)
    # >>> EgcgYRPxckylMQWRLDADNZxPJiJcHaVwYHLnicahBgaotGGANZuvsvcpSSOJFLXvKPjRlNQCJqqdviiIdtnwJyDOnWojsrpkWSTZFHbMIREvREjpsODtSxoLlLjQZOoehsGFzawGQecyuomgpZQNyFnZQLWPiDhzClwxBFCCwdqduGJoshrwFdwHWMtJpSTmjxzaYmNvzOIOwLkJvyQHCaFtrODPhbhBpPBmC

    # both encodings are lossless: decoding returns the original string
    assert base26_decode(encoding_base26) == string
    assert base52_decode(encoding_base52) == string
```
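
the base52 output above is shorter than the base26 one because each character carries more information. as a back-of-the-envelope estimate (not a statement about the library's exact output length), the number of characters a fixed-length positional encoding needs for `num_bytes` arbitrary bytes can be sketched as follows; `min_encoded_length` is a hypothetical helper, not part of **alphacodings**.

```python
import math


def min_encoded_length(num_bytes: int, base: int) -> int:
    """Smallest L such that base**L can cover every num_bytes-byte input."""
    # Each symbol carries log2(base) bits; the input carries 8 bits per byte.
    return math.ceil(num_bytes * 8 / math.log2(base))


if __name__ == "__main__":
    for base in (26, 52):
        print(f"base{base}: {min_encoded_length(100, base)} characters for 100 bytes")
```

under this estimate, base26 needs roughly 1.7 characters per input byte and base52 roughly 1.4.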

## 🧠 motivation

the library is inspired by [R. Heaton](https://github.com/robert)'s base26 implementation in the [PySkyWiFi](https://github.com/robert/PySkyWiFi) repository and by his story on how to manipulate data transmission in restrictive network channels via alphabetic-only encodings and tokenization.

have a look at the original repository and the accompanying [blog post](https://robertheaton.com/pyskywifi), and show him some love!

## 📊 benchmarking

TBC

## 🤝 contributing

contributions to **alphacodings** are welcome!

feel free to submit pull requests or open issues on our repository.

## 📄 license

see the [LICENSE](LICENSE) file for more details.