https://github.com/dito97/alphacodings
base26 and base52 encodings
encodings natural-language-processing tokenization vocabulary
- Host: GitHub
- URL: https://github.com/dito97/alphacodings
- Owner: DiTo97
- License: mit
- Created: 2024-08-24T19:52:25.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T14:59:10.000Z (2 months ago)
- Last Synced: 2024-10-11T19:12:13.892Z (27 days ago)
- Topics: encodings, natural-language-processing, tokenization, vocabulary
- Language: Python
- Homepage:
- Size: 1.62 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# alphacodings
base26 ([A-Z]) and base52 ([A-Za-z]) encodings
## 🌟 overview
transform any string into alphabetic-only text with lossless base26 ([A-Z]) and base52 ([A-Za-z]) encodings; useful for transmitting textual data over restrictive channels or for training AI models and tokenizers on simpler vocabularies.
**alphacodings** is a fast and lightweight C++ library; bindings are available via pybind11.
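to make the idea of a lossless alphabetic-only encoding concrete, here is a minimal pure-Python sketch. it is an illustration under stated assumptions, not the library's C++ implementation: `alpha_encode` and `alpha_decode` are hypothetical helpers, the big-integer approach and the sentinel byte are choices made for this example, and the output is not expected to match what `base26_encode` or `base52_encode` return.

```python
import string


def alpha_encode(text: str, alphabet: str = string.ascii_uppercase) -> str:
    """Encode text over `alphabet` (hypothetical helper, not the library API)."""
    # Assumption for this sketch: prefix a sentinel byte so that leading
    # zero bytes survive the round trip through a big integer.
    number = int.from_bytes(b"\x01" + text.encode("utf-8"), "big")
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def alpha_decode(encoded: str, alphabet: str = string.ascii_uppercase) -> str:
    """Invert `alpha_encode` for the same alphabet."""
    base = len(alphabet)
    index = {char: value for value, char in enumerate(alphabet)}
    number = 0
    for char in encoded:
        number = number * base + index[char]
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Drop the sentinel byte added by `alpha_encode`.
    return raw[1:].decode("utf-8")
```

passing `string.ascii_uppercase + string.ascii_lowercase` as `alphabet` gives a base52 variant of the same sketch; a larger alphabet yields a shorter encoding for the same input.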
## ⚙️ installation
```shell
python -m pip install alphacodings
```

## 🚀 usage
```python
from alphacodings import base26_encode, base26_decode, base52_encode, base52_decode

string = """\
sample page
welcome!
you are reading a sample HTML string.
"""

if __name__ == "__main__":
    encoding_base26 = base26_encode(string)
    print(encoding_base26)
    # >>> YBPNLKVNQWZQCMDHMLNDTVQCCRKQLNCFGMQPNGQCIXHUUPHFUNKUFEPDLKIGARFOKTDEZKQHXGCPYHDZKKVIUDNFOAYYAUOQFBJFFGSTKAXNWGDPVUJNBARPNXBASHZBXIBSSEFTAIQRPEADSOVVNXUMQXVDWTAIVCIVWQZAHAGYAVZYKGMETJOOUQNOEXMSOOGSKVMFBYZIBZDAITICYVXMJTTCCHPMSCABLYUMFDUNLVSLNKHSBPKCGASXJSFYDHZFAOEQTUACEBIFKQGYC

    encoding_base52 = base52_encode(string)
    print(encoding_base52)
    # >>> EgcgYRPxckylMQWRLDADNZxPJiJcHaVwYHLnicahBgaotGGANZuvsvcpSSOJFLXvKPjRlNQCJqqdviiIdtnwJyDOnWojsrpkWSTZFHbMIREvREjpsODtSxoLlLjQZOoehsGFzawGQecyuomgpZQNyFnZQLWPiDhzClwxBFCCwdqduGJoshrwFdwHWMtJpSTmjxzaYmNvzOIOwLkJvyQHCaFtrODPhbhBpPBmC

    assert base26_decode(encoding_base26) == string
    assert base52_decode(encoding_base52) == string
```

## 🧠 motivation
The library is inspired by [R. Heaton](https://github.com/robert)'s base26 implementation in the [pyskyWiFi](https://github.com/robert/PySkyWiFi) repository and his story on how to manipulate data transmission in restrictive network channels via alphabetic-only encodings and tokenization.
have a look at the original repository and [story blog post](https://robertheaton.com/pyskywifi) and show him some love!
## 📊 benchmarking
TBC
## 🤝 contributing
contributions to **alphacodings** are welcome!
feel free to submit pull requests or open issues on our repository.
## 📄 license
see the [LICENSE](LICENSE) file for more details.