https://github.com/jeremyctrl/whatenc
Text encoding type classifier
- Host: GitHub
- URL: https://github.com/jeremyctrl/whatenc
- Owner: jeremyctrl
- License: mit
- Created: 2025-10-25T20:37:25.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-02T21:05:41.000Z (7 months ago)
- Last Synced: 2025-11-02T23:07:41.028Z (7 months ago)
- Topics: classifier, encoding, text
- Language: Python
- Homepage:
- Size: 593 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
`whatenc` is a command-line tool that identifies the encoding or transformation of a given string or file.
The model is trained on text samples from the English, Greek, Russian, Hebrew, and Arabic Wikipedia corpora, chosen to cover a diverse set of writing systems (Latin, Greek, Cyrillic, Hebrew, and Arabic scripts). Each line of text is encoded with multiple encoding schemes, and each result is labeled with the scheme that produced it.
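The README does not show how the labeled examples are generated. A minimal sketch of the idea follows. It assumes that `plain` samples are the raw lines, that `gzip64` means gzip compression followed by base64, that `base85` is Python's `b85` alphabet rather than Ascii85, and that Morse covers only A-Z and 0-9. The names `make_examples`, `to_morse`, `ENCODERS`, and `MORSE` are hypothetical, not taken from `whatenc`.

```python
import base64
import gzip
import hashlib
import urllib.parse
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical Morse table covering only A-Z and 0-9.
MORSE: Dict[str, str] = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", "0": "-----", "1": ".----", "2": "..---",
    "3": "...--", "4": "....-", "5": ".....", "6": "-....", "7": "--...",
    "8": "---..", "9": "----.",
}

# Hypothetical encoder registry: label -> function(bytes) -> str.
ENCODERS: Dict[str, Callable[[bytes], str]] = {
    "base32": lambda b: base64.b32encode(b).decode("ascii"),
    "base64": lambda b: base64.b64encode(b).decode("ascii"),
    # Assumption: "base85" means Python's b85 alphabet (not Ascii85).
    "base85": lambda b: base64.b85encode(b).decode("ascii"),
    "hex": lambda b: b.hex(),
    "url": lambda b: urllib.parse.quote(b),
    # Assumption: "gzip64" means gzip-compress, then base64-encode.
    # mtime=0 keeps the gzip header, and therefore the output, deterministic.
    "gzip64": lambda b: base64.b64encode(gzip.compress(b, mtime=0)).decode("ascii"),
}
for _name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
    # Bind the name as a default argument so each lambda keeps its own algorithm.
    ENCODERS[_name] = lambda b, n=_name: hashlib.new(n, b).hexdigest()


def to_morse(text: str) -> Optional[str]:
    """Morse-encode letters/digits, words joined by ' / '; None if unsupported."""
    words = []
    for word in text.upper().split():
        if any(ch not in MORSE for ch in word):
            return None  # e.g. Greek, Cyrillic, Hebrew, Arabic, punctuation
        words.append(" ".join(MORSE[ch] for ch in word))
    return " / ".join(words)


def make_examples(line: str) -> List[Tuple[str, str]]:
    """Return (sample, label) pairs for one line of source text."""
    data = line.encode("utf-8")
    examples = [(line, "plain")]  # assumption: the raw line is the "plain" sample
    for label, encode in ENCODERS.items():
        examples.append((encode(data), label))
    morse = to_morse(line)
    if morse is not None:
        examples.append((morse, "morse"))
    return examples


for sample, label in make_examples("hello world"):
    print(f"{label:>7}  {sample}")
```

In this sketch, lines written in scripts without a Morse mapping (Greek, Cyrillic, Hebrew, Arabic) produce no `morse` sample.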
## How It Works
`whatenc` uses a character-level 1D convolutional neural network (CNN) trained directly on sequences of character-bigram tokens.
Each training sample is represented as:
- a sequence of character-bigram tokens, padded to a fixed maximum length
- a scalar feature holding the input's true (unpadded) length, which lets the network learn relative string lengths
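A minimal sketch of this representation follows. It assumes overlapping bigrams, a vocabulary built from training text with ids 0 and 1 reserved for padding and unknown bigrams, truncation at a deliberately tiny hypothetical `MAX_LEN`, and the character count as the length scalar. `bigrams`, `build_vocab`, and `featurize` are illustrative names, not `whatenc`'s API.

```python
from typing import Dict, Iterable, List, Tuple

PAD_ID = 0    # hypothetical id reserved for padding
UNK_ID = 1    # hypothetical id for bigrams never seen in training
MAX_LEN = 16  # hypothetical maximum sequence length (tiny for illustration)


def bigrams(text: str) -> List[str]:
    """Overlapping character bigrams: 'abcd' -> ['ab', 'bc', 'cd']."""
    if len(text) < 2:
        return [text] if text else []  # a 1-char input becomes a single token
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_vocab(corpus: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct bigram an id, starting after the reserved ids."""
    vocab: Dict[str, int] = {}
    for text in corpus:
        for bg in bigrams(text):
            if bg not in vocab:
                vocab[bg] = len(vocab) + 2
    return vocab


def featurize(text: str, vocab: Dict[str, int]) -> Tuple[List[int], float]:
    """Return (padded bigram ids, true length of the input in characters)."""
    ids = [vocab.get(bg, UNK_ID) for bg in bigrams(text)][:MAX_LEN]
    ids += [PAD_ID] * (MAX_LEN - len(ids))
    return ids, float(len(text))


vocab = build_vocab(["hello", "aGVsbG8="])
ids, length = featurize("help", vocab)
print(ids, length)  # ids start [2, 3, 1, 0, ...]; length is 4.0
```

Because padding hides how long the input really was, the separate length scalar preserves that information for the classifier.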
This neural approach achieves near-perfect classification accuracy after only a few epochs.
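The network itself can be sketched as a forward pass in plain Python. The pass embeds each bigram id, slides 1D convolution filters over the sequence, and applies ReLU. It then global-max-pools each filter, appends the length scalar, and finishes with a dense layer and softmax. The layer sizes, the pooling choice, the label subset, and the random weights are assumptions made for illustration. The README does not document `whatenc`'s actual architecture or hyperparameters.

```python
import math
import random
from typing import List

rng = random.Random(0)  # fixed seed so the sketch is reproducible

VOCAB_SIZE = 32   # hypothetical bigram vocabulary size
EMBED_DIM = 4     # hypothetical embedding width
NUM_FILTERS = 3   # hypothetical number of convolution filters
KERNEL = 3        # hypothetical filter width, in tokens
LABELS = ["plain", "base64", "hex", "md5"]  # illustrative subset of classes


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Random weights stand in for trained parameters.
EMBED = rand_matrix(VOCAB_SIZE, EMBED_DIM)
CONV_W = rand_matrix(NUM_FILTERS, KERNEL * EMBED_DIM)  # each row: one flattened filter
CONV_B = [0.0] * NUM_FILTERS
DENSE_W = rand_matrix(len(LABELS), NUM_FILTERS + 1)    # +1 input for the length scalar
DENSE_B = [0.0] * len(LABELS)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(token_ids: List[int], length: float) -> List[float]:
    if len(token_ids) < KERNEL:
        raise ValueError("sequence shorter than the convolution kernel")
    # 1. Embedding lookup: one EMBED_DIM vector per bigram id
    #    (padding ids are embedded like any other token here).
    seq = [EMBED[t] for t in token_ids]
    # 2. 1D convolution (stride 1, no padding) followed by ReLU.
    pooled = []
    for w, b in zip(CONV_W, CONV_B):
        acts = []
        for start in range(len(seq) - KERNEL + 1):
            window = [v for vec in seq[start:start + KERNEL] for v in vec]
            acts.append(max(0.0, sum(wi * xi for wi, xi in zip(w, window)) + b))
        # 3. Global max pooling: keep each filter's strongest response.
        pooled.append(max(acts))
    # 4. Append the length scalar, then a dense layer and softmax.
    features = pooled + [length]
    logits = [sum(wi * xi for wi, xi in zip(w, features)) + b
              for w, b in zip(DENSE_W, DENSE_B)]
    return softmax(logits)


probs = forward([2, 3, 1, 0, 0, 0, 0, 0], length=4.0)
for label, p in sorted(zip(LABELS, probs), key=lambda lp: lp[1], reverse=True):
    print(f"{label} = {p:.3f}")
```

With random weights the scores are meaningless. A trained model would learn these parameters, and it might mask padding or normalize the length feature. The sketch only shows how the pieces fit together.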
### Supported Encodings
`whatenc` currently recognizes the following formats and transformations, plus a `plain` class for unencoded text:
| Category | Encodings |
| :------- | :-------- |
| Base encodings | `base32`, `base64`, `base85`, `hex`, `url` |
| Text ciphers | `morse` |
| Compression | `gzip64` |
| Hash digests | `md5`, `sha1`, `sha224`, `sha256`, `sha384`, `sha512` |
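When written as hex digests (as in the `md5` example under Examples), all six hash families use the same 16-character alphabet, so their character content looks alike. Digest length is the clearest signal separating them, which plausibly explains the explicit length feature, though the README does not say so. The check below uses only Python's standard `hashlib`. The helper name `hex_digest_lengths` is illustrative.

```python
import hashlib
from typing import Dict

HASHES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def hex_digest_lengths(data: bytes = b"hello") -> Dict[str, int]:
    """Length, in characters, of each algorithm's hex digest of `data`."""
    return {name: len(hashlib.new(name, data).hexdigest()) for name in HASHES}


for name, length in hex_digest_lengths().items():
    print(f"{name:>6}: {length} hex chars")
# md5: 32, sha1: 40, sha224: 56, sha256: 64, sha384: 96, sha512: 128
```

These lengths are fixed by each algorithm and do not depend on the input.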
## Installation
You can install `whatenc` using [pipx](https://pypa.github.io/pipx):
```bash
pipx install whatenc
```
## Usage
```bash
whatenc hello        # classify a string
whatenc samples.txt  # classify the contents of a file
```
### Examples
```text
[+] input: ZW5jb2RlIHRvIGJhc2U2NCBmb3JtYXQ=
[~] top guess = base64
[=] base64 = 1.000
[=] base85 = 0.000
[=] plain = 0.000

[+] input: hello
[~] top guess = plain
[=] plain = 1.000
[=] md5 = 0.000
[=] base64 = 0.000

[*] loading model
[+] input: האקדמיה ללשון העברית
[~] top guess = plain
[=] plain = 1.000
[=] base64 = 0.000
[=] base85 = 0.000

[*] loading model
[+] input: bfa99df33b137bc8fb5f5407d7e58da8
[~] top guess = md5
[=] md5 = 0.999
[=] sha1 = 0.001
[=] sha224 = 0.000
```
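Each report above prints the input, the top guess, and the three highest-scoring labels to three decimal places. The sketch below reproduces that layout, assuming the model yields one probability per label. `format_report` is a hypothetical helper, not part of `whatenc`. The probabilities in the usage lines are made up to mirror the `md5` example.

```python
from typing import Dict, List


def format_report(text: str, probs: Dict[str, float], top_k: int = 3) -> List[str]:
    """Render a whatenc-style report: input, top guess, then top_k scores."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    lines = [f"[+] input: {text}", f"[~] top guess = {ranked[0][0]}"]
    lines += [f"[=] {label} = {p:.3f}" for label, p in ranked]
    return lines


# Made-up probabilities, chosen only to mirror the md5 example.
probs = {"md5": 0.9991, "sha1": 0.0008, "sha224": 0.0001, "plain": 0.0}
print("\n".join(format_report("bfa99df33b137bc8fb5f5407d7e58da8", probs)))
```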