https://github.com/amogorkon/jackhash

Japanese, ASCII, Chinese, Korean - Hash encoding. How else would you bring a sha256 hexdigest down to 18 characters?

Host: GitHub
URL: https://github.com/amogorkon/jackhash
Owner: amogorkon
License: MIT
Created: 2021-10-16T17:43:12.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2022-06-05T14:33:50.000Z (over 3 years ago)
Last Synced: 2024-10-12T09:14:28.668Z (about 1 year ago)
Topics: cjk-characters, encoding, hash
Language: Python
Homepage:
Size: 426 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# JÄCKhash - Japanese, ASCII (+non-Latin), Chinese, Korean Hash encoding
Ever felt that SHA-256 hexdigests are way too long? We have Unicode everywhere now, right? So why not use the full range of Unicode characters to encode the hash?
Well, there are some potential issues to take into consideration.
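Before getting to those issues, the size argument itself is plain arithmetic: an alphabet of N symbols carries log2 N bits per character, so a 256-bit digest needs ⌈256 / log2 N⌉ characters. Here is a minimal sketch that computes this with exact integer arithmetic instead of floating-point logarithms. `min_length` is a hypothetical helper, not part of jackhash, and the alphabet sizes are illustrative choices.

```python
def min_length(alphabet_size: int, bits: int = 256) -> int:
    """Smallest L with alphabet_size**L >= 2**bits, using exact integers."""
    if alphabet_size < 2:
        raise ValueError("an alphabet needs at least two symbols")
    target = 1 << bits
    length, capacity = 0, 1
    # Grow the number of representable values one symbol at a time.
    while capacity < target:
        capacity *= alphabet_size
        length += 1
    return length


# Illustrative alphabet sizes; the last one is the whole Unicode code space
# (including surrogates and unassigned code points), so it is only a bound.
for label, size in [("hex", 16), ("base64", 64), ("all code points", 0x110000)]:
    print(f"{label:16} {size:>8} symbols -> {min_length(size):>2} characters")
```

This reports 64, 43 and 13 characters respectively. Even the entire code space only gets a SHA-256 digest down to 13 characters, and much of that space is unusable in practice.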

Many ASCII characters have a special meaning to machines, which can break highlighting. Emojis are especially troublesome for interoperability: many editors, chat clients, etc. don't display the same emoji the same way, and some even expand an emoji into a `:text:` shortcode, which defeats the whole purpose of a compact encoding.
However, the Chinese, Japanese and Korean alphabets are *huge* by comparison, yet come with none of these downsides. Editors, chat clients, etc. all seem to display these characters the same way, and machines assign no special meaning to any of them, so they are handled as one big string that is easy to highlight and copy & paste.
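One rough way to check the "one big string" claim is to test whether every character counts as a word character. This is a minimal sketch under the assumption that Python's Unicode-aware regex class `\w` approximates what an editor selects on double-click. Real applications use their own rules, so this is a proxy, not a guarantee. `is_single_word` is a hypothetical helper.

```python
import base64
import hashlib
import re

# ASSUMPTION: Python's Unicode-aware \w is a rough proxy for the "word
# characters" an editor or chat client selects on double-click.
WORD = re.compile(r"\w+")


def is_single_word(text: str) -> bool:
    """True if every character in `text` counts as a word character."""
    return WORD.fullmatch(text) is not None


digest = hashlib.sha256(b"hello").digest()

samples = {
    "hex": digest.hex(),                                 # only 0-9 and a-f
    "base64": base64.b64encode(digest).decode("ascii"),  # 32 bytes -> ends in "=" padding
    "cjk": "漢字한글",                                    # Han ideographs + Hangul syllables
    "emoji": "\U0001F642",                               # SLIGHTLY SMILING FACE (category So)
}

for name, text in samples.items():
    print(f"{name:7} {len(text):3} chars  single word: {is_single_word(text)}")
```

The hex digest and the CJK sample both count as a single word. The base64 form does not, because of its `=` padding and possible `+`/`/` characters. The emoji is not a word character at all.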

I curated all these different alphabets into one. (Japanese would only add 100 single characters to the Chinese and Korean alphabets and would make the license GPL3, so I actually left it out.) The result is a combined alphabet of 27,642 characters. Since 27,642^18 ≥ 2^256 (about 14.75 bits per character), this boils a SHA-256 digest down to just 18 characters that you can freely send via chat or put into Python 3 source files, which are UTF-8 by default.
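Here is a minimal sketch of a fixed-width base-27,642 encoding. It assumes the digest is read as one big-endian integer and written out digit by digit. The alphabet is a hypothetical stand-in built from the original CJK Unified Ideographs (U+4E00..U+9FA5) followed by Hangul syllables (U+AC00..U+D7A3), cut to 27,642 symbols. It is not jackhash's curated alphabet, and `encode`/`decode` are illustrative names, not the project's API.

```python
import hashlib

# HYPOTHETICAL stand-in alphabet, NOT jackhash's curated one:
# CJK Unified Ideographs U+4E00..U+9FA5 (20,902 chars) followed by
# Hangul syllables U+AC00..U+D7A3 (11,172 chars), truncated to 27,642 symbols.
_HAN = [chr(cp) for cp in range(0x4E00, 0x9FA5 + 1)]
_HANGUL = [chr(cp) for cp in range(0xAC00, 0xD7A3 + 1)]
ALPHABET = "".join((_HAN + _HANGUL)[:27642])
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

WIDTH = 18
# 18 symbols suffice for any 256-bit value, and 17 would not.
assert BASE ** (WIDTH - 1) < 2 ** 256 <= BASE ** WIDTH


def encode(digest: bytes) -> str:
    """Encode a 32-byte SHA-256 digest as exactly WIDTH symbols."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    value = int.from_bytes(digest, "big")
    symbols = []
    for _ in range(WIDTH):
        value, digit = divmod(value, BASE)
        symbols.append(ALPHABET[digit])
    # divmod yields the least significant digit first, so reverse the list.
    return "".join(reversed(symbols))


def decode(text: str) -> bytes:
    """Invert encode(), rejecting strings that cannot come from a digest."""
    if len(text) != WIDTH:
        raise ValueError(f"expected {WIDTH} symbols, got {len(text)}")
    value = 0
    for ch in text:
        value = value * BASE + INDEX[ch]  # KeyError for symbols outside ALPHABET
    if value >= 2 ** 256:
        raise ValueError("value does not fit in 256 bits")
    return value.to_bytes(32, "big")


digest = hashlib.sha256(b"hello").digest()
short = encode(digest)
print(len(digest.hex()), "->", len(short), short)
assert decode(short) == digest
```

Padding to a fixed width keeps leading zero digits. As a result, every digest maps to exactly 18 symbols, and decoding never has to guess where the value starts.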

## Licensing
Both the Chinese ( https://github.com/tsroten/zhon/blob/develop/zhon/cedict/all.py ) and Korean ( https://github.com/arcsecw/wubi/blob/master/wubi/cw.py ) alphabets are licensed under MIT.
