https://github.com/amogorkon/jackhash
Japanese, ASCII, Chinese, Korean - Hash encoding. How else would you bring a sha256 hexdigest down to 18 characters?
- Host: GitHub
- URL: https://github.com/amogorkon/jackhash
- Owner: amogorkon
- License: mit
- Created: 2021-10-16T17:43:12.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-05T14:33:50.000Z (over 3 years ago)
- Last Synced: 2024-10-12T09:14:28.668Z (about 1 year ago)
- Topics: cjk-characters, encoding, hash
- Language: Python
- Homepage:
- Size: 426 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# JÄCKhash - Japanese, ASCII (+non-latin), Chinese, Korean Hash encoding
Ever felt sha256 hexdigests are way too long? We have Unicode everywhere now, right? Why not use the full range of Unicode characters to encode the hash?
Well, there are some potential issues to take into consideration. Many ASCII characters have special meaning to a machine, which can break highlighting. Emojis are especially troublesome for interoperability: many editors, chat clients etc. don't display the same emoji in the same way, and some even expand an emoji to a :text:, which defeats the whole purpose of a compressed encoding.
However, the Chinese, Japanese and Korean alphabets are *huge* in comparison, yet don't come with any of these downsides. Editors, chat clients etc. all seem to display these characters the same way, and machines assign no special meaning to any of them, so they are handled properly as one big string that is easy to highlight and copy & paste.

I curated those alphabets into one combined alphabet of 27642 characters. (Japanese would only add 100 single characters to the Chinese and Korean alphabets, and the license would be GPL3, so I actually left that one out.) The combined alphabet boils a sha256 digest down to just 18 characters, which you can freely send via chat or put in Python 3 code files, since those are Unicode by default.
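
Where the 18 comes from can be sketched in a few lines. This is a minimal illustration under stated assumptions, not JÄCKhash's actual code: the alphabet below is a stand-in built from raw Unicode ranges rather than the curated list, and the names `encode`, `decode`, `ALPHABET` and `WIDTH` are hypothetical.

```python
import hashlib

# Stand-in alphabet (an assumption, NOT the project's curated list):
# CJK Unified Ideographs followed by Hangul syllables, cut to 27642 symbols.
CJK = [chr(cp) for cp in range(0x4E00, 0xA000)]
HANGUL = [chr(cp) for cp in range(0xAC00, 0xD7A4)]
ALPHABET = (CJK + HANGUL)[:27642]
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

# Smallest number of symbols whose combinations cover all 2**256 digests.
WIDTH = 1
while BASE ** WIDTH < 2 ** 256:
    WIDTH += 1


def encode(digest: bytes) -> str:
    """Write a 32-byte digest as a fixed-width number in base len(ALPHABET)."""
    number = int.from_bytes(digest, "big")
    symbols = []
    for _ in range(WIDTH):
        number, remainder = divmod(number, BASE)
        symbols.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(symbols))


def decode(text: str) -> bytes:
    """Invert encode(): rebuild the integer, then the 32 digest bytes."""
    number = 0
    for ch in text:
        number = number * BASE + INDEX[ch]
    return number.to_bytes(32, "big")


digest = hashlib.sha256(b"hello world").digest()
short = encode(digest)
print(len(digest.hex()), len(short))  # 64 18
```

Each symbol of a 27642-character alphabet carries a little under 15 bits, so 17 symbols fall short of 256 bits and 18 are enough. The sketch pads to a fixed width so that every digest encodes to the same length and `decode` can round-trip it.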
## Licensing
Both the Chinese ( https://github.com/tsroten/zhon/blob/develop/zhon/cedict/all.py ) and Korean ( https://github.com/arcsecw/wubi/blob/master/wubi/cw.py ) alphabets are licensed under MIT.