https://github.com/coderobe/base65536-ruby
Unicode's answer to Base64, in Ruby
- Host: GitHub
- URL: https://github.com/coderobe/base65536-ruby
- Owner: coderobe
- License: other
- Created: 2016-01-31T00:39:30.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-02-01T18:16:35.000Z (almost 10 years ago)
- Last Synced: 2024-05-01T14:30:46.018Z (over 1 year ago)
- Language: Ruby
- Homepage:
- Size: 26.4 KB
- Stars: 7
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# base65536-ruby
[Base64](https://en.wikipedia.org/wiki/Base64) is used to encode arbitrary binary data as "plain"
text using a small, extremely safe repertoire of 64 (well, 65) characters. Base64 remains highly
suited to text systems where the range of characters available is very small -- i.e., anything
still constrained to plain ASCII. Base64 encodes 6 bits, or 3/4 of an octet, per character.
However, now that Unicode rules the world, the range of characters which can be considered "safe"
in this way is, in many situations, significantly wider. Base65536 applies the same basic
principle to a carefully-chosen repertoire of 65,536 (well, 65,792) Unicode code points, encoding
16 bits, or 2 octets, per character. This allows up to 280 octets of binary data to fit in a
Tweet.
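The 280-octet figure follows directly from the 140-character tweet limit of the time; a quick sketch of the arithmetic:

```ruby
# Base64 carries 6 bits per character; Base65536 carries 16.
tweet_chars = 140                        # classic tweet limit

base64_octets   = tweet_chars * 6 / 8    # 105 octets per tweet
base65536_octets = tweet_chars * 16 / 8  # 280 octets per tweet
```

So Base65536 fits a little over 2.6 times as much binary data into the same character budget.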
In theory, this project could have been a one-liner. In practice, naively taking each pair of
bytes and smooshing them together to make a single code point is a bad way to do this because you
end up with:
* Control characters
* Whitespace
* Unpaired surrogates
* Normalization corruption
* No way to tell whether the final byte in the sequence was there in the original or not
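To make the failure modes concrete, here is a minimal sketch of that naive approach (`naive_encode` is illustrative only; it is not how this library or the real Base65536 repertoire works):

```ruby
# Naively pack each pair of bytes into a single code point.
# This reproduces the problems listed above: it can emit control
# characters, whitespace, and surrogate-range values, and an odd-length
# input is silently padded with a zero byte.
def naive_encode(bytes)
  bytes.each_slice(2).map { |hi, lo| ((hi << 8) | (lo || 0)).chr(Encoding::UTF_8) }.join
end

naive_encode("Hi!".bytes)   # odd length: the pad byte is indistinguishable from data
naive_encode([0x00, 0x0A])  # encodes straight to a newline control character
```

A curated repertoire, as generated by `base65536gen`, exists precisely to dodge every one of these cases.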
For details of how these code points were chosen and why they are thought to be safe,
[see the project `base65536gen`](https://github.com/ferno/base65536gen).
## Known Bugs
Our Ruby implementation apparently does not work with binary data, while the original JavaScript library does. This needs investigation, but buffer handling seems to differ significantly between the two languages.
A temporary workaround for binary data is to first encode it with a text-based scheme (e.g. Base64), then encode that result with Base65536.
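A sketch of that workaround is below. Note that `Base65536.encode`/`Base65536.decode` are assumed names for this gem's API and are not verified here; only the Base64 step is shown running.

```ruby
require "base64"

binary = "\x00\xFF\x10".b                    # arbitrary binary input
ascii_safe = Base64.strict_encode64(binary)  # pure ASCII, safe for the gem

# Then feed the ASCII string through Base65536 (assumed API, requires the gem):
#   text        = Base65536.encode(ascii_safe)
#   binary_back = Base64.strict_decode64(Base65536.decode(text))
```

This costs the usual 4/3 Base64 overhead before the Base65536 step, so it is a stopgap rather than a fix.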