https://github.com/coderobe/base65536-ruby
Unicode's answer to Base64, in Ruby
- Host: GitHub
- URL: https://github.com/coderobe/base65536-ruby
- Owner: coderobe
- License: other
- Created: 2016-01-31T00:39:30.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-02-01T18:16:35.000Z (almost 10 years ago)
- Last Synced: 2024-05-01T14:30:46.018Z (over 1 year ago)
- Language: Ruby
- Homepage:
- Size: 26.4 KB
- Stars: 7
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# base65536-ruby
[Base64](https://en.wikipedia.org/wiki/Base64) is used to encode arbitrary binary data as "plain"
text using a small, extremely safe repertoire of 64 (well, 65) characters. Base64 remains highly
suited to text systems where the range of characters available is very small -- i.e., anything
still constrained to plain ASCII. Base64 encodes 6 bits, or 3/4 of an octet, per character.
However, now that Unicode rules the world, the range of characters which can be considered "safe"
in this way is, in many situations, significantly wider. Base65536 applies the same basic
principle to a carefully-chosen repertoire of 65,536 (well, 65,792) Unicode code points, encoding
16 bits, or 2 octets, per character. This allows up to 280 octets of binary data to fit in a
Tweet.
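The 280-octet figure follows directly from the 140-character tweet limit of the time; a quick sketch of the arithmetic:

```ruby
# Base64 carries 6 bits per character; Base65536 carries 16.
tweet_chars = 140                        # classic tweet limit

base64_octets   = tweet_chars * 6 / 8    # 105 octets per tweet
base65536_octets = tweet_chars * 16 / 8  # 280 octets per tweet
```

So Base65536 fits a little over 2.6 times as much binary data into the same character budget.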
In theory, this project could have been a one-liner. In practice, naively taking each pair of
bytes and smooshing them together to make a single code point is a bad way to do this because you
end up with:
* Control characters
* Whitespace
* Unpaired surrogates
* Normalization corruption
* No way to tell whether the final byte in the sequence was there in the original or not
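To make the failure modes concrete, here is a minimal sketch of that naive approach (`naive_encode` is illustrative only; it is not how this library or the real Base65536 repertoire works):

```ruby
# Naively pack each pair of bytes into a single code point.
# This reproduces the problems listed above: it can emit control
# characters, whitespace, and surrogate-range values, and an odd-length
# input is silently padded with a zero byte.
def naive_encode(bytes)
  bytes.each_slice(2).map { |hi, lo| ((hi << 8) | (lo || 0)).chr(Encoding::UTF_8) }.join
end

naive_encode("Hi!".bytes)   # odd length: the pad byte is indistinguishable from data
naive_encode([0x00, 0x0A])  # encodes straight to a newline control character
```

A curated repertoire, as generated by `base65536gen`, exists precisely to dodge every one of these cases.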
For details of how these code points were chosen and why they are thought to be safe,
[see the project `base65536gen`](https://github.com/ferno/base65536gen).
## Known Bugs
Our Ruby implementation apparently does not work with binary data, while the original JavaScript library does. This needs investigation, but buffer handling seems to differ significantly between the two languages.
A temporary workaround for binary data is to first encode it with a text-based scheme (e.g. Base64), then encode that result with Base65536.
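A sketch of that workaround is below. Note that `Base65536.encode`/`Base65536.decode` are assumed names for this gem's API and are not verified here; only the Base64 step is shown running.

```ruby
require "base64"

binary = "\x00\xFF\x10".b                    # arbitrary binary input
ascii_safe = Base64.strict_encode64(binary)  # pure ASCII, safe for the gem

# Then feed the ASCII string through Base65536 (assumed API, requires the gem):
#   text        = Base65536.encode(ascii_safe)
#   binary_back = Base64.strict_decode64(Base65536.decode(text))
```

This costs the usual 4/3 Base64 overhead before the Base65536 step, so it is a stopgap rather than a fix.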