https://github.com/janlelis/unibits
Visualize different Unicode encodings in the terminal
- Host: GitHub
- URL: https://github.com/janlelis/unibits
- Owner: janlelis
- License: mit
- Created: 2017-03-05T14:40:25.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2025-09-12T20:00:52.000Z (3 months ago)
- Last Synced: 2025-09-12T22:36:57.856Z (3 months ago)
- Topics: ascii, cli-command, codepoints, debugging-tool, hacktoberfest, ruby-cli, terminal, unicode, utf-16, utf-32, utf-8
- Language: Ruby
- Homepage: https://character.construction
- Size: 1.51 MB
- Stars: 128
- Watchers: 9
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: MIT-LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# unibits | Reveal the Unicode [![[version]](https://badge.fury.io/rb/unibits.svg)](https://badge.fury.io/rb/unibits) [![[ci]](https://github.com/janlelis/unibits/workflows/Test/badge.svg)](https://github.com/janlelis/unibits/actions?query=workflow%3ATest)
Ruby library and CLI command that visualizes various Unicode and ASCII/single byte encodings in the terminal:
- Makes analyzing encodings easier
- Helps you with debugging strings
- Highlights invalid/special/blank bytes/characters/codepoints
- Supports *UTF-8*, *UTF-16LE*/*UTF-16BE*, *UTF-32LE*/*UTF-32BE*, *ISO-8859-X*, *Windows-125X*, *IBMX*, *CP85X*, *macX*, *TIS-620*/*Windows-874*, *KOI8-R*/*KOI8-U*, 7-Bit *ASCII*/*GB1988*, and arbitrary *BINARY* data
## Color Coding
Each byte of the given string is colored according to the character (codepoint) it belongs to:
- Red for invalid bytes
- Light blue for blanks
- Blue for control characters
- Pink for non-control formatting characters
- Green for marks (Unicode only)
- Orange for unassigned codepoints
- Lighter orange for unassigned codepoints which are also ignorable
- Random color for all other codepoints
The same colors are used in the higher-level companion tool [uniscribe](https://github.com/janlelis/uniscribe).
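The categories above can be approximated with Ruby's built-in Unicode regexp properties. The following is only a rough sketch: `rough_category` is a hypothetical helper, not part of unibits, and the order of the checks (blank before control, for example) is an assumption. It also leaves out the distinction between ignorable and non-ignorable unassigned codepoints.

```ruby
# Hypothetical helper (not unibits' own code): maps a single character
# to a symbolic category resembling the color classes listed above.
def rough_category(char)
  # Invalid byte sequences have to be caught first, because matching a
  # regexp against an invalidly encoded string raises an error.
  return :invalid unless char.valid_encoding?

  case char
  when /[[:space:]]/ then :blank      # whitespace, checked before control
  when /\p{Cc}/      then :control    # C0/C1 control characters
  when /\p{Cf}/      then :format     # non-control formatting characters
  when /\p{M}/       then :mark       # combining marks
  when /\p{Cn}/      then :unassigned # codepoints without an assignment
  else                    :other
  end
end

# U+0301 is a combining acute accent, U+200E is LEFT-TO-RIGHT MARK
["a", " ", "\u0000", "\u200E", "\u0301"].each do |char|
  puts format("U+%04X  %s", char.ord, rough_category(char))
end
```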
## Setup
Make sure Ruby is installed and that installing gems works. Then run:
```
$ gem install unibits
```
## Usage
Pass the string to debug to unibits:
### From CLI
```
$ unibits "🌫 Idiosyncrätic ℜսᖯʏ"
```
### From Ruby
```ruby
require 'unibits/kernel_method'
unibits "🌫 Idiosyncrätic ℜսᖯʏ"
```
### Advanced Options
`unibits` accepts the following options:
- *encoding (e)*: The encoding of the given string (uses the string's default encoding if none given)
- *convert (c)*: An encoding the string should be converted to before visualizing it
- *stats*: Whether to show a short stats header (default: `true`); deactivate it on the CLI with `--no-stats`
- *wide-ambiguous*: Treat characters of ambiguous width as 2 columns wide instead of 1 ([more info](https://github.com/janlelis/unicode-display_width))
- *width (w)*: Set a custom column width; if not set, *unibits* retrieves it from the terminal or falls back to 80
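The difference between the *encoding* and *convert* options can be illustrated with plain Ruby string methods. This sketch assumes that *encoding* corresponds to relabeling the bytes (`String#force_encoding`) and *convert* to transcoding them (`String#encode`). That mapping is an assumption made for illustration, not a statement about unibits' internals. `hex_bytes` is a hypothetical helper.

```ruby
# Hypothetical helper: renders the bytes of a string as hex pairs.
def hex_bytes(str)
  str.bytes.map { |byte| format("%02X", byte) }.join(" ")
end

text = "ä" # one character, two bytes in UTF-8

# Relabeling: the bytes stay the same, only their interpretation changes.
as_binary = text.dup.force_encoding(Encoding::BINARY)

# Transcoding: the character stays the same, the bytes change.
as_utf16le = text.encode(Encoding::UTF_16LE)

puts "UTF-8:    #{hex_bytes(text)} (#{text.length} character)"
puts "BINARY:   #{hex_bytes(as_binary)} (#{as_binary.length} bytes, no characters known)"
puts "UTF-16LE: #{hex_bytes(as_utf16le)} (#{as_utf16le.length} character)"
```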
## Examples of Valid Encodings
### UTF-8
CLI: `$ unibits -e utf-8 -c utf-8 "🌫 Idiosyncrätic ℜսᖯʏ"`
Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-8'`

### UTF-16LE
CLI: `$ unibits -e utf-8 -c utf-16le "🌫 Idiosyncrätic ℜսᖯʏ"`
Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16le'`

### UTF-32BE
CLI: `$ unibits -e utf-8 -c utf-32be "🌫 Idiosyncrätic ℜսᖯʏ"`
Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32be'`
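To cross-check the bytes shown in the UTF-8, UTF-16LE and UTF-32BE views, the same conversions can be done with Ruby's standard library alone. This is a minimal sketch that does not use unibits; `encoded_hex` is a hypothetical helper.

```ruby
# Hypothetical helper: transcodes a string and returns its bytes as hex.
def encoded_hex(str, encoding_name)
  str.encode(encoding_name).bytes.map { |byte| format("%02X", byte) }.join(" ")
end

fog = "\u{1F32B}" # the fog emoji that starts the example string

%w[UTF-8 UTF-16LE UTF-32BE].each do |name|
  puts "#{name.ljust(8)} #{encoded_hex(fog, name)}"
end
# UTF-8    F0 9F 8C AB   (four-byte sequence)
# UTF-16LE 3C D8 2B DF   (surrogate pair D83C DF2B, low byte first)
# UTF-32BE 00 01 F3 2B   (the codepoint itself, padded to four bytes)
```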

### BINARY
CLI: `$ unibits -e binary "🌫 Idiosyncrätic ℜսᖯʏ"`
Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'binary'`

### ASCII
CLI: `$ unibits -e utf-8 -c ascii "ascii"`
Ruby: `unibits "ascii", encoding: 'utf-8', convert: 'ascii'`

## Examples of Invalid Encodings
### UTF-8
Example in Ruby: `unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"`
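Each fragment in the string above is malformed for a different reason. As a sketch independent of unibits, Ruby's own `String#valid_encoding?` can confirm that every one of them is rejected as UTF-8. The hash name and labels are chosen here for illustration.

```ruby
# The five malformed fragments from the example above, labeled by cause.
INVALID_UTF8_SAMPLES = {
  "unexpected continuation byte" => "\x80",
  "truncated 4-byte sequence"    => "\xF0\x9F\x8C",
  "overlong encoding"            => "\xE0\x81\x81",
  "encoded surrogate"            => "\xED\xA0\x80",
  "beyond U+10FFFF"              => "\xF5\x8F\xBF\xBF",
}.freeze

INVALID_UTF8_SAMPLES.each do |reason, bytes|
  # Copy before relabeling, so the literal itself is never mutated.
  sample = bytes.dup.force_encoding(Encoding::UTF_8)
  status = sample.valid_encoding? ? "valid" : "invalid"
  puts "#{reason.ljust(30)} #{status}"
end
```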

### ASCII
Example in Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'ascii'`

## Notes
More info
- [Ruby's Encoding class](https://ruby-doc.org/core/Encoding.html)
- [UTF-8 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-8#Description)
- [UTF-16 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-16#Description)
- [UTF-32 (Wikipedia)](https://en.wikipedia.org/wiki/UTF-32)
- [Difference between BINARY and ASCII](http://idiosyncratic-ruby.com/56-us-ascii-8bit.html)
Related gems
- [uniscribe](https://github.com/janlelis/uniscribe)
- [unicopy](https://github.com/janlelis/unicopy)
- [symbolify](https://github.com/janlelis/symbolify)
- [characteristics](https://github.com/janlelis/characteristics)
Lots of thanks to @damienklinnert for the motivation and inspiration required to build this! 🎆
Copyright (C) 2017-2024 Jan Lelis. Released under the MIT license.