https://github.com/janlelis/unicode-confusable
Unicode::Confusable.confusable? "ℜսᖯʏ", "Ruby"
- Host: GitHub
- URL: https://github.com/janlelis/unicode-confusable
- Owner: janlelis
- License: mit
- Created: 2016-03-12T14:48:21.000Z (almost 10 years ago)
- Default Branch: main
- Last Pushed: 2025-09-09T14:46:00.000Z (5 months ago)
- Last Synced: 2025-10-10T00:36:03.181Z (4 months ago)
- Topics: confusables, ruby, script-confusable, unicode, unicode-data
- Language: Ruby
- Homepage:
- Size: 295 KB
- Stars: 75
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: MIT-LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# Unicode::Confusable [![[version]](https://badge.fury.io/rb/unicode-confusable.svg)](https://badge.fury.io/rb/unicode-confusable) [![[ci]](https://github.com/janlelis/unicode-confusable/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-confusable/actions?query=workflow%3ATest)
Checks whether two strings are visually confusable, as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): both strings are transformed into a skeleton format, and the skeletons are then compared. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), removing ignorable characters, replacing [confusable characters](https://unicode.org/Public/security/16.0.0/confusables.txt) with their prototypes, and normalizing the string again.
Unicode version: **17.0.0** (September 2025)
\* The Unicode version used for normalization [depends on your Ruby version](https://idiosyncratic-ruby.com/73-unicode-version-mapping.html)
Please note: The TR39 standard now includes detection of confusables based on bidi formatting (i.e. right-to-left text). This is currently not supported by this gem.
Supported Rubies: **3.x** (might still work: **2.x**)
## Usage
### Confusable?
```ruby
require "unicode/confusable"
Unicode::Confusable.confusable? "a", "b" # => false
Unicode::Confusable.confusable? "C", "С" # => true
Unicode::Confusable.confusable? "ℜ𝘂ᖯʏ", "Ruby" # => true
Unicode::Confusable.confusable? "Michael", "Michae1" # => true
Unicode::Confusable.confusable? "⁇", "?" # => false
Unicode::Confusable.confusable? "⁇", "??" # => true
```
### Skeleton
```ruby
Unicode::Confusable.skeleton "ℜ𝘂ᖯʏ" # => "Ruby"
```
**Please note:** The skeleton is an intermediate representation, not meant for any other use than testing confusability, [according to the standard](https://www.unicode.org/reports/tr39/#Confusable_Detection).
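The skeleton steps described in the introduction (NFD, remove ignorable characters, replace confusables, NFD again) can be illustrated with a minimal sketch. This is not the gem's implementation: `TOY_CONFUSABLES`, `TOY_IGNORABLE`, `toy_skeleton`, and `toy_confusable?` are hypothetical names, and the mapping table is a tiny hand-picked excerpt rather than the full `confusables.txt` data.

```ruby
# Illustration only: a three-entry stand-in for the confusables.txt mapping.
TOY_CONFUSABLES = {
  "\u0421" => "C",  # CYRILLIC CAPITAL LETTER ES looks like Latin C
  "1"      => "l",  # DIGIT ONE looks like LATIN SMALL LETTER L
  "\u2047" => "??", # DOUBLE QUESTION MARK expands to two question marks
}.freeze

# Illustration only: a handful of default-ignorable code points
# (soft hyphen, zero width space/joiners, word joiner, zero width no-break space).
TOY_IGNORABLE = /[\u00AD\u200B-\u200D\u2060\uFEFF]/

# Builds a simplified skeleton following the four steps named above.
def toy_skeleton(string)
  string
    .unicode_normalize(:nfd)                                   # 1. normalize (NFD)
    .gsub(TOY_IGNORABLE, "")                                   # 2. drop ignorable characters
    .each_char.map { |char| TOY_CONFUSABLES.fetch(char, char) } # 3. replace confusables
    .join
    .unicode_normalize(:nfd)                                   # 4. normalize again
end

# Two strings count as confusable when their skeletons are identical.
def toy_confusable?(first, second)
  toy_skeleton(first) == toy_skeleton(second)
end

toy_confusable?("C", "\u0421")        # => true
toy_confusable?("Michael", "Michae1") # => true
toy_confusable?("a", "b")             # => false
```

Because both inputs go through the same transformation, the comparison is symmetric; the skeleton itself remains an internal value and is not meant for display.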
### List
List all characters that map to the confusable exemplar given:
```ruby
Unicode::Confusable.list("o", false)
# => ["ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬"]
```
If you omit the second parameter, the result also includes characters whose confusable mapping merely contains the given character as one part:
```ruby
Unicode::Confusable.list("o")
# => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ⲑ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "თ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"]
```
## No Bidi-Confusable Check
Testing for bidirectional confusables is currently not supported.
## Single-script / Mixed-script / Whole-script
TR39 also describes mechanisms for further categorization of confusables. This is currently not part of this gem; however, the [unicode-scripts gem](https://github.com/janlelis/unicode-scripts) does include mixed-script detection, which you can use for this purpose.
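To give a rough idea of what a mixed-script check looks for, here is a simplified sketch that uses only Ruby's built-in regex script properties. It does not use the unicode-scripts gem's API and does not implement the full TR39 procedure (script extensions, for instance, are ignored); `TOY_SCRIPT_PATTERNS`, `toy_scripts`, and `toy_mixed_script?` are hypothetical names, and only three scripts are considered.

```ruby
# Illustration only: three scripts that are frequently mixed in spoofed identifiers.
TOY_SCRIPT_PATTERNS = {
  "Latin"    => /\p{Latin}/,
  "Cyrillic" => /\p{Cyrillic}/,
  "Greek"    => /\p{Greek}/,
}.freeze

# Returns the names of the checked scripts that occur in the string.
def toy_scripts(string)
  TOY_SCRIPT_PATTERNS.select { |_name, pattern| string.match?(pattern) }.keys
end

# A string counts as mixed-script here when more than one checked script occurs.
def toy_mixed_script?(string)
  toy_scripts(string).size > 1
end

toy_scripts("\u0421C")       # => ["Latin", "Cyrillic"]
toy_mixed_script?("\u0421C") # => true
toy_mixed_script?("Ruby")    # => false
```

Digits, punctuation, and other characters outside the listed scripts are ignored by this sketch, so a string such as `"Ruby 3"` still counts as single-script.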
See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode-related micro libraries.
## MIT License
- Copyright (C) 2016-2025 Jan Lelis. Released under the MIT license.
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1