{"id":13631713,"url":"https://github.com/janlelis/unicode-display_width","last_synced_at":"2025-05-14T17:03:10.808Z","repository":{"id":1276893,"uuid":"1216212","full_name":"janlelis/unicode-display_width","owner":"janlelis","description":"Monospace Unicode character width in Ruby","archived":false,"fork":false,"pushed_at":"2024-09-13T10:17:22.000Z","size":1969,"stargazers_count":123,"open_issues_count":1,"forks_count":25,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-10-29T16:59:11.535Z","etag":null,"topics":["monospace-font","ruby","terminal","unicode","unicode-data"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/janlelis.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"MIT-LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2011-01-03T11:22:49.000Z","updated_at":"2024-10-21T13:12:56.000Z","dependencies_parsed_at":"2024-11-05T18:39:14.157Z","dependency_job_id":"44f4c4ab-bd59-4bee-82bf-fc393f5a0af6","html_url":"https://github.com/janlelis/unicode-display_width","commit_stats":{"total_commits":199,"total_committers":15,"mean_commits":"13.266666666666667","dds":"0.38190954773869346","last_synced_commit":"4447249aaaee69714cc1c5ed6e67afa5164674fb"},"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janlelis%2Funicode-display_width","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janlelis%2Funicode-display_width/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/janlelis%2Funicode-display_width/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/janlelis%2Funicode-display_width/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/janlelis","download_url":"https://codeload.github.com/janlelis/unicode-display_width/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244413451,"owners_count":20448709,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["monospace-font","ruby","terminal","unicode","unicode-data"],"created_at":"2024-08-01T22:02:35.439Z","updated_at":"2025-04-06T00:06:17.812Z","avatar_url":"https://github.com/janlelis.png","language":"Ruby","readme":"# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [\u003cimg src=\"https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg\" /\u003e](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)\n\nDetermines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. 
It does not rely on the OS vendor ([wcwidth](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.\n\nUnicode version: **16.0.0** (September 2024)\n\n## Gem Version 3 — Improved Emoji Support\n\n**Emoji support is now enabled by default.** See below for description and configuration possibilities.\n\n**Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }\n\nSee [CHANGELOG](/CHANGELOG.md) for details.\n\n## Gem Version 2.4.2 — Performance Updates\n\n**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**\n\nThis is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.\n\n## Introduction to Character Widths\n\nGuessing the correct space a character will consume on terminals is not easy. There is no single standard. Most implementations combine data from [East Asian Width](https://www.unicode.org/reports/tr11/), some [General Categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category), and hand-picked adjustments.\n\n### How this Library Handles Widths\n\nFurther at the top means higher precedence. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!\n\nWidth  | Characters                   | Comment\n-------|------------------------------|--------------------------------------------------\n?      | (user defined)               | Overwrites any other values\n?      
| Emoji                        | See \"How this Library Handles Emoji Width\" below\n-1     | `\"\\b\"`                       | Backspace (total width never below 0)\n0      | `\"\\0\"`, `\"\\x05\"`, `\"\\a\"`, `\"\\n\"`, `\"\\v\"`, `\"\\f\"`, `\"\\r\"`, `\"\\x0E\"`, `\"\\x0F\"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width\n1      | `\"\\u{00AD}\"`                 | SOFT HYPHEN\n2      | `\"\\u{2E3A}\"`                 | TWO-EM DASH\n3      | `\"\\u{2E3B}\"`                 | THREE-EM DASH\n0      | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters\n0      | Derived Property: Default_Ignorable_Code_Point     | Ignorable ranges\n0      | `\"\\u{1160}\"..\"\\u{11FF}\"`, `\"\\u{D7B0}\"..\"\\u{D7FF}\"` | HANGUL JUNGSEONG\n2      | East Asian Width: F, W       | Full-width characters\n2      | `\"\\u{3400}\"..\"\\u{4DBF}\"`, `\"\\u{4E00}\"..\"\\u{9FFF}\"`, `\"\\u{F900}\"..\"\\u{FAFF}\"`, `\"\\u{20000}\"..\"\\u{2FFFD}\"`, `\"\\u{30000}\"..\"\\u{3FFFD}\"` | Full-width ranges\n1 or 2 | East Asian Width: A          | Ambiguous characters, user defined, default: 1\n1      | All other codepoints         | -\n\n## Install\n\nInstall the gem with:\n\n    $ gem install unicode-display_width\n\nOr add to your Gemfile:\n\n    gem 'unicode-display_width'\n\n## Usage\n\n```ruby\nrequire 'unicode/display_width'\n\nUnicode::DisplayWidth.of(\"⚀\") # =\u003e 1\nUnicode::DisplayWidth.of(\"一\") # =\u003e 2\n```\n\n### Ambiguous Characters\n\nThe second parameter defines the width used for characters classified as ambiguous:\n\n```ruby\nUnicode::DisplayWidth.of(\"·\", 1) # =\u003e 1\nUnicode::DisplayWidth.of(\"·\", 2) # =\u003e 2\n```\n\n### Encoding Notes\n\n- Data with *BINARY* encoding is interpreted as UTF-8, if possible\n- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` 
options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)\n\n### Custom Overwrites\n\nYou can overwrite how to handle specific code points by passing a hash (or even a proc) as the `overwrite:` parameter:\n\n```ruby\nUnicode::DisplayWidth.of(\"a\\tb\", 1, overwrite: { \"\\t\".ord =\u003e 10 }) # =\u003e TAB counted as 10, result is 12\n```\n\nPlease note that using overwrites disables some performance optimizations of this gem.\n\n### Emoji\n\nIf your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:\n\n```ruby\nUnicode::DisplayWidth.of \"🤾🏽‍♀️\", emoji: :all # =\u003e 2\nUnicode::DisplayWidth.of \"🤾🏽‍♀️\", emoji: false # =\u003e 5\n```\n\n#### How this Library Handles Emoji Width\n\nThere are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.\n\nAnother aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).\n\nFinally, terminals differ on whether Skin Tone Modifiers can be applied to all characters or just to those with the \"Emoji Base\" property.\n\nEmoji Type  | Width / Comment\n------------|----------------\nBasic/Single Emoji character without Variation Selector   | No special handling\nBasic/Single Emoji character with VS15 (Text)             | No special handling\nBasic/Single Emoji character with VS16 (Emoji)            | 2 or East Asian Width (see table below)\nSingle Emoji character with Skin Tone Modifier            | 2 unless Emoji mode is `:none` or `:vs16`\nSkin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is `:rgi` / `:rgi_at`\nEmoji Sequence                                           
 | 2 if Emoji belongs to configured Emoji set (see table below)\n\n#### Emoji Modes\n\nThe `emoji:` option can be used to configure which types of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:\n\n`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals\n----------------|------------------|---------------------------------|------------------\n`true` or `:auto`  | - | Automatically use recommended Emoji setting for your terminal | -\n`:all`     | 2                | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot\n`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm\n`:possible`| 2                | 2 for all possible/well-formed Emoji sequences | ?\n`:rgi`     | 2                | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?\n`:rgi_at`  | EAW (1 or 2)     | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal\n`:vs16`    | 2                | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?\n`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals\n\n- *EAW:* East Asian Width\n- *RGI Emoji:* Emoji Recommended for General Interchange\n- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences\n\n#### Emoji Support in Terminals\n\nUnfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. 
When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on \"Apple_Terminal\" or `false` on Gnome's terminal widget).\n\nPlease note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper number of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).\n\n**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). 
Just a thought…\n\n### Usage with String Extension\n\n```ruby\nrequire 'unicode/display_width/string_ext'\n\n\"⚀\".display_width # =\u003e 1\n'一'.display_width # =\u003e 2\n```\n\n### Usage with Config Object\n\nYou can use a config object that allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:\n\n```ruby\nrequire 'unicode/display_width'\n\ndisplay_width = Unicode::DisplayWidth.new(\n  # ambiguous: 1,\n  overwrite: { \"A\".ord =\u003e 100 },\n  emoji: :all,\n)\n\ndisplay_width.of \"⚀\" # =\u003e 1\ndisplay_width.of \"🤠‍🤢\" # =\u003e 2\ndisplay_width.of \"A\" # =\u003e 100\n```\n\n### Usage from the Command-Line\n\nUse this one-liner to print out display widths for strings from the command-line:\n\n```\n$ gem install unicode-display_width\n$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- \"一\"\n```\nReplace \"一\" with the actual string to measure\n\n## Other Implementations \u0026 Discussion\n\n- Python: https://github.com/jquast/wcwidth\n- JavaScript: https://github.com/mycoboco/wcwidth.js\n- C: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c\n- C for Julia: https://github.com/JuliaLang/utf8proc/issues/2\n- Golang: https://github.com/rivo/uniseg\n\nSee [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.\n\n## Copyright \u0026 Info\n\n- Copyright (c) 2011, 2015-2024 Jan Lelis, https://janlelis.com, released under the MIT\nlicense\n- Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run\n- Unicode data: 
https://www.unicode.org/copyright.html#Exhibit1\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanlelis%2Funicode-display_width","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjanlelis%2Funicode-display_width","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjanlelis%2Funicode-display_width/lists"}