{"id":15285846,"url":"https://github.com/silviucpp/eunicode2gsm","last_synced_at":"2025-07-21T09:32:42.292Z","repository":{"id":63763257,"uuid":"570517750","full_name":"silviucpp/eunicode2gsm","owner":"silviucpp","description":"Erlang library for unicode to gsm transliteration","archived":false,"fork":false,"pushed_at":"2025-03-04T15:07:03.000Z","size":12,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-06T19:27:02.579Z","etag":null,"topics":["erlang","gsm","transliteration","unicode"],"latest_commit_sha":null,"homepage":"","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/silviucpp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-11-25T11:34:10.000Z","updated_at":"2025-03-04T15:06:47.000Z","dependencies_parsed_at":"2023-01-22T19:31:15.841Z","dependency_job_id":null,"html_url":"https://github.com/silviucpp/eunicode2gsm","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/silviucpp/eunicode2gsm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Feunicode2gsm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Feunicode2gsm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Feunicode2gsm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Feunicode2gsm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/silviucpp","download_url":"https://codeload.github.com/silviucpp/eunicode2gsm/ta
r.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Feunicode2gsm/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266276121,"owners_count":23903981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erlang","gsm","transliteration","unicode"],"created_at":"2024-09-30T15:07:49.938Z","updated_at":"2025-07-21T09:32:42.274Z","avatar_url":"https://github.com/silviucpp.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"# eunicode2gsm\n\n[![Build Status](https://app.travis-ci.com/silviucpp/eunicode2gsm.svg?branch=main)](https://travis-ci.com/github/silviucpp/unicode2gsm)\n[![GitHub](https://img.shields.io/github/license/silviucpp/eunicode2gsm)](https://github.com/silviucpp/unicode2gsm/blob/master/LICENSE)\n[![Hex.pm](https://img.shields.io/hexpm/v/eunicode2gsm)](https://hex.pm/packages/eunicode2gsm)\n\n## What is it?\n\nA library that transliterates Unicode characters outside the GSM alphabet into similar GSM-encoded characters. This helps ensure that your message is segmented at 160 characters and saves you from sending multiple message segments, which would increase your spend.\n\nWhen characters outside the GSM alphabet are used in an SMS message, the message must be encoded as UCS-2. However, UCS-2 characters take 16 bits to encode, so when a message includes such a character, it will be split or segmented between the 70th and 71st characters. 
This is shorter than the 160 characters per message segment that you get with GSM-7 character encoding.\n\nFor example, sometimes a Unicode character such as a smart quote `〞`, a long dash `—`, or a Unicode whitespace accidentally slips into your carefully crafted 125-character message. Now, your message is segmented and priced at two messages instead of one.\n\nCurrently, the library transliterates symbols from the following Unicode character sets:\n\n|Start  |End    |Character Set                  |\n|:-----:|:-----:|:------------------------------|\n|0x0000 | 0x007F| Basic Latin                   |\n|0x0080 | 0x00FF| Latin-1 Supplement            |\n|0x0100 | 0x017F| Latin Extended-A              |\n|0x0180 | 0x024F| Latin Extended-B              |\n|0x0250 | 0x02AF| IPA Extensions                |\n|0x02B0 | 0x02FF| Spacing Modifier Letters      |\n|0x0300 | 0x036F| Combining Diacritical Marks   |\n|0x0370 | 0x03FF| Greek and Coptic              |\n|0x1D00 | 0x1D7F| Phonetic Extensions           |\n|0x1D80 | 0x1DBF| Phonetic Extensions Supplement|\n|0x1E00 | 0x1EFF| Latin Extended Additional     |\n|0x1F00 | 0x1FFF| Greek Extended                |\n|0x2000 | 0x206F| General Punctuation           |\n|0x2070 | 0x209F| Superscripts and Subscripts   |\n|0x20A0 | 0x20CF| Currency Symbols              |\n|0x2100 | 0x214F| Letterlike Symbols            |\n|0x2150 | 0x218F| Number Forms                  |\n|0x2190 | 0x21FF| Arrows                        |\n|0x2700 | 0x27BF| Dingbats                      |\n|0x27F0 | 0x27FF| Supplemental Arrows-A         |\n|0x2900 | 0x297F| Supplemental Arrows-B         |\n|0x2C60 | 0x2C7F| Latin Extended-C              |\n|0x3000 | 0x303F| CJK Symbols and Punctuation   |\n|0xFE10 | 0xFE1F| Vertical Forms                |\n|0xFE50 | 0xFE6F| Small Form Variants           |\n|0xFF00 | 0xFFEF| Halfwidth and Fullwidth Forms |\n\nThe transliteration process also replaces all `CRLF` (`\\r\\n`) sequences with `\\n`.\n\n## Quick 
start\n\nFetch all dependencies and compile:\n\n```sh\nrebar3 compile\n```\n\nOptionally, you can specify in `sys.config` whether to also transliterate the extended GSM charset, using the `transliterate_extended_gsm_charset` option (`false` by default).\n\nExtended GSM charset symbols are escaped, so each one counts as 2 characters. For the symbols that have a decent GSM-encoded replacement,\nyou can optionally enable transliteration using the `transliterate_extended_gsm_charset` option, and these symbols will be mapped as follows:\n\n- `{` -\u003e `(`\n- `}` -\u003e `)`\n- `[` -\u003e `(`\n- `]` -\u003e `)`\n- `~` -\u003e `-`\n\nExample:\n\n```erlang\n[\n    {eunicode2gsm, [\n        {transliterate_extended_gsm_charset, false}\n    ]}\n].\n```\n\n## API\n\nThe library accepts only UTF-8 encoded binaries. If you are not familiar with how Erlang handles Unicode, please check:\n\n- https://imteemu.wordpress.com/2011/10/31/string-encodings-in-erlang/\n- https://adoptingerlang.org/docs/development/hard_to_get_right/\n\n### Check if a string requires transliteration\n\n```erlang\neunicode2gsm:requires_transliteration(\u003c\u003c\"utf8 binary here\"/utf8\u003e\u003e).\n```\n\n### Perform transliteration\n\n```erlang\neunicode2gsm:transliterate(\u003c\u003c\"utf8 binary here\"/utf8\u003e\u003e).\n```\n\n## Running tests\n\n```sh\nrebar3 ct\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Feunicode2gsm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsilviucpp%2Feunicode2gsm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Feunicode2gsm/lists"}