https://github.com/atomicobject/utf8_to_gsm
`Utf8ToGsm` provides functionality to convert UTF-8 characters to their GSM equivalents.
- Host: GitHub
- URL: https://github.com/atomicobject/utf8_to_gsm
- Owner: atomicobject
- License: mit
- Created: 2011-08-18T21:27:21.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2011-08-19T15:40:47.000Z (over 14 years ago)
- Last Synced: 2025-06-08T22:08:12.709Z (11 months ago)
- Language: Ruby
- Homepage: https://github.com/kuleszaj/utf8_to_gsm/
- Size: 104 KB
- Stars: 6
- Watchers: 25
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Description
===========
`Utf8ToGsm.to_gsm` provides functionality to convert UTF-8 characters (in a string) to their GSM equivalents for sending SMS messages via SMPP.
Examples
--------
    require 'utf8_to_gsm'
    Utf8ToGsm.to_gsm('Convert to GSM: !@#$%^&*()')
    # => "Convert to GSM: !\x00#\x02%\e\x14&*()"
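To read the output above, it can help to look at the raw bytes. The snippet below is a small sketch that does not call the gem at all; it only inspects the literal result string from the example. The names `EXAMPLE_OUTPUT` and `hex_bytes` are made up for illustration.

```ruby
# Literal result string copied from the example above.
EXAMPLE_OUTPUT = "Convert to GSM: !\x00#\x02%\e\x14&*()"

# Hypothetical helper: format each byte of a string as hex.
def hex_bytes(str)
  str.bytes.map { |b| format('0x%02X', b) }
end

# The last 11 bytes correspond to the 10 punctuation characters of the input.
puts hex_bytes(EXAMPLE_OUTPUT).last(11).join(' ')
# 0x21 0x00 0x23 0x02 0x25 0x1B 0x14 0x26 0x2A 0x28 0x29
```

In the GSM default alphabet, "@" is 0x00 and "$" is 0x02. "^" is only available through the extension table, so it is written as the escape byte 0x1B followed by 0x14. That is why the 26-character input produces 27 bytes.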
Usage
-----
Pass `Utf8ToGsm.to_gsm` a UTF-8 string that you would like to convert into a GSM-compatible string.
`Utf8ToGsm` goes through each character in the string:
* If the character has an exact GSM equivalent, that equivalent is used.
* Otherwise, the UTF-8 character is transliterated to ASCII.
* If no suitable ASCII character (or sequence of characters) is available, a replacement symbol (question mark: ?) is used.
* Once transliterated to ASCII, each resulting character is converted to its GSM equivalent. (Nearly all printable ASCII characters have a GSM representation, either in the default alphabet or through its extension table; the backtick is a notable exception.)
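The steps above can be sketched in plain Ruby. This is a simplified illustration, not the gem's actual code: the tables `GSM_SPECIAL` and `TRANSLIT`, the pattern `SAME_CODE`, and the methods `gsm_for` and `sketch_to_gsm` are all hypothetical, and the tables are deliberately tiny. In particular, `TRANSLIT` is a stand-in for the transliteration that the gem delegates to `unidecoder`.

```ruby
# Hypothetical, partial table of characters whose GSM code differs from ASCII.
# A full GSM 03.38 table has 128 basic entries plus an extension table.
GSM_SPECIAL = {
  '@' => "\x00",
  '$' => "\x02",
  'é' => "\x05",
  '_' => "\x11",
  '^' => "\e\x14", # ESC (0x1B) + 0x14: extension table entry
  '{' => "\e\x28",
  '}' => "\e\x29"
}

# Characters with the same code in ASCII and GSM:
# LF, CR, 0x20-0x23, 0x25-0x3F, A-Z, a-z.
SAME_CODE = /\A[\n\r\x20-\x23\x25-\x3F\x41-\x5A\x61-\x7A]\z/

# Hypothetical stand-in for the unidecoder transliteration step.
TRANSLIT = { 'À' => 'A', 'ê' => 'e', 'œ' => 'oe' }

# Convert one character following the four steps in the list above.
def gsm_for(char)
  return GSM_SPECIAL[char] if GSM_SPECIAL.key?(char) # exact GSM equivalent
  return char if char =~ SAME_CODE                   # same code in both sets

  ascii = TRANSLIT.fetch(char, '?')                  # transliterate, or '?'
  ascii.each_char.map do |a|                         # ASCII -> GSM
    GSM_SPECIAL.fetch(a) { a =~ SAME_CODE ? a : '?' }
  end.join
end

def sketch_to_gsm(text)
  text.each_char.map { |char| gsm_for(char) }.join
end

p sketch_to_gsm('Convert to GSM: !@#$%^&*()')
p sketch_to_gsm('À la carte')
```

With these toy tables, the first call reproduces the string shown in the Examples section and the second returns "A la carte". Any character missing from all three tables falls through to "?".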
Implementation
--------------
Any UTF-8 character that does not exist in the GSM alphabet is transliterated to ASCII with the help of `unidecoder`.
`unidecoder` is used so that `Utf8ToGsm` can work with Ruby 1.8.7. Ruby 1.9.2 provides much of the functionality of `unidecoder` itself, but Ruby 1.8.7 support was required at the time of writing.
Motivation
----------
* `Utf8ToGsm` may be useful for people who need to send SMS messages via SMPP directly to an SMSC using the GSM-7 encoding ("Default SMSC Alphabet"), data_coding = 0x00.
* The transliteration used by this library is meant to provide the best possible ASCII replacement that is available for the given UTF-8 characters. It may be helpful to review the readme from [unidecoder](https://github.com/norman/unidecoder/blob/master/README.md).
* Clearly, transliteration is not ideal. However, the GSM-7 default alphabet ("Default SMSC Alphabet") only allows a total of 127 characters, and so a very limited character repertoire is available.
* It is presumed that providing the closest possible replacement is better than providing nothing at all.
* For example, if a user tries to send an SMS message via SMPP containing the character "À", there is a problem. "À" does not exist in the GSM-7 default alphabet. Sending "A" as a replacement instead of "?" is probably more helpful to the recipient.
* For a truly accurate representation, UTF-16 or UCS-2 should generally be used for transmitting the payload of an SMPP PDU to the SMSC when non-GSM characters are being communicated. However, not all telcos or SMSCs support UTF-16/UCS-2.
* Theoretically, GSM locking shift tables and GSM single shift tables should be usable to represent characters outside of the GSM-7 default alphabet. However, it seems that telco support for this (especially via SMPP) is very limited.
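For comparison, the UCS-2 alternative mentioned above can be sketched as follows. This is an assumption-laden illustration rather than part of this library: `ucs2_payload` and the two constants are hypothetical names, and whether a given SMSC accepts such a payload depends on the carrier.

```ruby
# SMPP data_coding values: 0x00 is the SMSC default alphabet, 0x08 is UCS-2.
DATA_CODING_DEFAULT = 0x00
DATA_CODING_UCS2    = 0x08

# Hypothetical helper: encode text as big-endian 16-bit units for a UCS-2 payload.
# Characters outside the Basic Multilingual Plane become surrogate pairs,
# which strict UCS-2 receivers may not accept.
def ucs2_payload(text)
  text.encode('UTF-16BE').force_encoding(Encoding::BINARY)
end

payload = ucs2_payload('À')
puts payload.bytes.map { |b| format('%02X', b) }.join(' ') # 00 C0
```

Each character costs two bytes here, so "À" survives intact at the price of a shorter maximum message length than GSM-7 allows.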
Authors
=======
* Justin Kulesza (kulesza@atomicobject.com)
© 2011 [Atomic Object](http://www.atomicobject.com/)
More Atomic Object [open source](http://www.atomicobject.com/pages/Software+Commons) projects