Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nota/split-graphemes
Divide character strings into graphemes.
- Host: GitHub
- URL: https://github.com/nota/split-graphemes
- Owner: nota
- Created: 2018-05-18T04:38:05.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-02-03T08:58:07.000Z (almost 2 years ago)
- Last Synced: 2024-10-03T00:39:21.230Z (3 months ago)
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/split-graphemes
- Size: 472 KB
- Stars: 41
- Watchers: 5
- Forks: 6
- Open Issues: 5
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-khmer-language - nota/split-graphemes
README
# split-graphemes
[![Tests](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml/badge.svg?branch=master)](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml)

Splits strings that contain ligature-forming scripts such as Thai and Khmer, as well as complex emoji, into an array of [graphemes](https://en.wikipedia.org/wiki/Grapheme).
Use this library in place of `Array.from` when you need graphemes rather than code points.
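To see why `Array.from` alone is not enough, here is a minimal sketch that compares code-point splitting with grapheme splitting. It deliberately does not call this library: as a stand-in it uses the built-in `Intl.Segmenter` (assuming a runtime that ships it, such as Node.js 16 or later), and `graphemesWithSegmenter` is a hypothetical helper name. The results of `split-graphemes` itself may differ from the runtime's segmentation for some scripts.

```js
// Illustration only: built-in Intl.Segmenter, not the split-graphemes implementation.

// Family emoji: man, ZWJ, woman, ZWJ, boy, ZWJ, boy (7 code points).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}'
// Hiragana KA followed by a combining dakuten (the NFD form of U+304C).
const ga = '\u304B\u3099'

// Array.from iterates by code point, so combining sequences fall apart.
const codePoints = (text) => Array.from(text)

// Hypothetical helper: grapheme clusters according to the runtime's Unicode data.
function graphemesWithSegmenter (text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
  return Array.from(segmenter.segment(text), (part) => part.segment)
}

console.log(codePoints(family).length) // 7
console.log(graphemesWithSegmenter(family).length) // 1
console.log(codePoints(ga).length) // 2
console.log(graphemesWithSegmenter(ga).length) // 1
```

The gap between the two counts is what a grapheme splitter closes: one user-perceived character can span several code points.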
## Installation
```
$ npm install split-graphemes
```

## Examples

### Emoji

```js
// The emoji '👨‍👩‍👦‍👦' consists of four person emoji joined by Zero Width Joiners (ZWJ).
const chars = Array.from('👨‍👩‍👦‍👦') // ['👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦']
```

```js
// splitGraphemes interprets it as exactly one character.
const chars = splitGraphemes('👨‍👩‍👦‍👦') // ['👨‍👩‍👦‍👦']
```

### Khmer characters

```js
Array.from('แแแปแแแแทแ') // ['แ', 'แ', 'แป', 'แ', 'แ', 'แ', 'แท', 'แ']
```

```js
splitGraphemes('แแแปแแแแทแ') // ['แแแป', 'แแแแทแ']
```

### Japanese NFD

```js
splitGraphemes('ใใใใใซใใกใใฏใ') // ['ใใ', 'ใใ', 'ใซใ', 'ใกใ', 'ใฏใ']
splitGraphemes('ใใใใใใใใใใ') // ['ใใ', 'ใใ', 'ใใ', 'ใใ', 'ใใ']
```

### English

```js
splitGraphemes('Hello') // ['H', 'e', 'l', 'l', 'o']
```

## Supported ligature characters

The character lists are in the [`src` directory](https://github.com/nota/split-graphemes/tree/master/src).

- [Emoji](https://en.wikipedia.org/wiki/Unicode_block)
- [Arabic](https://www.unicode.org/charts/PDF/U0600.pdf) and [Arabic supplement](https://www.unicode.org/charts/PDF/U0750.pdf)
- [Bengali](https://www.unicode.org/charts/PDF/U0980.pdf)
- [Devanagari](https://www.unicode.org/charts/PDF/U0900.pdf)
- [Gujarati](https://www.unicode.org/charts/PDF/U0A80.pdf)
- [Hebrew](https://www.unicode.org/charts/PDF/U0590.pdf)
- [Japanese Hiragana](https://www.unicode.org/charts/PDF/U3040.pdf) and [Katakana](https://www.unicode.org/charts/PDF/U30A0.pdf) NFD
- [Kannada](https://www.unicode.org/charts/PDF/U0C80.pdf)
- [Khmer](https://www.unicode.org/charts/PDF/U1780.pdf)
- [Lao](https://www.unicode.org/charts/PDF/U0E80.pdf)
- [Malayalam](https://unicode.org/charts/PDF/U0D00.pdf)
- [Myanmar](https://www.unicode.org/charts/PDF/U1000.pdf)
- [Tamil](https://www.unicode.org/charts/PDF/U0B80.pdf)
- [Telugu](https://www.unicode.org/charts/PDF/U0C00.pdf)
- [Thai](https://www.unicode.org/charts/PDF/U0E00.pdf)
- [Tibetan](https://www.unicode.org/charts/PDF/U0F00.pdf)
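
To check quickly whether a character falls in one of the script blocks listed above, a small lookup table is enough. The sketch below is an illustration under stated assumptions: the ranges are the standard Unicode block boundaries for the linked code charts, `BLOCKS` and `blockOf` are hypothetical names, and none of this is how the library itself decides what to join. Emoji are left out because they are spread across several blocks.

```js
// Unicode block ranges for the scripts listed above (start and end inclusive).
const BLOCKS = [
  ['Hebrew', 0x0590, 0x05FF],
  ['Arabic', 0x0600, 0x06FF],
  ['Arabic Supplement', 0x0750, 0x077F],
  ['Devanagari', 0x0900, 0x097F],
  ['Bengali', 0x0980, 0x09FF],
  ['Gujarati', 0x0A80, 0x0AFF],
  ['Tamil', 0x0B80, 0x0BFF],
  ['Telugu', 0x0C00, 0x0C7F],
  ['Kannada', 0x0C80, 0x0CFF],
  ['Malayalam', 0x0D00, 0x0D7F],
  ['Thai', 0x0E00, 0x0E7F],
  ['Lao', 0x0E80, 0x0EFF],
  ['Tibetan', 0x0F00, 0x0FFF],
  ['Myanmar', 0x1000, 0x109F],
  ['Khmer', 0x1780, 0x17FF],
  ['Hiragana', 0x3040, 0x309F],
  ['Katakana', 0x30A0, 0x30FF]
]

// Hypothetical helper: returns the block name of the first code point, or null.
function blockOf (char) {
  const cp = char.codePointAt(0)
  const hit = BLOCKS.find(([, start, end]) => cp >= start && cp <= end)
  return hit ? hit[0] : null
}

console.log(blockOf('\u1780')) // Khmer
console.log(blockOf('\u0E01')) // Thai
console.log(blockOf('A')) // null
```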