https://github.com/nota/split-graphemes

Divide character strings into graphemes.

# split-graphemes

Split strings containing ligature characters, such as Thai and Khmer letters or complex emoji, into an array of [graphemes](https://en.wikipedia.org/wiki/Grapheme).
You can use this library in place of `Array.from` to get graphemes.

[![Tests](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml/badge.svg?branch=master)](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml)

## Installation
```
$ npm install split-graphemes
```

## Examples
### Emoji

```js
// The emoji '👨‍👩‍👦‍👦' consists of four person emoji joined by Zero Width Joiners (ZWJ).
const chars = Array.from('👨‍👩‍👦‍👦') // ['👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦']
```

```js
// It is interpreted exactly as one character!
const chars = splitGraphemes('👨‍👩‍👦‍👦') // ['👨‍👩‍👦‍👦']
```
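
The difference above comes from `Array.from` iterating a string by code point rather than by grapheme. A minimal sketch of what it sees for the family emoji, written with escapes so each code point is explicit (the variable names are illustrative only):

```js
// man, ZWJ, woman, ZWJ, boy, ZWJ, boy
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}'

// UTF-16 code units: each emoji takes 2, each ZWJ takes 1.
const codeUnits = family.length // 11

// Array.from keeps surrogate pairs together,
// but every ZWJ still becomes its own element.
const codePoints = Array.from(family)
const hex = codePoints.map(c => c.codePointAt(0).toString(16))
// ['1f468', '200d', '1f469', '200d', '1f466', '200d', '1f466']
```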

### Khmer characters

```js
Array.from('ប៉ុស្ដិ៍') // ['ប', '៉', 'ុ', 'ស', '្', 'ដ', 'ិ', '៍']
```

```js
splitGraphemes('ប៉ុស្ដិ៍') // ['ប៉ុ', 'ស្ដិ៍']
```

### Japanese NFD
```js
splitGraphemes('ごん゙に゙ぢば') // ['ご', 'ん゙', 'に゙', 'ぢ', 'ば']
splitGraphemes('パピプペポ') // ['パ', 'ピ', 'プ', 'ペ', 'ポ']
```
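
For context, NFD text stores the voiced and semi-voiced sound marks as separate combining code points (U+3099 and U+309A). The sketch below uses only the built-in `String.prototype.normalize` to show how such input arises; the variable names are illustrative and not part of this library.

```js
const composed = '\u3054' // ご as one precomposed code point
const decomposed = composed.normalize('NFD') // こ (U+3053) + U+3099

// A code-point split now yields two elements for one visible character.
const nfdLength = Array.from(decomposed).length // 2

const pa = '\u30D1'.normalize('NFD') // パ becomes ハ (U+30CF) + U+309A

// Normalizing back to NFC restores the single code point.
const roundTrip = decomposed.normalize('NFC') === composed // true
```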

### English
```js
splitGraphemes('Hello') // ['H', 'e', 'l', 'l', 'o']
```
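
One place a grapheme array is useful is truncating text without cutting a character in half. The helper below is a hypothetical sketch, not part of this library: `truncateGraphemes` and `byCodePoint` are made-up names, and the splitter is passed in as an argument because this README does not show how `splitGraphemes` is imported.

```js
// `split` is any function that turns a string into an array of graphemes.
function truncateGraphemes (text, max, split) {
  const graphemes = split(text)
  if (graphemes.length <= max) return text
  return graphemes.slice(0, max).join('') + '…'
}

// Stand-in splitter for demonstration only. It splits by code point,
// so pass splitGraphemes instead for text with combining sequences.
const byCodePoint = s => Array.from(s)

const short = truncateGraphemes('Hello', 3, byCodePoint) // 'Hel…'
const whole = truncateGraphemes('Hello', 10, byCodePoint) // 'Hello'
```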

## Supported ligature characters
The full character lists are in the [`src` directory](https://github.com/nota/split-graphemes/tree/master/src).
- [Emoji](https://en.wikipedia.org/wiki/Unicode_block)
- [Arabic](https://www.unicode.org/charts/PDF/U0600.pdf) and [Arabic supplement](https://www.unicode.org/charts/PDF/U0750.pdf)
- [Bengali](https://www.unicode.org/charts/PDF/U0980.pdf)
- [Devanagari](https://www.unicode.org/charts/PDF/U0900.pdf)
- [Gujarati](https://www.unicode.org/charts/PDF/U0A80.pdf)
- [Hebrew](https://www.unicode.org/charts/PDF/U0590.pdf)
- [Japanese Hiragana](https://www.unicode.org/charts/PDF/U3040.pdf) and [Katakana](https://www.unicode.org/charts/PDF/U30A0.pdf) NFD
- [Kannada](https://www.unicode.org/charts/PDF/U0C80.pdf)
- [Khmer](https://www.unicode.org/charts/PDF/U1780.pdf)
- [Lao](https://www.unicode.org/charts/PDF/U0E80.pdf)
- [Malayalam](https://unicode.org/charts/PDF/U0D00.pdf)
- [Myanmar](https://www.unicode.org/charts/PDF/U1000.pdf)
- [Tamil](https://www.unicode.org/charts/PDF/U0B80.pdf)
- [Telugu](https://www.unicode.org/charts/PDF/U0C00.pdf)
- [Thai](https://www.unicode.org/charts/PDF/U0E00.pdf)
- [Tibetan](https://www.unicode.org/charts/PDF/U0F00.pdf)