https://github.com/nota/split-graphemes

Divide character strings into graphemes.

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/nota/split-graphemes
Owner: nota
Created: 2018-05-18T04:38:05.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2023-02-03T08:58:07.000Z (over 2 years ago)
Last Synced: 2025-05-13T20:21:58.612Z (2 months ago)
Language: JavaScript
Homepage: https://www.npmjs.com/package/split-graphemes
Size: 472 KB
Stars: 42
Watchers: 4
Forks: 6
Open Issues: 5
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-khmer-language - nota/split-graphemes

README

# split-graphemes

Divide text containing ligature letters (such as Thai and Khmer script) and complex emoji into an array of [graphemes](https://en.wikipedia.org/wiki/Grapheme).

You can use this library in place of `Array.from` to get graphemes instead of individual code points.
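`Array.from` iterates a string by Unicode code point, not by grapheme, so a single visible Thai syllable can come back as several elements. The sketch below uses only built-in JavaScript (it does not call `split-graphemes`); the helper name `codePoints` is hypothetical and exists only for this illustration.

```js
// Thai 'กี่' written with escapes: KO KAI (U+0E01) + SARA II (U+0E35) + MAI EK (U+0E48).
// It renders as one unit, but the last two code points are combining marks.
const thai = '\u0E01\u0E35\u0E48'

// Hypothetical helper: format each element returned by Array.from as U+XXXX.
function codePoints (str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))
}

console.log(Array.from(thai).length) // 3
console.log(codePoints(thai).join(' ')) // U+0E01 U+0E35 U+0E48
```

Each of those three elements is a separate array entry even though a reader sees one syllable, which is the gap this library is meant to cover.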

[![Tests](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml/badge.svg?branch=master)](https://github.com/nota/split-graphemes/actions/workflows/test-node.yml)

## Installation

```
$ npm install split-graphemes
```

## Examples

### Emoji

```js
// The family emoji consists of four person emoji (man, woman, boy, boy) joined by three Zero Width Joiners (ZWJ).
const chars = Array.from('👨‍👩‍👦‍👦') // ['👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦']
```

```js
// splitGraphemes treats the whole sequence as a single grapheme.
const chars = splitGraphemes('👨‍👩‍👦‍👦') // ['👨‍👩‍👦‍👦']
```
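To see exactly what `Array.from` returns for this emoji, its code points can be printed directly. This uses only built-in JavaScript; `\u200D` is the Zero Width Joiner the comments above call ZWJ.

```js
// The family emoji written with escapes so the invisible joiners are explicit:
// MAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}'

const parts = Array.from(family)
const hex = parts.map((ch) => ch.codePointAt(0).toString(16))

console.log(family.length) // 11 (UTF-16 code units: 4 surrogate pairs + 3 ZWJ)
console.log(parts.length) // 7 (code points)
console.log(hex.join(' ')) // 1f468 200d 1f469 200d 1f466 200d 1f466
```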

### Khmer characters

```js
Array.from('ប៉ុស្ដិ៍') // ['ប', '៉', 'ុ', 'ស', '្', 'ដ', 'ិ', '៍']
```

```js
splitGraphemes('ប៉ុស្ដិ៍') // ['ប៉ុ', 'ស្ដិ៍']
```
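A rough intuition for the Khmer result: the vowel signs and diacritics in this word are combining marks that attach to the preceding consonant, and the COENG sign (U+17D2) also pulls the following consonant into the same cluster. The function below is a hypothetical, simplified sketch of that idea using the `\p{M}` Unicode property escape. It is not how `split-graphemes` is implemented and it does not cover the other scripts the library supports.

```js
// Hypothetical simplified clusterer for Khmer (NOT the library's algorithm).
// Rule 1: a combining mark (\p{M}) is appended to the current cluster.
// Rule 2: a character that follows COENG (U+17D2) is appended to the current cluster.
const COENG = '\u17D2'
const isMark = (ch) => /\p{M}/u.test(ch)

function clusterKhmer (str) {
  const clusters = []
  for (const ch of str) { // for...of iterates by code point
    const prev = clusters.length > 0 ? clusters[clusters.length - 1] : ''
    if (prev !== '' && (isMark(ch) || prev.endsWith(COENG))) {
      clusters[clusters.length - 1] = prev + ch
    } else {
      clusters.push(ch)
    }
  }
  return clusters
}

// The example word written with escapes:
// BA, MUUSIKATOAN, VOWEL SIGN U, SA, COENG, DA, VOWEL SIGN I, TOANDAKHIAT
const word = '\u1794\u17C9\u17BB\u179F\u17D2\u178A\u17B7\u17CD'
console.log(clusterKhmer(word).length) // 2
```

For this word the sketch yields the same two clusters as the `splitGraphemes` example; real Khmer text has more cases than these two rules handle.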

### Japanese NFD

```js
splitGraphemes('ごん゙に゙ぢば') // ['ご', 'ん゙', 'に゙', 'ぢ', 'ば']
splitGraphemes('パピプペポ') // ['パ', 'ピ', 'プ', 'ペ', 'ポ']
```
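In NFD, a voiced kana such as 'ご' is stored as the base kana plus the combining voiced sound mark U+3099, so `Array.from` sees two code points. The built-in `String.prototype.normalize('NFC')` recombines pairs that have a precomposed form, but not sequences like 'ん゙' that have none, which is why splitting by grapheme is still useful for this text. The snippet uses only built-in JavaScript.

```js
// 'ご' (U+3054) in NFD: HIRAGANA KO (U+3053) + COMBINING VOICED SOUND MARK (U+3099).
const goNFD = '\u3054'.normalize('NFD')
const goParts = Array.from(goNFD).map((ch) => ch.codePointAt(0).toString(16))
console.log(goParts.join(' ')) // 3053 3099

// NFC restores the precomposed character as a single code point.
console.log(goNFD.normalize('NFC') === '\u3054') // true

// 'ん' (U+3093) + U+3099 has no precomposed form, so NFC leaves two code points.
const nDakuten = '\u3093\u3099'
console.log(Array.from(nDakuten.normalize('NFC')).length) // 2
```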

### English

```js
splitGraphemes('Hello') // ['H', 'e', 'l', 'l', 'o']
```

## Supported ligature characters

The character lists are in the [`src` directory](https://github.com/nota/split-graphemes/tree/master/src).

- [Emoji](https://en.wikipedia.org/wiki/Unicode_block)

- [Arabic](https://www.unicode.org/charts/PDF/U0600.pdf) and [Arabic supplement](https://www.unicode.org/charts/PDF/U0750.pdf)

- [Bengali](https://www.unicode.org/charts/PDF/U0980.pdf)

- [Devanagari](https://www.unicode.org/charts/PDF/U0900.pdf)

- [Gujarati](https://www.unicode.org/charts/PDF/U0A80.pdf)

- [Hebrew](https://www.unicode.org/charts/PDF/U0590.pdf)

- [Japanese Hiragana](https://www.unicode.org/charts/PDF/U3040.pdf) and [Katakana](https://www.unicode.org/charts/PDF/U30A0.pdf) NFD

- [Kannada](https://www.unicode.org/charts/PDF/U0C80.pdf)

- [Khmer](https://www.unicode.org/charts/PDF/U1780.pdf)

- [Lao](https://www.unicode.org/charts/PDF/U0E80.pdf)

- [Malayalam](https://unicode.org/charts/PDF/U0D00.pdf)

- [Myanmar](https://www.unicode.org/charts/PDF/U1000.pdf)

- [Tamil](https://www.unicode.org/charts/PDF/U0B80.pdf)

- [Telugu](https://www.unicode.org/charts/PDF/U0C00.pdf)

- [Thai](https://www.unicode.org/charts/PDF/U0E00.pdf)

- [Tibetan](https://www.unicode.org/charts/PDF/U0F00.pdf)
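The linked charts identify each supported script by its Unicode block. As an illustration only, a hypothetical helper like the one below reports which of those blocks a string touches. The ranges are the standard Unicode block boundaries for the charts listed; emoji are omitted because they are spread across several blocks. `blocksIn` and `BLOCKS` are not part of the `split-graphemes` API.

```js
// Hypothetical helper (not part of split-graphemes): report which of the
// Unicode blocks linked above appear in a string.
// Each entry is [name, first code point, last code point], inclusive.
const BLOCKS = [
  ['Hebrew', 0x0590, 0x05FF],
  ['Arabic', 0x0600, 0x06FF],
  ['Arabic Supplement', 0x0750, 0x077F],
  ['Devanagari', 0x0900, 0x097F],
  ['Bengali', 0x0980, 0x09FF],
  ['Gujarati', 0x0A80, 0x0AFF],
  ['Tamil', 0x0B80, 0x0BFF],
  ['Telugu', 0x0C00, 0x0C7F],
  ['Kannada', 0x0C80, 0x0CFF],
  ['Malayalam', 0x0D00, 0x0D7F],
  ['Thai', 0x0E00, 0x0E7F],
  ['Lao', 0x0E80, 0x0EFF],
  ['Tibetan', 0x0F00, 0x0FFF],
  ['Myanmar', 0x1000, 0x109F],
  ['Khmer', 0x1780, 0x17FF],
  ['Hiragana', 0x3040, 0x309F],
  ['Katakana', 0x30A0, 0x30FF]
]

function blocksIn (str) {
  const found = new Set() // insertion order = order of first appearance
  for (const ch of str) {
    const cp = ch.codePointAt(0)
    for (const [name, start, end] of BLOCKS) {
      if (cp >= start && cp <= end) found.add(name)
    }
  }
  return [...found]
}

console.log(blocksIn('Hello').length) // 0
console.log(blocksIn('\u1794\u17C9\u17BB').join()) // Khmer
console.log(blocksIn('\u3054\u3093\u3099').join()) // Hiragana
```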
