https://github.com/chearon/word-breaker

Unicode word boundary algorithm from UAX29 section 4

Host: GitHub
URL: https://github.com/chearon/word-breaker
Owner: chearon
Created: 2019-08-29T04:08:46.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2019-09-08T18:37:05.000Z (almost 7 years ago)
Last Synced: 2025-03-17T19:52:14.853Z (over 1 year ago)
Language: JavaScript
Homepage:
Size: 11.7 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# word-breaker

Implementation of the Unicode Word Boundary Rules algorithm (UAX29 4.1). At time of writing it targets **Unicode 12**.

What are word boundaries used for?

* When you double-click a word in a web browser, UAX29 section 4 defines where the selection should start and end
* CSS's `text-transform: capitalize`, which has to find the first letter of each word
* Search, for example when matching whole words only
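
As a rough illustration of the double-click case, the sketch below maps a clicked string offset to the segment that contains it. `segmentAt` is a hypothetical helper, not part of this package, and the break offsets are written out by hand for a simple ASCII string rather than computed by the library.

```javascript
// Hypothetical helper, not part of word-breaker: given sorted break offsets
// (including 0 and the string length), return the [start, end) range of the
// segment that contains `offset`, or null if the offset is out of range.
function segmentAt(breaks, offset) {
  for (let k = 0; k + 1 < breaks.length; k++) {
    if (offset >= breaks[k] && offset < breaks[k + 1]) {
      return [breaks[k], breaks[k + 1]];
    }
  }
  return null;
}

const text = 'hello world';
// Break offsets written by hand for this string: "hello", " ", "world".
const breaks = [0, 5, 6, 11];

// Simulate a double click on the "o" of "world" (offset 7).
const [start, end] = segmentAt(breaks, 7);
console.log(text.slice(start, end)); // world
```

A linear scan keeps the sketch short; for long texts a binary search over the sorted offsets would be the more natural choice.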

It keeps grapheme clusters together, such as an emoji followed by a skin tone modifier or a letter followed by a combining diacritical mark like a grave accent. It passes all 613 tests from the [Unicode auxiliary files](https://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakTest.html#samples) for word breaks.
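
To see why this matters, the sketch below inspects two such sequences using only built-in string operations. It does not call this package; it only shows that one user-perceived character can span several code points and UTF-16 code units, so a break placed between them would split it.

```javascript
// U+1F44C (OK hand sign) followed by U+1F3FC (an emoji skin tone modifier).
const emoji = '\u{1F44C}\u{1F3FC}';
// "e" followed by U+0300 (combining grave accent).
const accented = 'e\u0300';

console.log(emoji.length);         // 4 (UTF-16 code units)
console.log([...emoji].length);    // 2 (code points)
console.log(accented.length);      // 2 (code units)
console.log([...accented].length); // 2 (code points)
```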

## API

```javascript
const WordBreaker = require('word-breaker');

const string = 'UAX29 has rules like   WB4\t👌🏼';
const wb = new WordBreaker(string);

let last = null;
let i;

// nextBreak() returns the next break offset, or null when there are no more
while ((i = wb.nextBreak()) !== null) {
  if (last !== null) console.log(string.slice(last, i));
  last = i;
}

// output (spaces shown as _):
// UAX29
// _
// has
// _
// rules
// _
// like
// ___
// WB4
// \t
// 👌🏼
```
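
If you would rather have the segments as an array, the loop above can be wrapped in a small helper. This is a minimal sketch: `collectSegments` and `stubBreaker` are hypothetical names, not part of this package. It assumes, as the loop above suggests, that the first offset returned is the start of the string. The stub stands in for a real `WordBreaker` with hand-written offsets so the sketch runs without the package installed.

```javascript
// Hypothetical helper: drains any object exposing nextBreak() (returning an
// offset, or null when exhausted) into an array of substrings.
function collectSegments(string, breaker) {
  const segments = [];
  let last = null;
  let i;
  while ((i = breaker.nextBreak()) !== null) {
    if (last !== null) segments.push(string.slice(last, i));
    last = i;
  }
  return segments;
}

// Stand-in breaker that replays hand-written offsets; for illustration only.
function stubBreaker(offsets) {
  let k = 0;
  return { nextBreak: () => (k < offsets.length ? offsets[k++] : null) };
}

console.log(collectSegments('has rules', stubBreaker([0, 3, 4, 9])));
// [ 'has', ' ', 'rules' ]
```

With the real package, the call would presumably be `collectSegments(string, new WordBreaker(string))`.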

## More info

Inspired by [foliojs/grapheme-breaker](https://github.com/foliojs/grapheme-breaker), which comes from the same specification, and [foliojs/linebreak](https://github.com/foliojs/linebreak). It follows the same project structure as those projects and uses [unicode-trie](https://github.com/foliojs/unicode-trie) for character classification.
