https://github.com/nitely/nim-segmentation

Unicode text segmentation (tr29)

nim text-segmentation unicode word-break



Host: GitHub
URL: https://github.com/nitely/nim-segmentation
Owner: nitely
License: mit
Created: 2020-02-15T13:31:56.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-09-19T00:39:29.000Z (10 months ago)
Last Synced: 2025-03-23T18:37:26.971Z (3 months ago)
Topics: nim, text-segmentation, unicode, word-break
Language: Nim
Homepage: https://nitely.github.io/nim-segmentation/
Size: 40 KB
Stars: 10
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE


README

# Segmentation

[![licence](https://img.shields.io/github/license/nitely/nim-segmentation.svg?style=flat-square)](https://raw.githubusercontent.com/nitely/nim-segmentation/master/LICENSE)

An implementation of [Unicode Text Segmentation](https://unicode.org/reports/tr29/) (tr29). Text is split using a fast DFA (deterministic finite automaton).

> See [nim-graphemes](https://github.com/nitely/nim-graphemes) for grapheme cluster segmentation
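To illustrate what splitting text with a DFA means, here is a toy sketch in Nim. It is **not** the automaton nim-segmentation uses. It handles ASCII only and relies on four hypothetical character classes (`CharClass`, `classify`) plus a hand-written transition table (`joins`). That table only loosely approximates a few tr29 rules: letters and digits stay together, runs of spaces stay together, and everything else is split. `toySplit` is likewise a hypothetical name.

```nim
# Toy illustration only: an ASCII-only state machine with a few simplified
# word-break rules. It is NOT the DFA used by nim-segmentation.

type
  CharClass = enum
    ccLetter, ccDigit, ccSpace, ccOther

proc classify(c: char): CharClass =
  ## Map an ASCII byte to a coarse (hypothetical) character class.
  ## Non-ASCII bytes fall into ccOther, so UTF-8 text is not handled.
  case c
  of 'a'..'z', 'A'..'Z': ccLetter
  of '0'..'9': ccDigit
  of ' ', '\t': ccSpace
  else: ccOther

# Transition table: the state is the class of the previous character, the
# input is the class of the current one; `true` means "no break here".
const joins: array[CharClass, array[CharClass, bool]] = [
  ccLetter: [ccLetter: true, ccDigit: true, ccSpace: false, ccOther: false],
  ccDigit: [ccLetter: true, ccDigit: true, ccSpace: false, ccOther: false],
  ccSpace: [ccLetter: false, ccDigit: false, ccSpace: true, ccOther: false],
  ccOther: [ccLetter: false, ccDigit: false, ccSpace: false, ccOther: false]]

proc toySplit(s: string): seq[string] =
  ## Walk the string once, emitting a segment whenever the table says "break".
  if s.len == 0:
    return
  var start = 0
  var state = classify(s[0])
  for i in 1 ..< s.len:
    let cls = classify(s[i])
    if not joins[state][cls]:
      result.add s[start ..< i]
      start = i
    state = cls
  result.add s[start ..< s.len]
```

Each byte is read once and each break decision is a single table lookup, which is the main reason a DFA-based approach can be fast. The real tr29 rules need more states than this toy has. For example, keeping `can’t` or `32.3` together requires looking past the apostrophe or the dot.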

## Install

```
nimble install segmentation
```

## Compatibility

Nim 0.19, 0.20, and 1.0.4 or later

## Usage

```nim
import sequtils
import segmentation

assert toSeq("The (“brown”) fox can’t jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]
```
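The segments yielded by `words` include spaces and punctuation. If you only want the word-like segments, you can filter them yourself. The sketch below relies only on the `words` iterator shown above. `wordLike`, `wordsOnly`, and the "starts with a letter or an ASCII digit" heuristic are hypothetical and are not part of the library.

```nim
import std/unicode   # runeAt, isAlpha
import segmentation  # provides the `words` iterator

proc wordLike(segment: string): bool =
  ## Hypothetical heuristic (not part of the library): treat a segment as a
  ## word if it starts with an ASCII digit or a Unicode letter.
  if segment.len == 0:
    return false
  result = segment[0] in {'0'..'9'} or segment.runeAt(0).isAlpha

proc wordsOnly(s: string): seq[string] =
  ## Hypothetical helper: keep only the word-like segments yielded by `words`.
  for w in s.words:
    if w.wordLike:
      result.add w
```

Because `words` already keeps `can’t` and `32.3` as single segments, checking only the first rune is enough to keep them. Spaces, quotes, and brackets are dropped.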

## Docs

[Read the docs](https://nitely.github.io/nim-segmentation/)

## Tests

```
nimble test
```

## LICENSE

MIT
