https://github.com/nitely/nim-segmentation
Unicode text segmentation (tr29)
- Host: GitHub
- URL: https://github.com/nitely/nim-segmentation
- Owner: nitely
- License: mit
- Created: 2020-02-15T13:31:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-19T00:39:29.000Z (10 months ago)
- Last Synced: 2025-03-23T18:37:26.971Z (3 months ago)
- Topics: nim, text-segmentation, unicode, word-break
- Language: Nim
- Homepage: https://nitely.github.io/nim-segmentation/
- Size: 40 KB
- Stars: 10
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Segmentation
An implementation of [Unicode Text Segmentation](https://unicode.org/reports/tr29/) (tr29). Splitting is performed by a fast DFA (deterministic finite automaton).
> See [nim-graphemes](https://github.com/nitely/nim-graphemes) for grapheme cluster segmentation
## Install
```
nimble install segmentation
```

## Compatibility

Nim 0.19, 0.20, and 1.0.4 or later
## Usage
```nim
import sequtils
import segmentation

assert toSeq("The (“brown”) fox can’t jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]
```

## Docs
[Read the docs](https://nitely.github.io/nim-segmentation/)
## Tests
```
nimble test
```

## LICENSE
MIT