Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nitely/nim-segmentation
Unicode text segmentation (tr29)
nim text-segmentation unicode word-break
- Host: GitHub
- URL: https://github.com/nitely/nim-segmentation
- Owner: nitely
- License: mit
- Created: 2020-02-15T13:31:56.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-19T00:39:29.000Z (5 months ago)
- Last Synced: 2024-11-10T06:42:25.856Z (3 months ago)
- Topics: nim, text-segmentation, unicode, word-break
- Language: Nim
- Homepage: https://nitely.github.io/nim-segmentation/
- Size: 40 KB
- Stars: 10
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Segmentation
[![licence](https://img.shields.io/github/license/nitely/nim-segmentation.svg?style=flat-square)](https://raw.githubusercontent.com/nitely/nim-segmentation/master/LICENSE)
An implementation of [Unicode Text Segmentation](https://unicode.org/reports/tr29/) (tr29). Splitting is done with a fast DFA.
> See [nim-graphemes](https://github.com/nitely/nim-graphemes) for grapheme cluster segmentation
## Install
```
nimble install segmentation
```
## Compatibility
Nim 0.19, 0.20, and 1.0.4+
## Usage
```nim
import sequtils
import segmentation

# `words` yields every segment in order: words, whitespace, and punctuation
assert toSeq("The (“brown”) fox can’t jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]
```
## Docs
[Read the docs](https://nitely.github.io/nim-segmentation/)
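As the usage example shows, the `words` iterator also yields whitespace segments. When those are not wanted, they can be filtered out by the caller. The following is a minimal sketch, not part of the package: `nonSpaceSegments` is a hypothetical helper name, and the expected result is derived from the segments listed in the usage example.

```nim
import strutils
import segmentation

# Hypothetical helper (not provided by the segmentation package):
# collects every segment yielded by `words` except whitespace-only ones.
proc nonSpaceSegments(text: string): seq[string] =
  result = @[]
  for segment in text.words:
    # `strip` removes ASCII whitespace; an empty result means the
    # segment consisted of whitespace only
    if segment.strip.len > 0:
      result.add segment

assert nonSpaceSegments("The (“brown”) fox can’t jump 32.3 feet, right?") ==
  @["The", "(", "“", "brown", "”", ")", "fox", "can’t", "jump", "32.3",
    "feet", ",", "right", "?"]
```

Punctuation segments are kept here. Dropping them as well would need an extra check chosen by the caller, since the helper only tests for ASCII whitespace.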
## Tests
```
nimble test
```
## LICENSE
MIT