https://github.com/seanghay/khmersegment

## Khmer Segment

A Khmer word segmentation tool built around the NIPTICT (now CADT) Khmer Word Segmentation CRF model.

> [!IMPORTANT]
> The `km-5tag-seg-model` model file is required for this library to work. It is not bundled with the package, so you must obtain it separately.

### Usage

```shell
pip install khmersegment
```
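Because the model file is not bundled, it can help to fail early with a clear error when it is missing. The following is a minimal sketch, assuming the model sits at a local path you choose. `load_segmenter` is a hypothetical helper, not part of `khmersegment`, and the path is assumed to contain no spaces because it is interpolated into the option string.

```python
import os


def load_segmenter(model_path: str):
    """Hypothetical helper: raise a clear error if the CRF model file is missing."""
    if not os.path.isfile(model_path):
        raise FileNotFoundError(
            f"CRF model not found at {model_path!r}; "
            "khmersegment does not ship km-5tag-seg-model, so obtain it separately."
        )
    # Import lazily so the check above runs even if the package is not installed yet.
    from khmersegment import Segmenter

    # Same option string format as the usage example: -m <model file>.
    # Assumption: model_path contains no spaces.
    return Segmenter(f"-m {model_path}")
```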

```python
from khmersegment import Segmenter

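# CRF++-style option string; -m gives the path to the model file.
segmenter = Segmenter("-m km-5tag-seg-model")

# deep=False keeps compound words such as 'អ្នកណា' together.
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=False))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']

# deep=True splits compounds further ('អ្នកណា' -> 'អ្នក', 'ណា').
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=True))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']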
```
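As the outputs above show, whitespace comes back as its own token (`' '`). If you only want the words, a minimal sketch is to filter out whitespace-only and empty tokens. `words_only` is a hypothetical helper, not part of `khmersegment`; the token list is copied from the documented `deep=True` output so the snippet runs without the model file.

```python
def words_only(tokens: list[str]) -> list[str]:
    """Hypothetical helper: drop empty and whitespace-only tokens."""
    return [t for t in tokens if t.strip()]


# Output of segmenter(..., deep=True) from the usage example above.
tokens = ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
print(words_only(tokens))
# => ['Hello', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
```

With a real model you would pass `segmenter(text, deep=True)` directly instead of the hard-coded list.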

### License

`Apache-2.0`

### Related

- [pycrfpp](https://github.com/seanghay/pycrfpp): Python binding for CRF++