https://github.com/seanghay/khmersegment
A Khmer word segmentation tool built for the NIPTICT (now CADT) Khmer Word Segmentation CRF model.
- Host: GitHub
- URL: https://github.com/seanghay/khmersegment
- Owner: seanghay
- License: apache-2.0
- Created: 2024-05-22T15:56:17.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-05-22T16:30:08.000Z (6 months ago)
- Last Synced: 2024-05-22T16:32:18.870Z (6 months ago)
- Topics: cambodia, crf, crfpp, khmer, word-segmentation
- Language: Python
- Homepage: https://pypi.org/project/khmersegment/
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-khmer-language - seanghay/khmersegment
README
## Khmer Segment
A Khmer word segmentation tool built for the NIPTICT (now CADT) Khmer Word Segmentation CRF model.
> [!IMPORTANT]
> `km-5tag-seg-model` is required for this script to work. This library doesn't provide the model file.

### Usage
```
pip install khmersegment
```
```python
from khmersegment import Segmenter

segmenter = Segmenter("-m km-5tag-seg-model")

print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=False))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']

print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=True))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
```
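Because the model file is not bundled with the library, it may help to check that it exists before constructing the segmenter. The following is a minimal sketch, not part of `khmersegment` itself: `build_segmenter_args` and `load_segmenter` are hypothetical helper names, and the `"-m <path>"` argument string simply mirrors the usage shown above. It assumes the model has already been obtained separately and saved at a local path without spaces.

```python
from pathlib import Path


def build_segmenter_args(model_path):
    """Return the '-m <path>' argument string, failing early if the file is missing.

    Hypothetical helper: the argument format is copied from the README usage,
    and paths containing spaces are not handled here.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path}. "
            "khmersegment does not ship km-5tag-seg-model; obtain it separately."
        )
    return f"-m {path}"


def load_segmenter(model_path="km-5tag-seg-model"):
    """Create a Segmenter only after the model file has been verified."""
    args = build_segmenter_args(model_path)
    # Imported lazily so the path check above works even without the package installed.
    from khmersegment import Segmenter
    return Segmenter(args)
```

With the model in the working directory, `load_segmenter()` should behave like the `Segmenter("-m km-5tag-seg-model")` call above, while a missing file produces a readable error before the library is invoked.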
### License
`Apache-2.0`
### Related
- [pycrfpp](https://github.com/seanghay/pycrfpp): Python bindings for CRF++