Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/veer66/wordcutpy
A simple word breaker written in Python
- Host: GitHub
- URL: https://github.com/veer66/wordcutpy
- Owner: veer66
- Archived: true
- Created: 2015-09-27T17:44:45.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-07-04T06:52:03.000Z (over 1 year ago)
- Last Synced: 2024-07-12T01:16:54.834Z (4 months ago)
- Language: Python
- Size: 343 KB
- Stars: 18
- Watchers: 2
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- nlp_thai_resources - GitHub
README
wordcutpy
=========
wordcutpy is a simple Thai word breaker written in Python 3+.

Installation
------------

````
pip install wordcutpy
````

Example
-------

### Conventional version
````python
# -*- coding: utf-8 -*-
from wordcut import Wordcut

if __name__ == '__main__':
    # Load the dictionary (one word per line): strip newlines, drop blank lines and duplicates.
    with open('bigthai.txt', encoding='utf-8') as dict_file:
        word_list = sorted({w.rstrip() for w in dict_file if w.strip()})
    wordcut = Wordcut(word_list)
    print(wordcut.tokenize("กากา cat หมา"))
````

### Simplified version
````python
# -*- coding: utf-8 -*-
from wordcut import Wordcut

# Build a tokenizer from the bigthai dictionary bundled with the package.
wordcut = Wordcut.bigthai()
print(wordcut.tokenize("กากา cat หมา"))
````

Test
----

### Run tests
````shell
python -m unittest discover -s tests
````
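
Dictionary-based word breaking (sketch)
---------------------------------------

In the examples, Wordcut takes a plain list of dictionary words and returns a list of tokens. To illustrate the general idea behind dictionary-based word breaking, here is a minimal sketch of greedy longest matching in pure Python. It is not wordcutpy's own algorithm. The function `longest_match_tokenize`, the `NON_THAI` pattern, and the two-word dictionary are hypothetical. A greedy scan can commit to a long word that turns out to be wrong for the rest of the text, so treat this only as an illustration.

````python
import re

# Whitespace runs, or runs of non-Thai, non-space characters (e.g. "cat", "42").
# U+0E00-U+0E7F is the Unicode Thai block.
NON_THAI = re.compile(r"\s+|[^\s\u0E00-\u0E7F]+")


def longest_match_tokenize(text, dictionary):
    """Split text into tokens by greedy longest dictionary matching.

    Hypothetical illustration only; this is not wordcutpy's algorithm.
    """
    max_len = max(map(len, dictionary), default=0)
    tokens = []
    i = 0
    while i < len(text):
        # Keep spaces and non-Thai words (Latin letters, digits) as whole tokens.
        m = NON_THAI.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
            continue
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit the character on its own.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(longest_match_tokenize("กากา cat หมา", {"กา", "หมา"}))
    # ['กา', 'กา', ' ', 'cat', ' ', 'หมา']
````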