https://github.com/pythainlp/lekcut
LEKCut (เล็ก คัด) is a Thai tokenization library that ports deep learning models to the ONNX format.
- Host: GitHub
- URL: https://github.com/pythainlp/lekcut
- Owner: PyThaiNLP
- License: apache-2.0
- Created: 2022-10-28T11:18:35.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-28T17:09:45.000Z (over 3 years ago)
- Language: Jupyter Notebook
- Size: 4.56 MB
- Stars: 7
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
## Install
> pip install lekcut
## How to use
```python
from lekcut import word_tokenize
word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```
**API**
```python
word_tokenize(text: str, model: str="deepcut", path: str="default") -> List[str]
```
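The signature above suggests that ```model``` and ```path``` can be passed as keyword arguments. The following is a minimal sketch of batch tokenization under that assumption. The names ```default_tokenizer``` and ```tokenize_all``` are hypothetical helpers introduced here for illustration and are not part of LEKCut; the tokenizer is passed in as a parameter so the loop can be tried with a stand-in function.

```python
from typing import Callable, Iterable, List


def default_tokenizer(text: str) -> List[str]:
    """Call LEKCut with the defaults spelled out (hypothetical helper)."""
    # Imported lazily so tokenize_all can be used with a stand-in tokenizer
    # even when lekcut is not installed.
    from lekcut import word_tokenize

    # Same values as the defaults in the documented signature.
    return word_tokenize(text, model="deepcut", path="default")


def tokenize_all(
    texts: Iterable[str],
    tokenizer: Callable[[str], List[str]] = default_tokenizer,
) -> List[List[str]]:
    """Tokenize every text and return one token list per input."""
    return [tokenizer(text) for text in texts]
```

Calling ```tokenize_all(["ทดสอบการตัดคำ"])``` would run each string through ```word_tokenize``` with the bundled deepcut model.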
## Model
- ```deepcut``` - We ported the deepcut model from tensorflow.keras to ONNX. The model and code come from [Deepcut's GitHub](https://github.com/rkcosmos/deepcut). The ported model is [here](https://github.com/PyThaiNLP/LEKCut/blob/main/lekcut/model/deepcut.onnx).
### Load custom model
If you have trained a custom model with deepcut, or with another model type that LEKCut supports, you can load it after porting it to ONNX through the ```path``` argument of ```word_tokenize```.
- To train a custom deepcut model on your own dataset, see this [Notebook](https://github.com/rkcosmos/deepcut/blob/master/notebooks/training.ipynb). You need to update ```deepcut/train.py``` before training the model.
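As a rough sketch of loading a custom model, the wrapper below checks that the ported file exists before handing it to ```word_tokenize```. It assumes that ```path``` accepts a filesystem path to the ported ```.onnx``` file, which the README implies but does not spell out. The function name ```tokenize_with_custom_model``` and the file name in the usage note are hypothetical.

```python
from pathlib import Path
from typing import List


def tokenize_with_custom_model(text: str, model_path: str) -> List[str]:
    """Hypothetical wrapper: verify the model file, then pass it as `path`."""
    onnx_file = Path(model_path)
    if not onnx_file.is_file():
        raise FileNotFoundError(f"No model file found at {onnx_file}")

    # Imported lazily so a bad path fails before LEKCut is loaded.
    from lekcut import word_tokenize

    # Assumption: `path` takes the location of the ported ONNX model.
    return word_tokenize(text, model="deepcut", path=str(onnx_file))
```

A call might look like ```tokenize_with_custom_model("ทดสอบการตัดคำ", "my_deepcut.onnx")```, where ```my_deepcut.onnx``` stands in for your own ported model.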
## How to port a model?
See ```notebooks/```