An open API service indexing awesome lists of open source software.

https://github.com/pythainlp/lekcut

LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx model.
https://github.com/pythainlp/lekcut

Last synced: 12 months ago
JSON representation

LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx model.

Awesome Lists containing this project

README

          

# LEKCut
pypi

LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx model.

## Install

> pip install lekcut

## How to use

```python
from lekcut import word_tokenize
word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
```

**API**

```python
word_tokenize(text: str, model: str="deepcut", path: str="default") -> List[str]
```

## Model
- ```deepcut``` - We ported deepcut model from tensorflow.keras to ONNX model. The model and code come from [Deepcut's Github](https://github.com/rkcosmos/deepcut). The model is [here](https://github.com/PyThaiNLP/LEKCut/blob/main/lekcut/model/deepcut.onnx).

### Load custom model

If you have trained your custom model from deepcut or other that LEKCut support, You can load the custom model by ```path``` in ```word_tokenize``` after porting your model.

- How to train custom model with your dataset by deepcut - [Notebook](https://github.com/rkcosmos/deepcut/blob/master/notebooks/training.ipynb) (Needs to update ```deepcut/train.py``` before train model)

## How to porting model?

See ```notebooks/```