https://github.com/pythainlp/han-solo
🪿 Han-solo: Thai syllable segmenter
nlp syllable-segmentation thai-nlp thai-nlp-library
- Host: GitHub
- URL: https://github.com/pythainlp/han-solo
- Owner: PyThaiNLP
- License: apache-2.0
- Created: 2023-07-30T08:13:22.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-08T16:17:01.000Z (almost 2 years ago)
- Last Synced: 2025-03-26T22:11:12.441Z (7 months ago)
- Topics: nlp, syllable-segmentation, thai-nlp, thai-nlp-library
- Language: Jupyter Notebook
- Homepage:
- Size: 340 KB
- Stars: 9
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🪿 Han-solo
🪿 Han-solo: Thai syllable segmenter

This work aims to build a Thai syllable segmenter that works on text from the Thai social media domain.
Dataset: [Han-solo: Thai syllable segmenter](https://zenodo.org/record/8196608)
Google colab: [Demo](https://colab.research.google.com/github/pythainlp/Han-solo/blob/main/using.ipynb)
## Dataset
This work uses 2 datasets:
1. Nutcha Dataset (Thai news domain). See data_nutcha/ for details.
2. Han-solo: Thai syllable segmenter dataset (Thai social media domain). See [Han-solo: Thai syllable segmenter](https://zenodo.org/record/8196608) for details.

## Model
This work uses a CRF model trained with the same features as [ssg](https://github.com/ponrawee/ssg).
The training notebook is train.ipynb.
The model file: han_solo.crfsuite
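
The CRF tags every character with 1 (split) or 0 (no split). As a rough illustration of how such tags become syllables, here is a minimal sketch. It assumes that a label of 1 means "split after this character", which is a convention chosen for this example and not confirmed by the README. The function name `labels_to_syllables` and the hand-written labels are hypothetical.

```python
def labels_to_syllables(text, labels):
    """Group characters into syllables from per-character 0/1 labels.

    Assumption for this sketch: label 1 means "split after this character".
    Labels may be ints or strings ("0"/"1").
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")

    syllables = []
    current = []
    for char, label in zip(text, labels):
        current.append(char)
        if str(label) == "1":
            # Close the current syllable at a split point.
            syllables.append("".join(current))
            current = []
    if current:
        # Keep any trailing characters that were not closed by a 1.
        syllables.append("".join(current))
    return syllables


# Hand-written labels for illustration only, not model output.
example = labels_to_syllables("สวัสดี", [1, 0, 0, 1, 0, 1])
```

With these hand-written labels, the six code points of "สวัสดี" are grouped into three chunks. The labels actually produced by han_solo.crfsuite may follow a different convention, so check train.ipynb before relying on this.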
**F1-score**
1 is split, and 0 is not split.
```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     61078
           1       1.00      0.99      0.99     29468

    accuracy                           1.00     90546
   macro avg       1.00      1.00      1.00     90546
weighted avg       1.00      1.00      1.00     90546
```

## How to use?
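
A minimal sketch of calling the segmenter from Python. It assumes PyThaiNLP v4.1 or later is installed together with its CRF dependency, and that the model is reached through `pythainlp.tokenize.subword_tokenize` with the engine name `han_solo`. The wrapper name `segment_syllables` is hypothetical; using.ipynb remains the authoritative example.

```python
def segment_syllables(text):
    """Split Thai text into syllables using PyThaiNLP's han_solo engine.

    Assumes PyThaiNLP v4.1+ is installed; the import is done lazily so
    that defining this function does not itself require the package.
    """
    from pythainlp.tokenize import subword_tokenize

    # engine="han_solo" selects the CRF syllable segmenter.
    return subword_tokenize(text, engine="han_solo")
```

Calling `segment_syllables("สวัสดีครับ")` should return a list of syllable strings. The exact split depends on the model, so no expected output is given here.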
- See using.ipynb
- PyThaiNLP v4.1+

## License
- CC-BY 4.0 license (for Dataset)
- Apache License Version 2.0 (for source code and model)

## Cite as
> Wannaphong Phatthiyaphaibun. (2023). Han-solo: Thai syllable segmenter (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8196608
or BibTeX entry:
``` bib
@dataset{wannaphong_phatthiyaphaibun_2023_8196608,
  author    = {Wannaphong Phatthiyaphaibun},
  title     = {Han-solo: Thai syllable segmenter},
  month     = jul,
  year      = 2023,
  publisher = {Zenodo},
  version   = {1.0},
  doi       = {10.5281/zenodo.8196608},
  url       = {https://doi.org/10.5281/zenodo.8196608}
}
```