{"id":20457850,"url":"https://github.com/pythainlp/han-solo","last_synced_at":"2025-10-10T02:12:25.672Z","repository":{"id":184802587,"uuid":"672478146","full_name":"PyThaiNLP/Han-solo","owner":"PyThaiNLP","description":"🪿 Han-solo: Thai syllable segmenter","archived":false,"fork":false,"pushed_at":"2023-12-08T16:17:01.000Z","size":348,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-17T11:50:21.503Z","etag":null,"topics":["nlp","syllable-segmentation","thai-nlp","thai-nlp-library"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PyThaiNLP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-30T08:13:22.000Z","updated_at":"2024-01-15T08:46:32.000Z","dependencies_parsed_at":"2025-04-13T05:36:43.622Z","dependency_job_id":null,"html_url":"https://github.com/PyThaiNLP/Han-solo","commit_stats":null,"previous_names":["pythainlp/han-solo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PyThaiNLP/Han-solo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FHan-solo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FHan-solo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FHan-solo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FHan-solo/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/PyThaiNLP","download_url":"https://codeload.github.com/PyThaiNLP/Han-solo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FHan-solo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002530,"owners_count":26083399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","syllable-segmentation","thai-nlp","thai-nlp-library"],"created_at":"2024-11-15T12:09:29.265Z","updated_at":"2025-10-10T02:12:25.618Z","avatar_url":"https://github.com/PyThaiNLP.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🪿 Han-solo\n🪿 Han-solo: Thai syllable segmenter\n\nThis work aims to create a Thai syllable segmenter that works in the Thai social media domain.\n\nDataset: [Han-solo: Thai syllable segmenter](https://zenodo.org/record/8196608)\n\nGoogle colab: [Demo](https://colab.research.google.com/github/pythainlp/Han-solo/blob/main/using.ipynb) \n\n\n## Dataset\n\nThis work uses two datasets:\n\n1. Nutcha Dataset (Thai news domain). See data_nutcha/ for details.\n2. Han-solo: Thai syllable segmenter dataset (Thai social media domain). 
See [Han-solo: Thai syllable segmenter](https://zenodo.org/record/8196608) for details.\n\n## Model\n\nThe segmenter is a CRF model trained with the same features as [ssg](https://github.com/ponrawee/ssg).\n\nThe training notebook is train.ipynb.\n\nThe model file: han_solo.crfsuite\n\n**F1-score**\n\nLabel 1 means split, and label 0 means no split.\n\n```\n              precision    recall  f1-score   support\n\n           0       1.00      1.00      1.00     61078\n           1       1.00      0.99      0.99     29468\n\n    accuracy                           1.00     90546\n   macro avg       1.00      1.00      1.00     90546\nweighted avg       1.00      1.00      1.00     90546\n```\n\n## How to use?\n\n- See using.ipynb\n- PyThaiNLP v4.1+\n\n## License\n\n- CC-BY 4.0 license (for Dataset)\n- Apache License Version 2.0 (for Source code and model)\n\n## Cite as\n\n\u003e Wannaphong Phatthiyaphaibun. (2023). Han-solo: Thai syllable segmenter (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8196608\n\nor BibTeX entry:\n\n``` bib\n@dataset{wannaphong_phatthiyaphaibun_2023_8196608,\n  author       = {Wannaphong Phatthiyaphaibun},\n  title        = {Han-solo: Thai syllable segmenter},\n  month        = jul,\n  year         = 2023,\n  publisher    = {Zenodo},\n  version      = {1.0},\n  doi          = {10.5281/zenodo.8196608},\n  url          = {https://doi.org/10.5281/zenodo.8196608}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fhan-solo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpythainlp%2Fhan-solo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fhan-solo/lists"}