{"id":20457852,"url":"https://github.com/pythainlp/pythainlp","last_synced_at":"2025-05-13T19:03:21.592Z","repository":{"id":37821080,"uuid":"61813823","full_name":"PyThaiNLP/pythainlp","owner":"PyThaiNLP","description":"Thai natural language processing in Python","archived":false,"fork":false,"pushed_at":"2025-05-09T12:41:06.000Z","size":68734,"stargazers_count":1032,"open_issues_count":35,"forks_count":279,"subscribers_count":46,"default_branch":"dev","last_synced_at":"2025-05-12T19:15:22.765Z","etag":null,"topics":["computational-linguistics","hacktoberfest","natural-language-processing","nlp-library","python","soundex","text-processing","thai","thai-language","thai-nlp","thai-nlp-library","thai-soundex","word-segmentation"],"latest_commit_sha":null,"homepage":"https://pythainlp.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PyThaiNLP.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2016-06-23T14:57:26.000Z","updated_at":"2025-05-12T04:45:53.000Z","dependencies_parsed_at":"2023-02-18T23:45:59.128Z","dependency_job_id":"63d2de93-5cdd-4604-a3b3-f275a347fa47","html_url":"https://github.com/PyThaiNLP/pythainlp","commit_stats":{"total_commits":4067,"total_committers":78,"mean_commits":52.14102564102564,"dds":0.5148758298500122,"last_synced_commit":"e40f810e6f05df3a8eac5902f74bc41923f21b43"},"previous_names":["wannaphongcom/pythainlp"],"tags_count":127,"template":false,"template_full_name":null,"repository_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2Fpythainlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2Fpythainlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2Fpythainlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2Fpythainlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PyThaiNLP","download_url":"https://codeload.github.com/PyThaiNLP/pythainlp/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254010793,"owners_count":21998993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-linguistics","hacktoberfest","natural-language-processing","nlp-library","python","soundex","text-processing","thai","thai-language","thai-nlp","thai-nlp-library","thai-soundex","word-segmentation"],"created_at":"2024-11-15T12:09:29.335Z","updated_at":"2025-05-13T19:03:21.585Z","avatar_url":"https://github.com/PyThaiNLP.png","language":"Python","readme":"# PyThaiNLP: Thai Natural Language Processing in Python\n\n![Project Logo](https://avatars0.githubusercontent.com/u/32934255?s=200\u0026v=4)\n\n[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)\n[![Python 
3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3519354.svg)](https://doi.org/10.5281/zenodo.3519354)\n\n[![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![Codacy Grade](https://app.codacy.com/project/badge/Grade/5821a0de122041c79999bbb280230ffb)](https://www.codacy.com/gh/PyThaiNLP/pythainlp/dashboard?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=PyThaiNLP/pythainlp\u0026amp;utm_campaign=Badge_Grade)\n[![Coverage Status](https://coveralls.io/repos/github/PyThaiNLP/pythainlp/badge.svg?branch=dev)](https://coveralls.io/github/PyThaiNLP/pythainlp?branch=dev)\n\n[![Google Colab Badge](https://badgen.net/badge/Launch%20Quick%20Start%20Guide/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/PyThaiNLP/tutorials/blob/master/source/notebooks/pythainlp_get_started.ipynb)\n[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#thainlp:matrix.org)\n\nPyThaiNLP is a Python package for text processing and linguistic analysis, similar to [NLTK](https://www.nltk.org/) with a focus on Thai language.\n\nPyThaiNLP เป็นไลบารีภาษาไพทอนสำหรับประมวลผลภาษาธรรมชาติ คล้ายกับ NLTK โดยเน้นภาษาไทย [ดูรายละเอียดภาษาไทยได้ที่ README_TH.MD](https://github.com/PyThaiNLP/pythainlp/blob/dev/README_TH.md)\n\n## Quick install\n\n```sh\npip install pythainlp\n```\n\n| Version | Description | Status |\n|:------:|:--:|:------:|\n| [5.1.2](https://github.com/PyThaiNLP/pythainlp/releases) | Stable | [Change Log](https://github.com/PyThaiNLP/pythainlp/issues/900) |\n| [`dev`](https://github.com/PyThaiNLP/pythainlp/tree/dev) | Release Candidate for 5.2 | [Change Log](https://github.com/PyThaiNLP/pythainlp/issues/1080) |\n\n## Getting Started\n\n- 
PyThaiNLP requires Python 3.7+.\n  - Python 2.7 users can use PyThaiNLP 1.6. See [2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118) | [Upgrading from 1.7](https://pythainlp.org/docs/2.0/notes/pythainlp-1_7-2_0.html) | [Upgrading ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)\n- [PyThaiNLP Get Started notebook](https://pythainlp.org/tutorials/notebooks/pythainlp_get_started.html) | [API documentation](https://pythainlp.org/docs) | [Tutorials](https://pythainlp.org/tutorials)\n- [Official website](https://pythainlp.org/) | [PyPI](https://pypi.org/project/pythainlp/) | [Facebook page](https://www.facebook.com/pythainlp/)\n- [Who uses PyThaiNLP?](https://github.com/PyThaiNLP/pythainlp/blob/dev/INTHEWILD.md)\n- [Model cards](https://github.com/PyThaiNLP/pythainlp/wiki/Model-Cards) - for technical details, caveats, and ethical considerations of the models developed and used in PyThaiNLP\n\n## Capabilities\n\nPyThaiNLP provides standard linguistic analysis for the Thai language and standard Thai locale utility functions.\nSome of these functions are also available via the command-line interface (run `thainlp` in your shell).\n\nPartial list of features:\n\n- Convenient character and word classes, like Thai consonants (`pythainlp.thai_consonants`), vowels (`pythainlp.thai_vowels`), digits (`pythainlp.thai_digits`), and stop words (`pythainlp.corpus.thai_stopwords`) -- comparable to constants like `string.ascii_letters`, `string.digits`, and `string.punctuation`\n- Linguistic unit segmentation at different levels: sentence (`sent_tokenize`), word (`word_tokenize`), and subword (`subword_tokenize`)\n- Part-of-speech tagging (`pos_tag`)\n- Spelling suggestion and correction (`spell` and `correct`)\n- Phonetic algorithm and transliteration (`soundex` and `transliterate`)\n- Collation (sorted by dictionary order) (`collate`)\n- Number read out (`num_to_thaiword` and `bahttext`)\n- Datetime formatting 
(`thai_strftime`)\n- Correction of text typed with the wrong keyboard layout, Thai or English (`eng_to_thai`, `thai_to_eng`)\n\n## Installation\n\n```sh\npip install --upgrade pythainlp\n```\n\nThis will install the latest stable release of PyThaiNLP.\n\nInstall different releases:\n\n- Stable release: `pip install --upgrade pythainlp`\n- Pre-release (nearly ready): `pip install --upgrade --pre pythainlp`\n- Development (likely to break things): `pip install https://github.com/PyThaiNLP/pythainlp/archive/dev.zip`\n\n### Installation Options\n\nSome features, like Thai WordNet, may require extra packages. To install those requirements, list the extras you need in square brackets immediately after `pythainlp`:\n\n```sh\npip install \"pythainlp[extra1,extra2,...]\"\n```\n\nPossible `extras`:\n\n- `full` (install everything)\n- `compact` (install a stable and small subset of dependencies)\n- `attacut` (to support attacut, a fast and accurate tokenizer)\n- `benchmarks` (for [word tokenization benchmarking](tokenization-benchmark.md))\n- `icu` (for ICU, International Components for Unicode, support in transliteration and tokenization)\n- `ipa` (for IPA, International Phonetic Alphabet, support in transliteration)\n- `ml` (to support ULMFiT models for classification)\n- `thai2fit` (for Thai word vector)\n- `thai2rom` (for machine-learnt romanization)\n- `wordnet` (for Thai WordNet API)\n\nFor dependency details, look at the `extras` variable in\n[`setup.py`](https://github.com/PyThaiNLP/pythainlp/blob/dev/setup.py).\n\n## Data Directory\n\n- Some additional data, like word lists and language models, may be automatically downloaded during runtime.\n- PyThaiNLP caches these data under the directory `~/pythainlp-data` by default.\n- The data directory can be changed by specifying the environment variable `PYTHAINLP_DATA_DIR`.\n- See the data catalog (`db.json`) at \u003chttps://github.com/PyThaiNLP/pythainlp-corpus\u003e\n\n## Command-Line Interface\n\nSome PyThaiNLP features can be used from the command line with 
the `thainlp` command.\n\nFor example, to display a catalog of datasets:\n\n```sh\nthainlp data catalog\n```\n\nTo show usage help:\n\n```sh\nthainlp help\n```\n\n## Testing and test suites\n\nWe test core functionalities on all officially supported Python versions.\n\nSome functionality requiring extra dependencies may be tested less frequently\ndue to potential version conflicts or incompatibilities between packages.\n\nTest cases are categorized into three groups: core, compact, and extra.\nYou can find these tests in the [tests/](/tests/) directory.\n\nFor more detailed information on testing, please refer to the tests README:\n[tests/README.md](./tests/README.md)\n\n## Licenses\n\n| | License |\n|:---|:----|\n| PyThaiNLP source code and notebooks | [Apache Software License 2.0](https://github.com/PyThaiNLP/pythainlp/blob/dev/LICENSE) |\n| Corpora, datasets, and documentation created by PyThaiNLP | [Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0)](https://creativecommons.org/publicdomain/zero/1.0/)|\n| Language models created by PyThaiNLP | [Creative Commons Attribution 4.0 International Public License (CC-by)](https://creativecommons.org/licenses/by/4.0/)  |\n| Other corpora and models that may be included in PyThaiNLP | See [Corpus License](https://github.com/PyThaiNLP/pythainlp/blob/dev/pythainlp/corpus/corpus_license.md) |\n\n## Contribute to PyThaiNLP\n\n- Please fork and create a pull request :)\n- For style guides and other information, including references to algorithms we use,\n  please refer to our [contributing](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md) page.\n\n## Who uses PyThaiNLP?\n\nSee [INTHEWILD.md](https://github.com/PyThaiNLP/pythainlp/blob/dev/INTHEWILD.md).\n\n## Citations\n\nIf you use `PyThaiNLP` in your project or publication,\nplease cite the library as follows:\n\n\u003e Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita 
Lowphansirikul, and Pattarawat Chormai. “Pythainlp: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. \u003chttp://doi.org/10.5281/zenodo.3519354\u003e.\n\nor by BibTeX entry:\n\n```bibtex\n@software{pythainlp,\n    title = \"{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython\",\n    author = \"Phatthiyaphaibun, Wannaphong  and\n      Chaovavanich, Korakot  and\n      Polpanumas, Charin  and\n      Suriyawongkul, Arthit  and\n      Lowphansirikul, Lalita  and\n      Chormai, Pattarawat\",\n    doi = {10.5281/zenodo.3519354},\n    license = {Apache-2.0},\n    month = jun,\n    url = {https://github.com/PyThaiNLP/pythainlp/},\n    version = {v5.0.4},\n    year = {2024},\n}\n```\n\nOur [NLP-OSS 2023](https://nlposs.github.io/2023/) paper:\n\n\u003e Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4) In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. 
Empirical Methods in Natural Language Processing.\n\nand its BibTeX entry:\n\n```bibtex\n@inproceedings{phatthiyaphaibun-etal-2023-pythainlp,\n    title = \"{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython\",\n    author = \"Phatthiyaphaibun, Wannaphong  and\n      Chaovavanich, Korakot  and\n      Polpanumas, Charin  and\n      Suriyawongkul, Arthit  and\n      Lowphansirikul, Lalita  and\n      Chormai, Pattarawat  and\n      Limkonchotiwat, Peerat  and\n      Suntorntip, Thanathip  and\n      Udomcharoenchaikit, Can\",\n    editor = \"Tan, Liling  and\n      Milajevs, Dmitrijs  and\n      Chauhan, Geeticka  and\n      Gwinnup, Jeremy  and\n      Rippeth, Elijah\",\n    booktitle = \"Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)\",\n    month = dec,\n    year = \"2023\",\n    address = \"Singapore, Singapore\",\n    publisher = \"Empirical Methods in Natural Language Processing\",\n    url = \"https://aclanthology.org/2023.nlposs-1.4\",\n    pages = \"25--36\",\n    abstract = \"We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. 
The library is freely available at https://github.com/pythainlp/pythainlp.\",\n}\n```\n\n## Sponsors\n\n| Logo | Description |\n| --- | ----------- |\n| [![VISTEC-depa Thailand Artificial Intelligence Research Institute](https://airesearch.in.th/assets/img/logo/airesearch-logo.svg)](https://airesearch.in.th/)   | Since 2019, our contributors Korakot Chaovavanich and Lalita Lowphansirikul have been supported by [VISTEC-depa Thailand Artificial Intelligence Research Institute](https://airesearch.in.th/).                 |\n| [![MacStadium](https://i.imgur.com/rKy1dJX.png)](https://www.macstadium.com)   | [MacStadium](https://www.macstadium.com) supports us with a free Mac Mini M1 for running CI builds.                  |\n\n------\n\n\u003cdiv align=\"center\"\u003e\n  Made with ❤️ | PyThaiNLP Team 💻 |  \"We build Thai NLP\" 🇹🇭\n\u003c/div\u003e\n\n------\n\n\u003cdiv align=\"center\"\u003e\n  \u003cstrong\u003eOur only official repository is at https://github.com/PyThaiNLP/pythainlp, with one official mirror at https://gitlab.com/pythainlp/pythainlp\u003c/strong\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cstrong\u003eBeware of malware if you use code from any source other than these two official locations on GitHub and GitLab.\u003c/strong\u003e\n\u003c/div\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fpythainlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpythainlp%2Fpythainlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fpythainlp/lists"}