{"id":15014204,"url":"https://github.com/pythainlp/spacy-pythainlp","last_synced_at":"2026-02-05T11:03:51.563Z","repository":{"id":65145448,"uuid":"583573664","full_name":"PyThaiNLP/spaCy-PyThaiNLP","owner":"PyThaiNLP","description":"PyThaiNLP For spaCy","archived":false,"fork":false,"pushed_at":"2023-01-04T07:08:34.000Z","size":47,"stargazers_count":16,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-29T16:11:20.431Z","etag":null,"topics":["nlp-library","python","spacy","spacy-extensions"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PyThaiNLP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-12-30T07:30:47.000Z","updated_at":"2025-01-30T05:12:05.000Z","dependencies_parsed_at":"2023-02-02T02:30:43.394Z","dependency_job_id":null,"html_url":"https://github.com/PyThaiNLP/spaCy-PyThaiNLP","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/PyThaiNLP/spaCy-PyThaiNLP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FspaCy-PyThaiNLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FspaCy-PyThaiNLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FspaCy-PyThaiNLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FspaCy-PyThaiNLP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PyThaiNLP","download_url":"https://codeload.github.com/PyThaiNLP/spaCy-PyThaiNLP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PyThaiNLP%2FspaCy-PyThaiNLP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29120483,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T10:47:47.471Z","status":"ssl_error","status_checked_at":"2026-02-05T10:45:08.119Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp-library","python","spacy","spacy-extensions"],"created_at":"2024-09-24T19:45:19.216Z","updated_at":"2026-02-05T11:03:51.550Z","avatar_url":"https://github.com/PyThaiNLP.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spaCy-PyThaiNLP\n\n[![PyPI version](https://img.shields.io/pypi/v/spacy-pythainlp.svg)](https://pypi.org/project/spacy-pythainlp/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n\nThis package wraps the [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) library to add Thai language support for [spaCy](https://spacy.io/).\n\n## Features\n\n**Support List**\n- Word segmentation (tokenization)\n- Part-of-speech tagging\n- Named entity recognition (NER)\n- Sentence segmentation\n- Dependency parsing\n- Word vectors\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Usage Examples](#usage-examples)\n  - [Basic Sentence Segmentation](#basic-sentence-segmentation)\n  - [Part-of-Speech Tagging](#part-of-speech-tagging)\n  - [Named Entity Recognition](#named-entity-recognition)\n  - [Dependency Parsing](#dependency-parsing)\n  - [Word Vectors](#word-vectors)\n- [Configuration](#configuration)\n- [License](#license)\n\n## Installation\n\n### Prerequisites\n\n- Python 3.9 or higher\n- spaCy 3.0 or higher\n- PyThaiNLP 3.1.0 or higher\n\n### Install via pip\n\n```bash\npip install spacy-pythainlp\n```\n\n## Quick Start\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\n# Create a blank Thai language model\nnlp = spacy.blank(\"th\")\n\n# Add the PyThaiNLP pipeline component\nnlp.add_pipe(\"pythainlp\")\n\n# Process text\ndoc = nlp(\"ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน ผมอยากไปเที่ยว\")\n\n# Access sentences\nfor sent in doc.sents:\n    print(sent)\n# Output:\n# ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน\n# ผมอยากไปเที่ยว\n```\n\n## Usage Examples\n\n### Basic Sentence Segmentation\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\nnlp = spacy.blank(\"th\")\nnlp.add_pipe(\"pythainlp\")\n\ndoc = nlp(\"ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน ผมอยากไปเที่ยว\")\n\n# Get sentences\nsentences = list(doc.sents)\nprint(f\"Number of sentences: {len(sentences)}\")\nfor i, sent in enumerate(sentences, 1):\n    print(f\"Sentence {i}: {sent.text}\")\n```\n\n### Part-of-Speech Tagging\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\nnlp = spacy.blank(\"th\")\nnlp.add_pipe(\"pythainlp\", config={\"pos\": True})\n\ndoc = nlp(\"ผมเป็นคนไทย\")\n\n# Print tokens with POS tags\nfor token in doc:\n    print(f\"{token.text}: {token.pos_}\")\n```\n\n### Named Entity Recognition\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\nnlp = spacy.blank(\"th\")\nnlp.add_pipe(\"pythainlp\", config={\"ner\": True})\n\ndoc = nlp(\"วันที่ 15 กันยายน 2564 ทดสอบระบบที่กรุงเทพ\")\n\n# Print named entities\nfor ent in doc.ents:\n    print(f\"{ent.text}: {ent.label_}\")\n```\n\n### Dependency Parsing\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\nnlp = spacy.blank(\"th\")\nnlp.add_pipe(\"pythainlp\", config={\"dependency_parsing\": True})\n\ndoc = nlp(\"ผมเป็นคนไทย\")\n\n# Print dependency relations\nfor token in doc:\n    print(f\"{token.text}: {token.dep_} \u003c- {token.head.text}\")\n```\n\n### Word Vectors\n\n```python\nimport spacy\nimport spacy_pythainlp.core\n\nnlp = spacy.blank(\"th\")\nnlp.add_pipe(\"pythainlp\", config={\"word_vector\": True, \"word_vector_model\": \"thai2fit_wv\"})\n\ndoc = nlp(\"แมว สุนัข\")\n\n# Access word vectors\nfor token in doc:\n    print(f\"{token.text}: vector shape = {token.vector.shape}\")\n    \n# Calculate similarity\ntoken1 = doc[0]  # แมว\ntoken2 = doc[1]  # สุนัข\nprint(f\"Similarity: {token1.similarity(token2)}\")\n```\n\n## Configuration\n\nYou can customize the PyThaiNLP pipeline component by passing a configuration dictionary to `nlp.add_pipe()`:\n\n```python\nnlp.add_pipe(\n    \"pythainlp\",\n    config={\n        \"pos_engine\": \"perceptron\",\n        \"pos\": True,\n        \"pos_corpus\": \"orchid_ud\",\n        \"sent_engine\": \"crfcut\",\n        \"sent\": True,\n        \"ner_engine\": \"thainer\",\n        \"ner\": True,\n        \"tokenize_engine\": \"newmm\",\n        \"tokenize\": False,\n        \"dependency_parsing\": False,\n        \"dependency_parsing_engine\": \"esupar\",\n        \"dependency_parsing_model\": None,\n        \"word_vector\": True,\n        \"word_vector_model\": \"thai2fit_wv\"\n    }\n)\n```\n\n### Configuration Options\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `tokenize` | `bool` | `False` | Enable/disable word tokenization (spaCy uses PyThaiNLP's newmm by default) |\n| `tokenize_engine` | `str` | `\"newmm\"` | Tokenization engine. [See options](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.word_tokenize) |\n| `sent` | `bool` | `True` | Enable/disable sentence segmentation |\n| `sent_engine` | `str` | `\"crfcut\"` | Sentence tokenizer engine. [See options](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.sent_tokenize) |\n| `pos` | `bool` | `True` | Enable/disable part-of-speech tagging |\n| `pos_engine` | `str` | `\"perceptron\"` | POS tagging engine. [See options](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.pos_tag) |\n| `pos_corpus` | `str` | `\"orchid_ud\"` | Corpus for POS tagging |\n| `ner` | `bool` | `True` | Enable/disable named entity recognition |\n| `ner_engine` | `str` | `\"thainer\"` | NER engine. [See options](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.NER) |\n| `dependency_parsing` | `bool` | `False` | Enable/disable dependency parsing |\n| `dependency_parsing_engine` | `str` | `\"esupar\"` | Dependency parsing engine. [See options](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing) |\n| `dependency_parsing_model` | `str` | `None` | Dependency parsing model. [See options](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing) |\n| `word_vector` | `bool` | `True` | Enable/disable word vectors |\n| `word_vector_model` | `str` | `\"thai2fit_wv\"` | Word vector model. [See options](https://pythainlp.github.io/docs/3.1/api/word_vector.html#pythainlp.word_vector.WordVector) |\n\n**Important Notes:**\n- When `dependency_parsing` is enabled, word segmentation and sentence segmentation are automatically disabled to use the tokenization from the dependency parser.\n- All configuration options are optional and have sensible defaults.\n\n## Resources\n\n- [PyThaiNLP Documentation](https://pythainlp.github.io/)\n- [spaCy Documentation](https://spacy.io/)\n- [GitHub Repository](https://github.com/PyThaiNLP/spaCy-PyThaiNLP)\n- [Issue Tracker](https://github.com/PyThaiNLP/spaCy-PyThaiNLP/issues)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.\n\n## License\n\n```\n   Copyright 2016-2026 PyThaiNLP Project\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fspacy-pythainlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpythainlp%2Fspacy-pythainlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpythainlp%2Fspacy-pythainlp/lists"}