{"id":20106177,"url":"https://github.com/himkt/konoha","last_synced_at":"2026-03-01T11:08:47.874Z","repository":{"id":39697216,"uuid":"145716955","full_name":"himkt/konoha","owner":"himkt","description":"🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.","archived":false,"fork":false,"pushed_at":"2025-04-29T02:31:55.000Z","size":1415,"stargazers_count":261,"open_issues_count":0,"forks_count":26,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-02-21T23:57:47.384Z","etag":null,"topics":["janome","japanese","kytea","mecab","natural-language-processing","nlp","sentencepiece","sudachi","text-processing"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/konoha","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/himkt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-08-22T14:00:15.000Z","updated_at":"2026-01-13T19:50:57.000Z","dependencies_parsed_at":"2024-05-12T12:40:41.461Z","dependency_job_id":"c3dceec5-c3bb-47d0-a896-65344e251b3b","html_url":"https://github.com/himkt/konoha","commit_stats":{"total_commits":322,"total_committers":13,"mean_commits":24.76923076923077,"dds":0.4565217391304348,"last_synced_commit":"fa5ff203eac878fc0ed5fef873f98749a23361b9"},"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"purl":"pkg:github/himkt/konoha","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/himkt%2Fkonoha","tags_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/himkt%2Fkonoha/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/himkt%2Fkonoha/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/himkt%2Fkonoha/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/himkt","download_url":"https://codeload.github.com/himkt/konoha/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/himkt%2Fkonoha/sbom","scorecard":{"id":464932,"data":{"date":"2025-08-11","repo":{"name":"github.com/himkt/konoha","commit":"d37daf5b0a07f8b2bcf53f07e7afb3f03358c8a5"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.2,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 2/29 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the 
source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/ci.yml:1","Warn: no topLevel permission defined: .github/workflows/docker.yml:1","Warn: no topLevel permission defined: .github/workflows/publish.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations 
found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Packaging","score":10,"reason":"packaging workflow detected","details":["Info: Project packages its releases by way of GitHub Actions.: .github/workflows/docker.yml:10"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/ci.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docker.yml:13: update your workflow using 
https://app.stepsecurity.io/secureworkflow/himkt/konoha/docker.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docker.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/docker.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docker.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/docker.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docker.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/docker.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/publish.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/publish.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/himkt/konoha/publish.yml/main?enable=pin","Warn: containerImage not pinned by hash: Dockerfile:1: pin your Docker image by updating ubuntu:22.04 to ubuntu:22.04@sha256:1aa979d85661c488ce030ac292876cf6ed04535d3a237e49f61542d8e5de5ae0","Info:   0 out of   3 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   5 third-party GitHubAction dependencies pinned","Info:   0 out of   1 containerImage dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch 
protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 29 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T12:13:49.987Z","repository_id":39697216,"created_at":"2025-08-19T12:13:49.987Z","updated_at":"2025-08-19T12:13:49.987Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29967947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T10:55:55.490Z","status":"ssl_error","status_checked_at":"2026-03-01T10:55:55.175Z","response_time":124,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["janome","japanese","kytea","mecab","natural-language-processing","nlp","sentencepiece","sudachi","text-processing"],"created_at":"2024-11-13T17:49:17.250Z","updated_at":"2026-03-01T11:08:47.868Z","avatar_url":"https://github.com/himkt.png","language":"Python","readme":"# 🌿 Konoha: Simple wrapper of Japanese Tokenizers\n\n[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/himkt/konoha/blob/main/example/Konoha_Example.ipynb)\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/5164000/120913279-e7d62380-c6d0-11eb-8d17-6571277cdf27.gif\" width=\"95%\"\u003e\u003c/p\u003e\n\n[![GitHub stars](https://img.shields.io/github/stars/himkt/konoha?style=social)](https://github.com/himkt/konoha/stargazers)\n\n[![Downloads](https://pepy.tech/badge/konoha)](https://pepy.tech/project/konoha)\n[![Downloads](https://pepy.tech/badge/konoha/month)](https://pepy.tech/project/konoha/month)\n[![Downloads](https://pepy.tech/badge/konoha/week)](https://pepy.tech/project/konoha/week)\n\n[![Build Status](https://github.com/himkt/konoha/actions/workflows/ci.yml/badge.svg)](https://github.com/himkt/konoha/actions/workflows/ci.yml)\n[![Documentation Status](https://readthedocs.org/projects/konoha/badge/?version=latest)](https://konoha.readthedocs.io/en/latest/?badge=latest)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/konoha)\n[![PyPI](https://img.shields.io/pypi/v/konoha.svg)](https://pypi.python.org/pypi/konoha)\n[![GitHub Issues](https://img.shields.io/github/issues/himkt/konoha.svg?cacheSeconds=60\u0026color=yellow)](https://github.com/himkt/konoha/issues)\n[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/himkt/konoha.svg?cacheSeconds=60\u0026color=yellow)](https://github.com/himkt/konoha/issues)\n\n`Konoha` is a Python library that provides an easy-to-use, integrated interface to various Japanese tokenizers,\nletting you switch tokenizers and streamline your pre-processing.\n\n## Supported tokenizers\n\n\u003ca href=\"https://github.com/buruzaemon/natto-py\"\u003e\u003cimg src=\"https://img.shields.io/badge/MeCab-natto--py-ff69b4\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/mocobeta/janome\"\u003e\u003cimg 
src=\"https://img.shields.io/badge/Janome-janome-ff69b4\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/WorksApplications/SudachiPy\"\u003e\u003cimg src=\"https://img.shields.io/badge/Sudachi-sudachipy-ff69b4\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/google/sentencepiece\"\u003e\u003cimg src=\"https://img.shields.io/badge/Sentencepiece-sentencepiece-ff69b4\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/taishi-i/nagisa\"\u003e\u003cimg src=\"https://img.shields.io/badge/nagisa-nagisa-ff69b4\"\u003e\u003c/a\u003e\n\nAlso, `konoha` provides rule-based tokenizers (whitespace, character) and a rule-based sentence splitter.\n\n\n## Quick Start with Docker\n\nSimply run the following on your computer:\n\n```bash\ndocker run --rm -p 8000:8000 -t himkt/konoha  # from DockerHub\n```\n\nOr you can build the image on your machine:\n\n```bash\ngit clone https://github.com/himkt/konoha  # download konoha\ncd konoha \u0026\u0026 docker-compose up --build  # build and launch container\n```\n\nTokenization is performed by posting a JSON object to `localhost:8000/api/v1/tokenize`.\nYou can also batch-tokenize by passing `texts: [\"１つ目の入力\", \"２つ目の入力\"]` to `localhost:8000/api/v1/batch_tokenize`.\n\n(API documentation is available at `localhost:8000/redoc`; you can view it in your web browser.)\n\nSend a request using `curl` from your terminal.\nNote that the endpoint paths changed in v4.6.4.\nPlease check the release notes (https://github.com/himkt/konoha/releases/tag/v4.6.4).\n\n```json\n$ curl localhost:8000/api/v1/tokenize -X POST -H \"Content-Type: application/json\" \\\n    -d '{\"tokenizer\": \"mecab\", \"text\": \"これはペンです\"}'\n\n{\n  \"tokens\": [\n    [\n      {\n        \"surface\": \"これ\",\n        \"part_of_speech\": \"名詞\"\n      },\n      {\n        \"surface\": \"は\",\n        \"part_of_speech\": \"助詞\"\n      },\n      {\n        \"surface\": \"ペン\",\n        \"part_of_speech\": \"名詞\"\n      },\n      {\n        \"surface\": \"です\",\n        \"part_of_speech\": \"助動詞\"\n      }\n    ]\n  ]\n}\n```\n\n\n## Installation\n\n\nI recommend installing konoha with `pip install 'konoha[all]'`.\n\n- Install konoha with a specific tokenizer: `pip install 'konoha[(tokenizer_name)]'`.\n- Install konoha with a specific tokenizer and remote file support: `pip install 'konoha[(tokenizer_name),remote]'`.\n\nTo use a specific tokenizer, install konoha with the corresponding extra\n(e.g. `konoha[mecab]`, `konoha[sudachi]`, etc.) or install the tokenizers individually.\n\n\n## Example\n\n### Word level tokenization\n\n```python\nfrom konoha import WordTokenizer\n\nsentence = '自然言語処理を勉強しています'\n\ntokenizer = WordTokenizer('MeCab')\nprint(tokenizer.tokenize(sentence))\n# =\u003e [自然, 言語, 処理, を, 勉強, し, て, い, ます]\n\ntokenizer = WordTokenizer('Sentencepiece', model_path=\"data/model.spm\")\nprint(tokenizer.tokenize(sentence))\n# =\u003e [▁, 自然, 言語, 処理, を, 勉強, し, ています]\n```\n\nFor more details, see the `example/` directory.\n\n### Remote files\n\nKonoha supports loading dictionaries and models from cloud storage (currently Amazon S3).\nThis requires installing konoha with the `remote` option; see [Installation](#installation).\n\n```python\nfrom konoha import WordTokenizer\n\nsentence = \"自然言語処理を勉強しています\"\n\n# download user dictionary from S3\nword_tokenizer = WordTokenizer(\"mecab\", user_dictionary_path=\"s3://abc/xxx.dic\")\nprint(word_tokenizer.tokenize(sentence))\n\n# download system dictionary from S3\nword_tokenizer = WordTokenizer(\"mecab\", system_dictionary_path=\"s3://abc/yyy\")\nprint(word_tokenizer.tokenize(sentence))\n\n# download model file from S3\nword_tokenizer = WordTokenizer(\"sentencepiece\", model_path=\"s3://abc/zzz.model\")\nprint(word_tokenizer.tokenize(sentence))\n```\n\n### Sentence level tokenization\n\n```python\nfrom konoha import SentenceTokenizer\n\nsentence = \"私は猫だ。名前なんてものはない。だが，「かわいい。それで十分だろう」。\"\n\ntokenizer = SentenceTokenizer()\nprint(tokenizer.tokenize(sentence))\n# =\u003e ['私は猫だ。', '名前なんてものはない。', 'だが，「かわいい。それで十分だろう」。']\n```\n\nYou can change the symbols used for sentence splitting and bracket expressions.\n\n1. Sentence splitter\n\n```python\nsentence = \"私は猫だ。名前なんてものはない．だが，「かわいい。それで十分だろう」。\"\n\ntokenizer = SentenceTokenizer(period=\"．\")\nprint(tokenizer.tokenize(sentence))\n# =\u003e ['私は猫だ。名前なんてものはない．', 'だが，「かわいい。それで十分だろう」。']\n```\n\n2. Bracket expression\n\n```python\nimport re\n\nsentence = \"私は猫だ。名前なんてものはない。だが，『かわいい。それで十分だろう』。\"\n\ntokenizer = SentenceTokenizer(\n    patterns=SentenceTokenizer.PATTERNS + [re.compile(r\"『.*?』\")],\n)\nprint(tokenizer.tokenize(sentence))\n# =\u003e ['私は猫だ。', '名前なんてものはない。', 'だが，『かわいい。それで十分だろう』。']\n```\n\n\n## Test\n\n```\npython -m pytest\n```\n\n## Article\n\n- [トークナイザをいい感じに切り替えるライブラリ konoha を作った](https://qiita.com/klis/items/bb9ffa4d9c886af0f531)\n- [日本語解析ツール Konoha に AllenNLP 連携機能を実装した](https://qiita.com/klis/items/f1d29cb431d1bf879898)\n\n## Acknowledgement\n\nThe Sentencepiece model used in the tests was provided by @yoheikikuta. Thanks!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhimkt%2Fkonoha","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhimkt%2Fkonoha","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhimkt%2Fkonoha/lists"}