{"id":34082712,"url":"https://github.com/wwwcojp/ja_sentence_segmenter","last_synced_at":"2026-04-07T15:31:28.420Z","repository":{"id":62572175,"uuid":"228192114","full_name":"wwwcojp/ja_sentence_segmenter","owner":"wwwcojp","description":"japanese sentence segmentation library for python","archived":false,"fork":false,"pushed_at":"2023-04-03T04:10:03.000Z","size":160,"stargazers_count":73,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-16T14:45:46.291Z","etag":null,"topics":["nlp","python","rule-based","sentence-boundary-detection","sentence-tokenizer"],"latest_commit_sha":null,"homepage":"https://wwwcojp.github.io/ja_sentence_segmenter/ja_sentence_segmenter.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wwwcojp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-15T13:51:31.000Z","updated_at":"2025-09-03T08:57:04.000Z","dependencies_parsed_at":"2024-06-21T16:53:11.931Z","dependency_job_id":"3b7c596d-84f1-43b5-9d10-592691b876da","html_url":"https://github.com/wwwcojp/ja_sentence_segmenter","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"0694a4653de95a3dfe6e177aef78c0c208a3961d"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/wwwcojp/ja_sentence_segmenter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wwwcojp%2Fja_sentence_segmenter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/wwwcojp%2Fja_sentence_segmenter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wwwcojp%2Fja_sentence_segmenter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wwwcojp%2Fja_sentence_segmenter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wwwcojp","download_url":"https://codeload.github.com/wwwcojp/ja_sentence_segmenter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wwwcojp%2Fja_sentence_segmenter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31518419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","python","rule-based","sentence-boundary-detection","sentence-tokenizer"],"created_at":"2025-12-14T12:24:00.868Z","updated_at":"2026-04-07T15:31:28.403Z","avatar_url":"https://github.com/wwwcojp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ja_sentence_segmenter\nPerforms rule-based sentence segmentation on Japanese text.\n\n## Getting Started\n\n### Prerequisites\n* Python 3.6+\n\n### Installing\n
`pip install ja_sentence_segmenter`\n\n### Usage\n```Python\nimport functools\n\nfrom ja_sentence_segmenter.common.pipeline import make_pipeline\nfrom ja_sentence_segmenter.concatenate.simple_concatenator import concatenate_matching\nfrom ja_sentence_segmenter.normalize.neologd_normalizer import normalize\nfrom ja_sentence_segmenter.split.simple_splitter import split_newline, split_punctuation\n\nsplit_punc2 = functools.partial(split_punctuation, punctuations=r\"。!?\")\nconcat_tail_no = functools.partial(concatenate_matching, former_matching_rule=r\"^(?P\u003cresult\u003e.+)(の)$\", remove_former_matched=False)\nsegmenter = make_pipeline(normalize, split_newline, concat_tail_no, split_punc2)\n\n# Golden Rule: Simple period to end sentence #001 (from https://github.com/diasks2/pragmatic_segmenter/blob/master/spec/pragmatic_segmenter/languages/japanese_spec.rb#L6)\ntext1 = \"これはペンです。それはマーカーです。\"\nprint(list(segmenter(text1)))\n```\n\n```\n\u003e [\"これはペンです。\", \"それはマーカーです。\"]\n```\n\n## Versioning\nWe use SemVer for versioning. For the versions available, see the tags on this repository.\n\n## Contributing\nTODO\n\n## License\nMIT\n\n## Acknowledgments\n\n### Text normalization\nThe text normalization code is based on the following wiki page of [mecab-ipadic-NEologd](https://github.com/neologd/mecab-ipadic-neologd), with some modifications.\n\nThanks to hideaki-t and overlast, who provided the sample code.\n\nhttps://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#python-written-by-hideaki-t--overlast\n\n### Sentence segmentation rules\nThe segmentation rules are based on the Japanese rules of [Pragmatic Segmenter](https://github.com/diasks2/pragmatic_segmenter).\n\nhttps://github.com/diasks2/pragmatic_segmenter#golden-rules-japanese\n\nThe test code in this project also uses the test data from the following test file.\n\nhttps://github.com/diasks2/pragmatic_segmenter/blob/master/spec/pragmatic_segmenter/languages/japanese_spec.rb\n\n
Thanks to the author, Kevin S. Dias, and the [contributors](https://github.com/diasks2/pragmatic_segmenter/graphs/contributors).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwwwcojp%2Fja_sentence_segmenter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwwwcojp%2Fja_sentence_segmenter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwwwcojp%2Fja_sentence_segmenter/lists"}