https://github.com/ikegami-yukino/sengiri
Yet another sentence-level tokenizer for the Japanese text
- Host: GitHub
- URL: https://github.com/ikegami-yukino/sengiri
- Owner: ikegami-yukino
- License: MIT
- Created: 2019-10-04T18:46:43.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-09-27T02:24:04.000Z (over 2 years ago)
- Last Synced: 2024-10-13T09:49:51.346Z (8 months ago)
- Topics: japanese-language, japanese-sentences, sentence-tokenizer, tokenizer
- Language: Python
- Size: 20.5 KB
- Stars: 21
- Watchers: 4
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE
README
sengiri
==========

|travis| |coveralls| |pyversion| |version| |license|

Yet another sentence-level tokenizer for the Japanese text

DEPENDENCIES
==============

- MeCab
- emoji

INSTALLATION
==============

::

 $ pip install sengiri

USAGE
============

.. code:: python

  import sengiri

  print(sengiri.tokenize('うーん🤔🤔🤔どうしよう'))
  #=>['うーん🤔🤔🤔', 'どうしよう']
  print(sengiri.tokenize('モー娘。のコンサートに行った。'))
  #=>['モー娘。のコンサートに行った。']
  print(sengiri.tokenize('ありがとう^^ 助かります。'))
  #=>['ありがとう^^', '助かります。']
  print(sengiri.tokenize('顔文字テスト(*´ω｀*)うまくいくかな？'))
  #=>['顔文字テスト(*´ω｀*)うまくいくかな？']
  # I recommend using the NEologd dictionary.
  print(sengiri.tokenize('顔文字テスト(*´ω｀*)うまくいくかな？', mecab_args='-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd'))
  #=>['顔文字テスト(*´ω｀*)', 'うまくいくかな？']
  print(sengiri.tokenize('子供が大変なことになった。'
                         '（後で聞いたのだが、脅されたらしい）'
                         '（脅迫はやめてほしいと言っているのに）'))
  #=>['子供が大変なことになった。', '（後で聞いたのだが、脅されたらしい）', '（脅迫はやめてほしいと言っているのに）']
  print(sengiri.tokenize('楽しかったw また遊ぼwww'))
  #=>['楽しかったw', 'また遊ぼwww']
  print(sengiri.tokenize('http://www.inpaku.go.jp/'))
  #=>['http://www.inpaku.go.jp/']

.. |travis| image:: https://travis-ci.org/ikegami-yukino/sengiri.svg?branch=master
    :target: https://travis-ci.org/ikegami-yukino/sengiri
    :alt: travis-ci.org

.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/sengiri/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/ikegami-yukino/sengiri?branch=master
    :alt: coveralls.io

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/sengiri.svg

.. |version| image:: https://img.shields.io/pypi/v/sengiri.svg
    :target: http://pypi.python.org/pypi/sengiri/
    :alt: latest version

.. |license| image:: https://img.shields.io/pypi/l/sengiri.svg
    :target: http://pypi.python.org/pypi/sengiri/
    :alt: license
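
The usage examples above tokenize one string at a time. As a minimal sketch of applying the tokenizer to a multi-line document, the helper below feeds each non-empty line to a tokenizer and flattens the results. The names ``naive_tokenize`` and ``split_document`` are hypothetical and not part of sengiri. The regex fallback is only a stand-in used when sengiri cannot be imported; it is not sengiri's algorithm and handles none of the emoji, emoticon, or parenthesis cases shown in the README.

```python
import re
from typing import Callable, Iterable, List


def naive_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in: cut after runs of 。, ！ or ？ only.

    This is NOT how sengiri works; it exists so the sketch runs
    without MeCab installed.
    """
    parts = re.findall(r'[^。！？]*[。！？]+|[^。！？]+', text)
    return [p.strip() for p in parts if p.strip()]


try:
    import sengiri  # requires MeCab and emoji, per the README

    tokenize: Callable[[str], Iterable[str]] = sengiri.tokenize
except ImportError:
    # Assumption: fall back to the naive splitter when sengiri is absent.
    tokenize = naive_tokenize


def split_document(document: str,
                   tokenizer: Callable[[str], Iterable[str]] = tokenize) -> List[str]:
    """Tokenize each non-empty line separately and flatten the results."""
    sentences: List[str] = []
    for line in document.splitlines():
        line = line.strip()
        if line:
            sentences.extend(tokenizer(line))
    return sentences


if __name__ == '__main__':
    doc = '今日は晴れ。散歩した！\n\n明日は雨？'
    for sentence in split_document(doc):
        print(sentence)
```

Passing the tokenizer as a parameter keeps the line-handling logic independent of whether sengiri is installed, so the same helper can be exercised with either function.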