https://github.com/ikegami-yukino/sengiri


sengiri
==========
|travis| |coveralls| |pyversion| |version| |license|

Yet another sentence-level tokenizer for Japanese text.

DEPENDENCIES
==============

- MeCab
- emoji

INSTALLATION
==============

::

    $ pip install sengiri

USAGE
============

.. code:: python

    import sengiri

    print(sengiri.tokenize('うーん🤔🤔🤔どうしよう'))
    #=>['うーん🤔🤔🤔', 'どうしよう']
    print(sengiri.tokenize('モー娘。のコンサートに行った。'))
    #=>['モー娘。のコンサートに行った。']
    print(sengiri.tokenize('ありがとう＾＾ 助かります。'))
    #=>['ありがとう＾＾', '助かります。']
    print(sengiri.tokenize('顔文字テスト(*´ω`*)うまくいくかな？'))
    #=>['顔文字テスト(*´ω`*)うまくいくかな？']
    # I recommend using the NEologd dictionary.
    print(sengiri.tokenize('顔文字テスト(*´ω`*)うまくいくかな？', mecab_args='-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd'))
    #=>['顔文字テスト(*´ω`*)', 'うまくいくかな？']
    print(sengiri.tokenize('子供が大変なことになった。'
                           '（後で聞いたのだが、脅されたらしい）'
                           '（脅迫はやめてほしいと言っているのに）'))
    #=>['子供が大変なことになった。', '（後で聞いたのだが、脅されたらしい）', '（脅迫はやめてほしいと言っているのに）']
    print(sengiri.tokenize('楽しかったw また遊ぼwww'))
    #=>['楽しかったw', 'また遊ぼwww']
    print(sengiri.tokenize('http://www.inpaku.go.jp/'))
    #=>['http://www.inpaku.go.jp/']
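
When many documents are split with the same dictionary, the ``mecab_args`` string can be bound once instead of being repeated on every call. The following is a minimal sketch and not part of sengiri: ``make_splitter`` and ``split_all`` are hypothetical helper names, and the tokenizer is passed in as a callable so the helpers do not require MeCab at import time. It assumes only the ``tokenize(text, mecab_args=...)`` call form shown in the usage examples.

```python
from functools import partial
from typing import Callable, Iterable, List


def make_splitter(tokenize: Callable[..., List[str]],
                  mecab_args: str = "") -> Callable[..., List[str]]:
    """Return a one-argument splitter with mecab_args bound (hypothetical helper)."""
    if mecab_args:
        # Bind the dictionary option once; callers then pass only the text.
        return partial(tokenize, mecab_args=mecab_args)
    # No extra arguments requested: use the tokenizer unchanged.
    return tokenize


def split_all(docs: Iterable[str],
              splitter: Callable[[str], List[str]]) -> List[List[str]]:
    """Split every non-blank document; blank documents are skipped."""
    return [splitter(doc) for doc in docs if doc.strip()]
```

With sengiri installed, ``make_splitter(sengiri.tokenize, mecab_args='-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')`` would give a splitter that uses the NEologd dictionary path from the example above; that path depends on the local MeCab installation.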

.. |travis| image:: https://travis-ci.org/ikegami-yukino/sengiri.svg?branch=master
    :target: https://travis-ci.org/ikegami-yukino/sengiri
    :alt: travis-ci.org

.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/sengiri/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/ikegami-yukino/sengiri?branch=master
    :alt: coveralls.io

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/sengiri.svg

.. |version| image:: https://img.shields.io/pypi/v/sengiri.svg
    :target: http://pypi.python.org/pypi/sengiri/
    :alt: latest version

.. |license| image:: https://img.shields.io/pypi/l/sengiri.svg
    :target: http://pypi.python.org/pypi/sengiri/
    :alt: license