Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mocobeta/janome
Japanese morphological analysis engine written in pure Python
https://github.com/mocobeta/janome
japanese-language nlp-library python
Last synced: 27 days ago
JSON representation
Japanese morphological analysis engine written in pure Python
- Host: GitHub
- URL: https://github.com/mocobeta/janome
- Owner: mocobeta
- License: apache-2.0
- Created: 2015-02-14T09:47:00.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-07-01T11:31:28.000Z (over 1 year ago)
- Last Synced: 2024-10-02T16:22:01.866Z (about 1 month ago)
- Topics: japanese-language, nlp-library, python
- Language: Python
- Homepage: https://mocobeta.github.io/janome/en/
- Size: 403 MB
- Stars: 852
- Watchers: 32
- Forks: 51
- Open Issues: 15
-
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.txt
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
Awesome Lists containing this project
README
=======
Janome
=======.. image:: https://github.com/mocobeta/janome/workflows/Checks/badge.svg
:target: https://github.com/mocobeta/janome/actions?query=workflow%3AChecks.. image:: https://coveralls.io/repos/github/mocobeta/janome/badge.svg?branch=master
:target: https://coveralls.io/github/mocobeta/janome?branch=master.. image:: https://badges.gitter.im/org.png
:target: https://gitter.im/janome-python.. image:: https://img.shields.io/pypi/dm/Janome.svg
:target: https://pypistats.org/packages/janome.. image:: https://img.shields.io/conda/v/conda-forge/janome
:target: https://anaconda.org/conda-forge/janomeJanome is a Japanese morphological analysis engine written in pure Python.
General documentation:
https://mocobeta.github.io/janome/en/ (English)
https://mocobeta.github.io/janome/ (Japanese)
Requirements
=============Python 3.7+ is required.
Install
========[Note] This consumes about 500 MB memory for building.
.. code:: bash
(venv) $ pip install janome
Run
====.. code:: bash
(venv) $ python
>>> from janome.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> for token in t.tokenize('すもももももももものうち'):
... print(token)
...
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチLicense
========Licensed under Apache License 2.0 and uses the MeCab-IPADIC dictionary/statistical model.
See LICENSE.txt and NOTICE.txt for license details.
Acknowledgement
================Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.
Copyright
==========Copyright(C) 2015-2023, Tomoko Uchida. All rights reserved.