https://github.com/jeongukjae/mecab-bind
Binding MeCab Tagger to Python3 and TensorFlow
- Host: GitHub
- URL: https://github.com/jeongukjae/mecab-bind
- Owner: jeongukjae
- License: gpl-3.0
- Created: 2021-05-14T15:58:00.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-05-25T14:26:40.000Z (over 4 years ago)
- Last Synced: 2025-03-27T03:41:42.472Z (7 months ago)
- Topics: python-binding, tensorflow-binding
- Language: C++
- Homepage:
- Size: 2.37 MB
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# mecab-bind
* [mecab-bind](https://pypi.org/project/mecab-bind/): Python binding
* [mecab-tf](https://pypi.org/project/mecab-tf/): TensorFlow binding

[Build and test workflow](https://github.com/jeongukjae/mecab-bind/actions/workflows/build-and-test.yml)

Binds the MeCab tagger to Python and TensorFlow.
## Installation
* Python binding: `pip install mecab-bind`
* TensorFlow binding: `pip install mecab-tf`

### Compatible TensorFlow version

|mecab-tf version|TensorFlow version|Python versions|
|---|---|---|
|2.4.0|2.4.x|3.6, 3.7, 3.8|
|2.5.0|2.5.x|3.6, 3.7, 3.8, 3.9|

## Usage
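Before installing or importing `mecab-tf`, a version combination can be checked against the compatibility table. The sketch below hard-codes the two rows of that table and assumes, based on those rows, that a mecab-tf release needs the TensorFlow release with the same major.minor version. `COMPATIBILITY`, `major_minor`, and `is_compatible` are hypothetical helpers, not part of either package.

```python
import platform

# Rows of the compatibility table, keyed by the "major.minor" of the mecab-tf
# release. Assumption: mecab-tf X.Y.* pairs with TensorFlow X.Y.*.
COMPATIBILITY = {
    "2.4": {"tensorflow": "2.4", "python": {"3.6", "3.7", "3.8"}},
    "2.5": {"tensorflow": "2.5", "python": {"3.6", "3.7", "3.8", "3.9"}},
}


def major_minor(version: str) -> str:
    """Return the "major.minor" prefix of a version string such as "2.5.1"."""
    return ".".join(version.split(".")[:2])


def is_compatible(mecab_tf_version: str, tf_version: str, python_version: str) -> bool:
    """Check a (mecab-tf, TensorFlow, Python) combination against the table."""
    entry = COMPATIBILITY.get(major_minor(mecab_tf_version))
    if entry is None:
        return False  # this mecab-tf release is not listed in the table
    return (
        major_minor(tf_version) == entry["tensorflow"]
        and major_minor(python_version) in entry["python"]
    )


if __name__ == "__main__":
    # Package versions are passed by hand here; on Python 3.8+ they could be
    # read with importlib.metadata.version("tensorflow") instead.
    print(is_compatible("2.5.0", "2.5.1", platform.python_version()))
```

A release that is missing from the table is treated as incompatible, so the check errs on the side of caution.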
### Python Binding
```python
>>> import mecab
>>> tagger = mecab.Tagger(mecab.get_model_args("./test-data/dic"))  # replace "./test-data/dic" with the path to your dictionary
>>> dic_infos = tagger.get_dictionary_info()
>>> dic_infos
[...]  # list of dictionary info objects
>>> tagger.parse_node_with_lattice("シリーズ中、カンフーシーンが一番多い。")
[...]  # list of node objects
>>> tagger.parse_nbest_with_lattice("シリーズ中、カンフーシーンが一番多い。", 10)
[[...], [...], ...]  # list of candidate analyses, each a list of node objects
>>> print(tagger.parse("シリーズ中、カンフーシーンが一番多い。"))
シリーズ 名詞,一般,*,*,*,*,*
中 接頭詞,数接続,*,*,*,*,中,ナカ,ナカ
、 記号,読点,*,*,*,*,、,、,、
カンフーシーン 名詞,一般,*,*,*,*,*
が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
一番 名詞,副詞可能,*,*,*,*,一番,イチバン,イチバン
多い 形容詞,自立,*,*,形容詞・アウオ段,基本形,多い,オオイ,オーイ
。 記号,句点,*,*,*,*,。,。,。
EOS
```
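The string returned by `tagger.parse` can be turned into Python data with a few lines. This sketch assumes MeCab's default output format, in which each line holds the surface form, a tab, and the comma-separated feature fields, and the sentence ends with an `EOS` line (the output above renders the tab as spaces). `parse_mecab_output` is a hypothetical helper, not part of mecab-bind.

```python
from typing import List, Tuple


def parse_mecab_output(text: str) -> List[Tuple[str, List[str]]]:
    """Split MeCab's default text output into (surface, feature fields) pairs.

    Assumes one token per line as "surface<TAB>comma-separated features",
    terminated by an "EOS" line. Only the first sentence is returned.
    """
    tokens = []
    for line in text.splitlines():
        if line == "EOS":
            break  # end-of-sentence marker
        if not line:
            continue  # skip blank lines
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens
```

Under that assumption, `parse_mecab_output(tagger.parse("シリーズ中、カンフーシーンが一番多い。"))[0]` should pair the surface `シリーズ` with its feature fields `["名詞", "一般", "*", "*", "*", "*", "*"]`.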
The following MeCab commands are also bound:

* `mecab-dict-index`
* `mecab-dict-gen`
* `mecab-system-eval`
* `mecab-cost-train`
* `mecab-test-gen`
* `mecab`

### TensorFlow Binding
```python
>>> import tensorflow as tf
>>> from mecab_tf.python.ops.mecab_ops import MecabTagger
>>> tagger = MecabTagger("./test-data/dic")
>>> surfaces, features = tagger.tag(["シリーズ中、カンフーシーンが一番多い。", "※撮影中に、ジェット・リーが失踪。"])
>>> surfaces.shape
TensorShape([2, None])
>>> features.shape
TensorShape([2, None])
>>> for surface, feature in zip(surfaces[0], features[0]): # print first sentence
...     print(surface.numpy().decode('utf8'), feature.numpy().decode('utf8'))
...
BOS/EOS,*,*,*,*,*,*,*,*
シリーズ 名詞,一般,*,*,*,*,*
中 接頭詞,数接続,*,*,*,*,中,ナカ,ナカ
、 記号,読点,*,*,*,*,、,、,、
カンフーシーン 名詞,一般,*,*,*,*,*
が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
一番 名詞,副詞可能,*,*,*,*,一番,イチバン,イチバン
多い 形容詞,自立,*,*,形容詞・アウオ段,基本形,多い,オオイ,オーイ
。 記号,句点,*,*,*,*,。,。,。
BOS/EOS,*,*,*,*,*,*,*,*
>>> # you can pass any shape of string tensor
>>> _ = tagger.tag("シリーズ中、カンフーシーンが一番多い。")
>>> _ = tagger.tag([["シリーズ中、カンフーシーンが一番多い。", "※撮影中に、ジェット・リーが失踪。"]])
```

Note: If you use this module in the SavedModel format, pass an absolute path as `model_path`. Only the `model_path` string is serialized, not the dictionary data, so the dictionary must be available at that same path wherever the model is loaded.

## Prebuilt dictionaries
* [prebuilt mecab-ko-dic (korean)](https://github.com/jeongukjae/mecab-ko-dic-prebuilt) and [example notebook](https://github.com/jeongukjae/mecab-ko-dic-prebuilt/blob/main/example-of-mecab-ko-dic-prebuilt--and-mecab-tf.ipynb)
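When pointing either binding at a prebuilt dictionary, it can help to resolve the directory to an absolute path (as the SavedModel note recommends) and fail early if files are missing. The sketch below assumes the usual file names of a compiled MeCab system dictionary (`dicrc`, `sys.dic`, `unk.dic`, `matrix.bin`, `char.bin`). `resolve_dictionary_dir` and `EXPECTED_FILES` are hypothetical names, so adjust the file list to match the dictionary you actually use.

```python
import os
from typing import Iterable

# Assumed file names of a compiled MeCab system dictionary; your build may differ.
EXPECTED_FILES = ("dicrc", "sys.dic", "unk.dic", "matrix.bin", "char.bin")


def resolve_dictionary_dir(path: str, expected: Iterable[str] = EXPECTED_FILES) -> str:
    """Return the absolute path of a compiled dictionary directory.

    Raises FileNotFoundError if any of the expected files is missing.
    """
    abs_path = os.path.abspath(path)
    missing = [
        name for name in expected
        if not os.path.isfile(os.path.join(abs_path, name))
    ]
    if missing:
        raise FileNotFoundError(f"{abs_path} is missing: {', '.join(missing)}")
    return abs_path
```

The returned path can then be passed to `mecab.get_model_args(...)` or `MecabTagger(...)`.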