https://github.com/34j/mecab-text-cleaner
Simple Python package (CLI/Python API) for getting Japanese readings (yomigana) and accents using MeCab.
accent accents cleaner fugashi hiragana japanaese japanese-dictionary kana kanji mecab python text-cleaner text-to-speech tokenizer tts unidic yomigana
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/34j/mecab-text-cleaner
- Owner: 34j
- License: mit
- Created: 2023-09-01T07:18:43.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-21T17:08:49.000Z (7 months ago)
- Last Synced: 2024-10-22T06:41:26.448Z (7 months ago)
- Topics: accent, accents, cleaner, fugashi, hiragana, japanaese, japanese-dictionary, kana, kanji, mecab, python, text-cleaner, text-to-speech, tokenizer, tts, unidic, yomigana
- Language: Python
- Homepage:
- Size: 126 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
README
# MeCab Text Cleaner
This is a simple Python package for getting Japanese readings (yomigana) and accents using MeCab.
This package does not account for accent changes in compound words, so please also consider using [pyopenjtalk](https://github.com/r9y9/pyopenjtalk) (no accents) or [pyopenjtalk_g2p_prosody (ESPnet)](https://github.com/espnet/espnet/blob/5d0758e2a7063b82d1f10a8ac2de98eb6cf8a352/espnet2/text/phoneme_tokenizer.py#L103) (with accents).

## Installation
Install this via pip or pipx (or your favourite package manager):
```shell
pipx install mecab-text-cleaner[unidecode,unidic]
```

```shell
pip install mecab-text-cleaner[unidecode,unidic]
```

## Usage
```shell
> mtc いい天気ですね。
イ]ー テ]ンキ デス ネ。
> mtc いい天気ですね。 --ascii
i] te]nki desu ne.
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words
イーテンキデスネ
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words -r kana
イイテンキデスネ
```

```python
from mecab_text_cleaner import to_reading, to_ascii_clean

assert to_reading(" 空、雲。\n雨!(") == "ソ]ラ、 ク]モ。\nア]メ!("
assert to_ascii_clean(" 한空、雲。\n雨!(") == "han so]ra, ku]mo. \na]me!("
```

## Contributors ✨
Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!
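
## Post-processing readings (sketch)

In the Usage examples above, a `]` appears inside some words (for example `テ]ンキ`), and the `--no-add-atype` flag removes it, so it presumably marks the accent position. The following is a minimal sketch, not part of this package, of how such output strings could be post-processed in plain Python. The helper names `strip_accent_marks` and `accent_positions` are hypothetical, and the code assumes that `]` never occurs in the input text itself and that words are separated by whitespace.

```python
ACCENT_MARK = "]"  # assumption: the accent marker seen in the README output


def strip_accent_marks(reading: str) -> str:
    """Remove every accent mark from a reading string."""
    return reading.replace(ACCENT_MARK, "")


def accent_positions(reading: str) -> list[tuple[str, int]]:
    """Return (word, n) pairs for each whitespace-separated word.

    n is the number of characters before the accent mark,
    or 0 when the word carries no mark. Characters are counted,
    not morae, so small kana such as ャ count separately.
    """
    result = []
    for word in reading.split():
        index = word.find(ACCENT_MARK)  # -1 when there is no mark
        result.append((strip_accent_marks(word), max(index, 0)))
    return result


if __name__ == "__main__":
    sample = "イ]ー テ]ンキ デス ネ。"  # CLI output copied from the Usage section
    print(strip_accent_marks(sample))
    print(accent_positions(sample))
```

This works only on the returned strings and does not call MeCab, so it can be applied to output from either the `mtc` command or `to_reading`.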