Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/seanghay/khmerphonemizer
A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes.
- Host: GitHub
- URL: https://github.com/seanghay/khmerphonemizer
- Owner: seanghay
- License: mit
- Created: 2023-09-10T04:27:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-19T09:38:51.000Z (11 months ago)
- Last Synced: 2024-10-03T14:37:50.188Z (about 1 month ago)
- Topics: cambodia, khmer, khmer-language, nlp, phonemes
- Language: Python
- Homepage: https://pypi.org/project/khmerphonemizer/
- Size: 21.7 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-khmer-language - khmerphonemizer
README
## Khmer Phonemizer
A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes.
[[Colab]](https://colab.research.google.com/drive/1GHTjbMF52ijJuK0HdTU1LzkkQ6sK535e?usp=sharing)
### Installation
```shell
pip install khmerphonemizer
```

### Usage
```python
from khmerphonemizer import phonemize

text = "នៅលើលោកនេះមិនមានមនុស្សណាម្នាក់ចេះអស់ទេ"
result = phonemize(text)
print(result)
```

Output
```python
(['នៅលើ',
'លោក',
'នេះ',
'មិន',
'មាន',
'មនុស្ស',
'ណា',
'ម្នាក់',
'ចេះ',
'អស់',
'ទេ'],
[['n', 'ɨ', 'w', 'l', 'əː'],
['l', 'oː', 'k'],
['n', 'i', 'h'],
['m', 'ɨ', 'n'],
['m', 'i', 'ə', 'n'],
['m', 'ɔ', 'n', 'u', 'h'],
['n', 'aː'],
['m', 'n', 'ĕ', 'ə', 'ʔ'],
['c', 'e', 'h'],
['ʔ', 'ɑ', 'h'],
['t', 'eː']])
```

Check out the [examples/](./examples/) directory for more examples.
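As the output above shows, `phonemize` returns two parallel lists: the tokens and one phoneme list per token. The sketch below shows one way to pair them up for display. The helper name `pair_tokens_with_phonemes` is hypothetical and not part of `khmerphonemizer`, and the sample data is copied from the first three words of the output above so that the snippet runs without the package installed.

```python
def pair_tokens_with_phonemes(tokens, phonemes, sep=" "):
    """Pair each token with its phonemes joined into one string.

    Hypothetical helper (not part of khmerphonemizer). It assumes the
    two lists are parallel, as in the README output.
    """
    if len(tokens) != len(phonemes):
        raise ValueError("tokens and phonemes must have the same length")
    return [(token, sep.join(phones)) for token, phones in zip(tokens, phonemes)]


# Sample data copied from the first three words of the README output.
tokens = ['នៅលើ', 'លោក', 'នេះ']
phonemes = [['n', 'ɨ', 'w', 'l', 'əː'], ['l', 'oː', 'k'], ['n', 'i', 'h']]

pairs = pair_tokens_with_phonemes(tokens, phonemes)
for word, pronunciation in pairs:
    # One tab-separated line per word: the token, then its phonemes.
    print(f"{word}\t{pronunciation}")
```

With the package installed, the same helper should accept the unpacked result directly, for example `pair_tokens_with_phonemes(*phonemize(text))`, assuming the return shape matches the output shown above.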
### API
- `phonemize` Tokenizes the input text into words, phonemizes each word, and returns a tuple of tokens and phonemes.
  - `input_str: str` Text with multiple words.
  - `beam: int = 500` Beam size for the beam search.
  - `min_beam: int = 100` Minimum beam size for the beam search.
  - `beam_score: float = 0.6` Beam search score.
  - `use_lexicon: bool = True` Use the lexicon dictionary for known words.
- `phonemize_single` Phonemizes a single word.
  - `word: str` Text containing a single Khmer or English word only.
  - `beam: int = 500` Beam size for the beam search.
  - `min_beam: int = 100` Minimum beam size for the beam search.
  - `beam_score: float = 0.6` Beam search score.
  - `use_lexicon: bool = True` Use the lexicon dictionary for known words.

### License
MIT
---
### References
Without these awesome projects from awesome people, this wouldn't be possible.
- [Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search](https://arxiv.org/abs/2112.08918) (Rina Buoy, Nguonly Taing, and Sovisal Chenda)
- [CUNY-CL/wikipron](https://github.com/CUNY-CL/wikipron/) (Kyle Gorman, Jackson Lee, and contributors, 2019)
- [rhasspy/gruut](https://github.com/rhasspy/gruut) (Michael Hansen et al., 2020)
- [OpenFst](https://www.openfst.org/) (Kyle Gorman et al.)
- [AdolfVonKleist/Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) (Josef Novak et al., 2017)

### Related
- [khmercut](https://github.com/seanghay/khmercut)
- [khmernormalizer](https://github.com/seanghay/khmernormalizer)
- [khmer-latin-name-transformer](https://github.com/seanghay/khmer-latin-name-transformer)
- [awesome-khmer-language](https://github.com/seanghay/awesome-khmer-language)