https://github.com/tigregotico/phoneme_guesser
Utility to retrieve phonemes from text
- Host: GitHub
- URL: https://github.com/tigregotico/phoneme_guesser
- Owner: TigreGotico
- License: apache-2.0
- Created: 2020-12-20T14:02:36.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-15T20:29:17.000Z (over 3 years ago)
- Last Synced: 2025-06-26T02:42:41.875Z (4 months ago)
- Topics: languages, phoneme-guesser, phoneme-prediction, phonemes, retrieve-phonemes
- Language: Python
- Homepage:
- Size: 15.3 MB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE.md
# Phoneme Guesser
Utility to retrieve phonemes from text
This was developed to automate wake word detection with pocketsphinx. Phonemes are retrieved from slightly processed `.dict` files taken from the models on [SourceForge](https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/).

For `en` and `es`, out-of-vocabulary words are handled with a heuristic approach. For other languages, an out-of-vocabulary word returns the closest match from the known words. Help is wanted to implement heuristics for other languages.

See the supported languages in [this folder](./phoneme_guesser/res).
# Install
```bash
pip install phoneme_guesser
```

# Usage
```python
from phoneme_guesser import get_phonemes

# if words are known, it is a simple dictionary lookup
en = "ok google"
print(get_phonemes(en, "en"))
# OW K EY . G UW G AH L

en = "hey andromeda"
print(get_phonemes(en, "en"))
# HH EY . AE N D R AA M AH D AH

pt = "ó ambrósio"
print(get_phonemes(pt, "pt-br"))
# O . a~ b r O z i u

# for en and es, out-of-vocabulary words use a heuristic approach
# (help wanted to implement heuristics for other languages)
wakeword = "hey mycroft"
print(get_phonemes(wakeword, "en-us"))
# HH EH Y . M Y K R OW F T

print(get_phonemes(wakeword, "es-es"))
# e i . m y k r o f t

# when heuristics are not implemented,
# out-of-vocabulary words return the closest match from known words
fr = "Bonjour firefox"  # notice the failure on "firefox"
print(get_phonemes(fr, "fr"))
# bb on jj ou rr . ff yy ai ff

it = "ciao google"
print(get_phonemes(it, "it"))
# k j a1 m o . d OO LL LL e
```
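In the examples above, each word's phonemes are separated by ` . `. For the pocketsphinx wake-word use case, such a string could be split back into per-word `word PH1 PH2 ...` lines for a custom `.dict` file. A minimal sketch, assuming (from the examples only, not from documentation) that the output always holds exactly one ` . `-separated group per input word; `to_dict_lines` is a hypothetical helper, not part of phoneme_guesser. The phones must also exist in the acoustic model you pair the dictionary with, so e.g. `en` output suits an English model.

```python
def to_dict_lines(sentence, phonemes):
    """Pair each input word with its phoneme group as 'word PH1 PH2 ...'.

    Hypothetical assumption: the phoneme string holds exactly one
    " . "-separated group per whitespace-separated word of the sentence.
    """
    words = sentence.lower().split()
    groups = [group.strip() for group in phonemes.split(" . ")]
    if len(words) != len(groups):
        raise ValueError(f"{len(words)} words but {len(groups)} phoneme groups")
    return [f"{word} {group}" for word, group in zip(words, groups)]


# Output copied from the "hey andromeda" example; with the library
# installed, get_phonemes(text, "en") could be passed instead.
text = "hey andromeda"
for line in to_dict_lines(text, "HH EY . AE N D R AA M AH D AH"):
    print(line)
# hey HH EY
# andromeda AE N D R AA M AH D AH
```

Raising on a count mismatch is deliberate: silently pairing the wrong groups with words would produce a dictionary that looks valid but mis-pronounces the wake word.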