https://github.com/tigregotico/phoneme_guesser

Utility to retrieve phonemes from text
https://github.com/tigregotico/phoneme_guesser

languages phoneme-guesser phoneme-prediction phonemes retrieve-phonemes

Last synced: 2 months ago
JSON representation

Utility to retrieve phonemes from text

Host: GitHub
URL: https://github.com/tigregotico/phoneme_guesser
Owner: TigreGotico
License: apache-2.0
Created: 2020-12-20T14:02:36.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2022-02-15T20:29:17.000Z (over 3 years ago)
Last Synced: 2025-06-26T02:42:41.875Z (4 months ago)
Topics: languages, phoneme-guesser, phoneme-prediction, phonemes, retrieve-phonemes
Language: Python
Homepage:
Size: 15.3 MB
Stars: 6
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE.md

Awesome Lists containing this project

README

          # Phoneme Guesser

Utility to retrieve phonemes from text

This was developed for wake word detection automation using pocketsphinx, 

phonemes are retrieved from slightly processed .dict files in models 

from [sourceforge](https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/)

for `en` and `es`, out of vocab words will use an heuristic approach, for 

other languages out of vocab words will return closest match from known 

words

help wanted to implement heuristics for other languages

see supported languages in [this folder](./phoneme_guesser/res) 

# Install

```bash

pip install phoneme_guesser

```

# Usage

```python

from phoneme_guesser import get_phonemes

# if words are know, it is a simple dictionary lookup

en = "ok google"

print(get_phonemes(en, "en"))

# OW K EY . G UW G AH L

en = "hey andromeda"

print(get_phonemes(en, "en"))

# HH EY . AE N D R AA M AH D AH

pt = "ó ambrósio"

print(get_phonemes(pt, "pt-br"))

# O . a~ b r O z i u

# for en and es, out of vocab words will use an heuristic approach

# help wanted to implement heuristics for other languages

wakeword = "hey mycroft"

print(get_phonemes(wakeword, "en-us"))

# HH EH Y . M Y K R OW F T

print(get_phonemes(wakeword, "es-es"))

# e i . m y k r o f t

# when heuristics are not implemented

# out of vocab words will return closest match from known words

fr = "Bonjour firefox"  # notice firefox failure

print(get_phonemes(fr, "fr"))

# bb on jj ou rr . ff yy ai ff

it = "ciao google"

print(get_phonemes(it, "it"))

# k j a1 m o . d OO LL LL e

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tigregotico/phoneme_guesser

Awesome Lists containing this project

README