https://github.com/contraslash/sphinx-ops-isolated-words-models
- Host: GitHub
- URL: https://github.com/contraslash/sphinx-ops-isolated-words-models
- Owner: contraslash
- Created: 2019-12-27T02:22:15.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:21:28.000Z (over 3 years ago)
- Last Synced: 2025-06-12T05:54:49.082Z (about 1 year ago)
- Language: HTML
- Size: 115 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Sphinx OPS Isolated Word Models
This is the main repository to build an acoustic model for sphinx based on the
[Open Speech Corpus Isolated Word Corpus](http://openspeechcorpus.contraslash.com/isolated-words/list/).
First, run `01_download_word_recordings.py`, which fetches all the data from OPS.
Then run `02_convert_mp4_to_wav.py`. This script requires [FFmpeg](https://www.ffmpeg.org/)
to be installed and on your `PATH`.
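As a rough illustration of the conversion step, the sketch below builds an FFmpeg command that turns an MP4 recording into a 16 kHz, mono, 16-bit PCM WAV file, the format the CMU Sphinx training tutorial asks for. It is not the content of `02_convert_mp4_to_wav.py`; the function names and the folder layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Return an ffmpeg command that converts src to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", str(src),          # input recording
        "-vn",                   # ignore any video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        str(dst),
    ]


def convert_folder(src_dir: Path, dst_dir: Path) -> None:
    """Convert every .mp4 file in src_dir to a .wav file in dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.mp4")):
        dst = dst_dir / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `convert_folder(Path("recordings"), Path("wav"))` would convert a hypothetical `recordings` folder into `wav`; adjust both paths to wherever the download script stores its files.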
Next, prepare the Sphinx configuration data. Run `03_configure_sphinx.py`, which configures almost all the files Sphinx requires.
To create a custom language model, you also need to run `04_generate_language_model.sh`.
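For orientation, SphinxTrain expects, among other files, a `.fileids` list and a `.transcription` file under `etc/`. The sketch below shows the general shape of those two files for isolated words, following the format described in the CMU Sphinx training tutorial. It is an assumption about what the configuration step has to produce, not the code of `03_configure_sphinx.py`; the function name and the sample identifiers are made up.

```python
from pathlib import PurePosixPath


def build_training_lists(recordings: list[tuple[str, str]]) -> tuple[str, str]:
    """Build the text of the .fileids and .transcription files.

    Each recording is (file_id, word), where file_id is the WAV path
    relative to the wav/ folder, without the .wav extension.
    """
    fileids_lines = []
    transcription_lines = []
    for file_id, word in recordings:
        fileids_lines.append(file_id)
        # The utterance id in parentheses is the file name without its folder.
        utterance_id = PurePosixPath(file_id).name
        transcription_lines.append(f"<s> {word} </s> ({utterance_id})")
    fileids = "".join(line + "\n" for line in fileids_lines)
    transcription = "".join(line + "\n" for line in transcription_lines)
    return fileids, transcription
```

Every word that appears in the transcription must also have an entry in the pronunciation dictionary, spelled the same way.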
Make sure you have [sphinxtrain](https://github.com/cmusphinx/sphinxtrain) installed on your machine.
Now execute:
```bash
sphinxtrain -t ops_isolated_words setup
```
After this, your `etc` folder will contain the full structure of what you need for your project.
Please check [this link](https://cmusphinx.github.io/wiki/tutorialam/#setting-up-the-training-scripts) for further
information.
In `etc/sphinx_train.cfg`, search for `$CFG_HMM_TYPE` and select `.semi` (semi-continuous models).
If you are on a multicore machine, change `$CFG_QUEUE_TYPE` to `Queue::POSIX` and set `$CFG_NPART` and `$DEC_CFG_NPART` to the number of cores on your machine.
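These edits can also be scripted. The sketch below is one way to do it, assuming the generated `etc/sphinx_train.cfg` contains uncommented Perl assignments of the form `$NAME = value;`. The helper names are hypothetical, and the exact spelling of the semi-continuous value (`'.semi.'` here) is an assumption that should be checked against the commented alternatives in your own file.

```python
import os
import re
from typing import Optional


def set_cfg_value(cfg_text: str, name: str, value: str) -> str:
    """Replace the right-hand side of an uncommented `$NAME = ...;` line."""
    pattern = re.compile(
        rf"^([ \t]*\${re.escape(name)}[ \t]*=[ \t]*)[^;]*;", re.MULTILINE
    )
    # A function replacement avoids backslash handling in the new value.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value};", cfg_text)
    if count == 0:
        raise KeyError(f"${name} not found in configuration")
    return new_text


def apply_readme_settings(cfg_text: str, cores: Optional[int] = None) -> str:
    """Apply the settings described above to the configuration text."""
    cores = cores or os.cpu_count() or 1
    # Assumed spelling of the semi-continuous option; verify in your file.
    cfg_text = set_cfg_value(cfg_text, "CFG_HMM_TYPE", "'.semi.'")
    cfg_text = set_cfg_value(cfg_text, "CFG_QUEUE_TYPE", '"Queue::POSIX"')
    cfg_text = set_cfg_value(cfg_text, "CFG_NPART", str(cores))
    cfg_text = set_cfg_value(cfg_text, "DEC_CFG_NPART", str(cores))
    return cfg_text
```

To use it, read `etc/sphinx_train.cfg` into a string, pass it through `apply_readme_settings`, and write the result back. Commented-out lines and trailing comments are left untouched.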
Then run the training:
```bash
sphinxtrain run
```
This could take some time.
To check the results, run the decoder with microphone input:
```bash
pocketsphinx_continuous -hmm model_parameters/ops_isolated_words.ci_semi/ -lm etc/ops_isolated_words.lm.DMP -dict etc/ops_isolated_words.dic -inmic yes
```