https://github.com/jailuthra/asr
Kaldi ASR wrapper scripts
- Host: GitHub
- URL: https://github.com/jailuthra/asr
- Owner: jailuthra
- License: mit
- Created: 2017-06-03T07:48:44.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-17T07:18:24.000Z (almost 9 years ago)
- Last Synced: 2025-01-20T19:53:47.332Z (over 1 year ago)
- Topics: asr, kaldi, praat, speech, speech-recognition
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ASR Scripts
This project aims to simplify using Kaldi for speech recognition and alignment.
It currently works with the [ASpIRE pre-trained model](http://kaldi-asr.org/models.html), although the scripts can easily be extended to work with other or custom-trained models.
## Installation
### Prerequisites
* Compiled Kaldi instance ([instructions](https://github.com/kaldi-asr/kaldi/blob/master/INSTALL))
* ASpIRE chain pre-trained model ([download](http://kaldi-asr.org/models.html), [preparation](https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/))
* To display the TextGrid alignment files, install [Praat](http://www.fon.hum.uva.nl/praat/).
* To generate the TextGrid alignment files, install the Python package [praatIO](https://github.com/timmahrt/praatIO).
### Download scripts
* `$ git clone https://github.com/jailuthra/asr`
* Place the scripts in the `kaldi/egs/aspire/s5` directory.
#### Input audio constraints
Mono PCM WAV files with a 16-bit sample size and an 8 kHz sampling rate.
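
A quick way to check a file against these constraints before decoding is sketched below. It is not part of the repository; `check_wav` is a hypothetical helper that relies only on Python's standard `wave` module, which raises `wave.Error` for files it cannot parse.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file fits the constraints.

    Hypothetical helper, not one of the repository scripts.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # in bytes, so 2 means 16-bit
        rate = wav.getframerate()
    if channels != 1:
        problems.append("expected mono, got %d channels" % channels)
    if sample_width != 2:
        problems.append("expected 16-bit samples, got %d-bit" % (sample_width * 8))
    if rate != 8000:
        problems.append("expected 8000 Hz, got %d Hz" % rate)
    return problems
```

Calling `check_wav("0001_0001.wav")` on each input file and printing any returned messages shows which files need resampling or downmixing first.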
## Scripts
* **`aspire.py`**: Decodes and aligns the wav files using the pre-trained model, calling the other scripts as needed
* `filegen.py`: Generates the required speaker-id and utterance-id information files from the wav files
* `id2phone.py`, `id2word.py`: Convert phone/word ids in the ctm output to actual phones/words
* `ctm2tg.py`: Convert ctm output to Praat TextGrid files
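
To illustrate what the id-conversion and ctm steps involve, here is a minimal sketch; it is not the code from `id2word.py` or `ctm2tg.py`. It assumes the usual Kaldi layouts: symbol tables such as `words.txt` with one `symbol id` pair per line, and ctm rows of the form `utterance channel start duration token`. The function names are hypothetical, and writing the TextGrid itself with praatIO is left out.

```python
from collections import defaultdict


def load_symbol_table(lines):
    """Map integer ids to symbols from 'symbol id' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            symbol, idx = parts
            table[int(idx)] = symbol
    return table


def ctm_to_intervals(ctm_lines, table):
    """Group ctm rows by utterance as sorted (start, end, label) tuples."""
    intervals = defaultdict(list)
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        utt, _channel, start, duration, token = fields[:5]
        start = float(start)
        end = start + float(duration)
        # Replace an integer id with its symbol; keep the token if it is not an id.
        if token.isdigit() and int(token) in table:
            label = table[int(token)]
        else:
            label = token
        intervals[utt].append((start, end, label))
    for rows in intervals.values():
        rows.sort()
    return dict(intervals)
```

Each `(start, end, label)` tuple corresponds to one interval on a TextGrid tier, so the result for an utterance can be handed to whichever TextGrid writer is in use.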
## Usage
1. Create a directory with all your wav files.
2. Name the files following the `_.wav` convention (two fields joined by an underscore), for example `0001_0001.wav` and `0001_0002.wav`.
3. Call the aspire script: `./aspire.py `.
4. The script generates text transcriptions and alignment files in the output directory.
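
The naming convention matters because Kaldi needs to know which speaker each utterance belongs to. The sketch below shows how such information could be derived from the file names; it is not the code from `filegen.py`. It assumes that the field before the underscore is the speaker id and that the whole file stem serves as the utterance id, and it produces lines in Kaldi's `wav.scp` (`utterance-id path`) and `utt2spk` (`utterance-id speaker-id`) layouts. `build_kaldi_maps` is a hypothetical name.

```python
import os


def build_kaldi_maps(wav_dir):
    """Return (wav_scp_lines, utt2spk_lines) for the wav files in wav_dir.

    Assumes names like 0001_0002.wav, with the speaker id as the first field.
    """
    wav_scp, utt2spk = [], []
    for name in sorted(os.listdir(wav_dir)):  # Kaldi expects sorted ids
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".wav" or "_" not in stem:
            continue  # ignore files that do not follow the convention
        speaker = stem.split("_", 1)[0]
        wav_scp.append("%s %s" % (stem, os.path.join(wav_dir, name)))
        utt2spk.append("%s %s" % (stem, speaker))
    return wav_scp, utt2spk
```

Writing each list to a file with one entry per line gives the two mappings in the form Kaldi's data directories use.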