https://github.com/jailuthra/asr

Kaldi ASR wrapper scripts
asr kaldi praat speech speech-recognition

Host: GitHub
URL: https://github.com/jailuthra/asr
Owner: jailuthra
License: mit
Created: 2017-06-03T07:48:44.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-07-17T07:18:24.000Z (almost 9 years ago)
Last Synced: 2025-01-20T19:53:47.332Z (over 1 year ago)
Topics: asr, kaldi, praat, speech, speech-recognition
Language: Python
Homepage:
Size: 10.7 KB
Stars: 2
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ASR Scripts

This project aims to simplify using Kaldi for speech recognition and alignment.
It currently works with the [ASpIRE pre-trained model](http://kaldi-asr.org/models.html), although the scripts can easily be extended to work with other pre-trained or custom-trained models.

## Installation

### Prerequisites

* Compiled Kaldi instance ([instructions](https://github.com/kaldi-asr/kaldi/blob/master/INSTALL))
* ASpIRE chain pre-trained model ([download](http://kaldi-asr.org/models.html), [preparation](https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/))
* [Praat](http://www.fon.hum.uva.nl/praat/), for displaying the TextGrid alignment files
* The [praatIO](https://github.com/timmahrt/praatIO) Python package, for generating the TextGrid alignment files

### Download scripts

* `$ git clone https://github.com/jailuthra/asr`
* Place the scripts in the `kaldi/egs/aspire/s5` directory.

#### Input audio constraints
Mono PCM WAV files with a 16-bit sample size and an 8 kHz sampling rate.
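
As a quick pre-flight check, the constraints above can be verified with Python's standard `wave` module. This is a minimal sketch and not part of the repository; the function name `check_wav` is hypothetical.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file fits the constraints."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 8000:
            problems.append(f"expected 8000 Hz, got {wav.getframerate()} Hz")
    return problems
```

Files that fail the check would need to be converted (downmixed and resampled) before decoding.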

## Scripts

* **`aspire.py`**: Decodes and aligns the wav files using the pre-trained model, and calls the other scripts
* `filegen.py`: Generates the required speaker-id and utterance-id information files from the wav files
* `id2phone.py`, `id2word.py`: Convert phone/word ids in the ctm output to actual phones/words
* `ctm2tg.py`: Converts the ctm output to Praat TextGrid files
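
To illustrate the id-to-symbol step, here is a minimal sketch of the idea. It is not the repository's `id2word.py` or `id2phone.py`; it assumes Kaldi's usual symbol-table layout (one `symbol id` pair per line, as in `words.txt` or `phones.txt`) and the usual CTM field order (utterance, channel, start, duration, token). The function names are hypothetical.

```python
def load_symbol_table(lines):
    """Build an id -> symbol mapping from 'symbol id' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            symbol, idx = parts
            table[idx] = symbol
    return table


def map_ctm_ids(ctm_lines, table):
    """Replace the integer token in the fifth CTM field with its symbol."""
    mapped = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Unknown ids are left unchanged rather than dropped.
        fields[4] = table.get(fields[4], fields[4])
        mapped.append(" ".join(fields))
    return mapped
```

Both functions take any iterable of lines, so open file objects can be passed in directly.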

## Usage

1. Create a directory with all your wav files.
2. Name each file with two IDs separated by an underscore, for example `0001_0001.wav`, `0001_0002.wav`.
3. Call the aspire script: `./aspire.py `.
4. It will generate text transcriptions and alignment files in the output directory.
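
Before running the script, it can help to see what the file names imply. The sketch below is not the repository's `filegen.py`. It assumes that the first field of each file name is the speaker ID and that the whole stem serves as the utterance ID, and it produces lines in the `wav.scp` and `utt2spk` formats that Kaldi data directories use. The name `kaldi_entries` is hypothetical.

```python
import re
from pathlib import Path

# Two underscore-separated fields followed by ".wav", e.g. 0001_0002.wav
NAME_RE = re.compile(r"^([^_]+)_([^_]+)\.wav$")


def kaldi_entries(wav_dir):
    """Return (wav_scp_lines, utt2spk_lines) for correctly named wav files."""
    wav_scp, utt2spk = [], []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the convention
        speaker = match.group(1)  # assumption: first field is the speaker ID
        utt_id = path.stem        # assumption: full stem is the utterance ID
        wav_scp.append(f"{utt_id} {path}")
        utt2spk.append(f"{utt_id} {speaker}")
    return wav_scp, utt2spk
```

Using the full stem as the utterance ID keeps each ID prefixed by its speaker, which is the ordering Kaldi recommends for `utt2spk`.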
