# ASR Scripts

This project aims to simplify using Kaldi for speech recognition and alignment.
It currently works with the [ASpIRE pre-trained model](http://kaldi-asr.org/models.html), although the scripts can easily be extended to work with other or custom-trained models.

## Installation

### Prerequisites

* Compiled Kaldi instance ([instructions](https://github.com/kaldi-asr/kaldi/blob/master/INSTALL))
* ASpIRE chain pre-trained model ([download](http://kaldi-asr.org/models.html), [preparation](https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/))
* To display the TextGrid alignment files, you will need to install [Praat](http://www.fon.hum.uva.nl/praat/).
* To generate the TextGrid alignment files, you will need to install the [praatIO](https://github.com/timmahrt/praatIO) Python package.

### Download scripts

* `$ git clone https://github.com/jailuthra/asr`
* Place the scripts in the `kaldi/egs/aspire/s5` directory.

### Input audio constraints

Mono PCM WAV files with 16-bit samples and an 8 kHz sampling rate.

## Scripts

* **`aspire.py`**: Decodes and aligns the WAV files using the pre-trained model, and calls the other scripts
* `filegen.py`: Generates the required speaker-ID and utterance-ID information files from the WAV files
* `id2phone.py`, `id2word.py`: Convert the phone/word IDs in the CTM output to the actual phones/words
* `ctm2tg.py`: Converts the CTM output to Praat TextGrid files

## Usage

1. Create a directory containing all your WAV files.
2. Name the files using the convention `<speaker_id>_<utterance_id>.wav`, for example `0001_0001.wav`, `0001_0002.wav`.
3. Call the aspire script: `./aspire.py <wavdir> <outputdir>`.
4. The script generates text transcriptions and alignment files in the output directory.
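Before running `aspire.py`, it can help to confirm that every file meets the input audio constraints (mono PCM, 16-bit, 8 kHz). The following is a minimal sketch that uses only Python's standard `wave` module. `check_wav` and `check_dir` are hypothetical helpers and are not part of this repository.

```python
import wave
from pathlib import Path


def check_wav(path):
    """Return a list of problems; an empty list means mono, 16-bit, 8 kHz PCM."""
    try:
        w = wave.open(str(path), "rb")
    except (wave.Error, EOFError) as exc:  # wave only reads uncompressed PCM WAV
        return [f"not a readable PCM WAV file ({exc})"]
    with w:
        problems = []
        if w.getnchannels() != 1:
            problems.append(f"expected 1 channel, got {w.getnchannels()}")
        if w.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        if w.getframerate() != 8000:
            problems.append(f"expected 8000 Hz, got {w.getframerate()} Hz")
    return problems


def check_dir(wavdir):
    """Print any problems found in wavdir/*.wav and return True if every file passes."""
    all_ok = True
    for path in sorted(Path(wavdir).glob("*.wav")):
        for problem in check_wav(path):
            print(f"{path.name}: {problem}")
            all_ok = False
    return all_ok
```

Calling `check_dir("<wavdir>")` on the directory from step 1 of Usage lists each non-conforming file before decoding starts.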
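`filegen.py` derives the speaker and utterance information from the `<speaker_id>_<utterance_id>.wav` file names. Its exact output is not documented here. The sketch below assumes it produces the standard Kaldi data-directory files `wav.scp`, `utt2spk` and `spk2utt`. It keeps the whole file stem (e.g. `0001_0001`) as the utterance ID, so the speaker ID is a prefix of it, which is what Kaldi's data-preparation conventions recommend. `kaldi_data_files` is a hypothetical name.

```python
import re
from collections import defaultdict
from pathlib import Path

# <speaker_id>_<utterance_id>.wav, e.g. 0001_0002.wav (neither part may contain "_")
NAME_RE = re.compile(r"^(?P<spk>[^_]+)_(?P<utt>[^_]+)\.wav$")


def kaldi_data_files(wav_paths):
    """Return the lines of wav.scp, utt2spk and spk2utt for the given WAV paths."""
    entries = []
    for path in map(Path, wav_paths):
        m = NAME_RE.match(path.name)
        if m is None:
            raise ValueError(f"{path.name} does not match <speaker_id>_<utterance_id>.wav")
        # Use the whole stem as the utterance ID so the speaker ID is its prefix.
        entries.append((path.stem, m.group("spk"), path.resolve()))
    # Kaldi expects sorted files; plain str ordering matches C-locale order for ASCII IDs.
    entries.sort(key=lambda e: e[0])
    spk2utt = defaultdict(list)
    for utt_id, spk, _ in entries:
        spk2utt[spk].append(utt_id)
    return {
        "wav.scp": [f"{utt_id} {wav}" for utt_id, _, wav in entries],
        "utt2spk": [f"{utt_id} {spk}" for utt_id, spk, _ in entries],
        "spk2utt": [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())],
    }
```

You can write each list to a file of the same name in the data directory, one entry per line.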
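`id2phone.py` and `id2word.py` convert the integer IDs in the CTM output back into phones or words. The following sketch shows that mapping under two assumptions. First, it expects a Kaldi-style symbol table (`words.txt` or `phones.txt`, with one `<symbol> <id>` pair per line). Second, it expects the ID in the fifth CTM column (`<utterance> <channel> <start> <duration> <id> [<confidence>]`). The function names are hypothetical.

```python
def load_symbol_table(path):
    """Read a '<symbol> <id>' table into a dict mapping the ID string to its symbol."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                symbol, idx = parts
                table[idx] = symbol
    return table


def ids_to_symbols(ctm_lines, table, field=4):
    """Replace the ID in column `field` (0-based, i.e. the 5th CTM column) with its symbol."""
    out = []
    for line in ctm_lines:
        cols = line.split()
        if len(cols) > field:
            cols[field] = table.get(cols[field], cols[field])  # unknown IDs are left as-is
        out.append(" ".join(cols))
    return out
```

The same two functions serve both cases: pass a table loaded from `phones.txt` for phone-level CTM and one loaded from `words.txt` for word-level CTM.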
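`ctm2tg.py` uses praatIO to write the TextGrid files. The following dependency-free sketch shows what such a file contains. Instead of calling praatIO, it writes a single word tier in Praat's long text TextGrid format. It makes three assumptions:

* the CTM entries are non-overlapping,
* each entry has at least 5 columns with the symbol (not the ID) in the fifth,
* the total duration `xmax` in seconds is known, for example `w.getnframes() / w.getframerate()` for the WAV file.

All function names are hypothetical.

```python
EPS = 1e-6  # gaps shorter than this are treated as float rounding noise


def _fmt(x):
    """Format a time in seconds without float noise such as 0.7200000000000001."""
    return f"{round(x, 4)}"


def ctm_to_intervals(ctm_lines, utt_id):
    """Collect sorted (start, end, label) tuples for one utterance from CTM lines."""
    intervals = []
    for line in ctm_lines:
        cols = line.split()
        if len(cols) >= 5 and cols[0] == utt_id:
            start, dur = float(cols[2]), float(cols[3])
            intervals.append((start, start + dur, cols[4]))
    return sorted(intervals)


def fill_gaps(intervals, xmax):
    """Add empty intervals so the tier covers 0..xmax with no holes, as Praat requires."""
    filled, t = [], 0.0
    for start, end, label in intervals:
        if start - t > EPS:
            filled.append((t, start, ""))  # silence or untranscribed stretch
            t = start
        filled.append((t, end, label))  # start snapped to previous end to stay contiguous
        t = end
    if xmax - t > EPS:
        filled.append((t, xmax, ""))
    return filled


def textgrid_text(intervals, xmax, tier_name="words"):
    """Render one IntervalTier as a TextGrid in Praat's long text format."""
    tier = fill_gaps(intervals, xmax)
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {_fmt(xmax)}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {_fmt(xmax)}",
        f"        intervals: size = {len(tier)}",
    ]
    for i, (start, end, label) in enumerate(tier, start=1):
        text = label.replace('"', '""')  # Praat escapes a double quote by doubling it
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {_fmt(start)}",
            f"            xmax = {_fmt(end)}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

Praat can open the returned string, saved to a `.TextGrid` file, together with the matching WAV file.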