{"id":13935692,"url":"https://github.com/jinserk/pytorch-asr","last_synced_at":"2025-07-19T20:33:44.259Z","repository":{"id":149089968,"uuid":"115148629","full_name":"jinserk/pytorch-asr","owner":"jinserk","description":"ASR with PyTorch","archived":false,"fork":false,"pushed_at":"2019-03-10T19:50:45.000Z","size":623,"stargazers_count":140,"open_issues_count":4,"forks_count":19,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-08-08T23:21:35.174Z","etag":null,"topics":["asr","capsule-network","ctc","decoder","deepspeech","densenet","dictation","kaldi","kaldi-decoder","lattice","lvcsr","pyro","python","pytorch","pytorch-binding","resnet","speech","speech-recognition","ss-vae","transcription"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jinserk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-22T20:38:57.000Z","updated_at":"2024-07-24T07:24:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"4b8b9cb4-faf1-4928-838a-13aa58daf769","html_url":"https://github.com/jinserk/pytorch-asr","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinserk%2Fpytorch-asr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinserk%2Fpytorch-asr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinserk%2Fpytorch-asr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/jinserk%2Fpytorch-asr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jinserk","download_url":"https://codeload.github.com/jinserk/pytorch-asr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677237,"owners_count":17666020,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","capsule-network","ctc","decoder","deepspeech","densenet","dictation","kaldi","kaldi-decoder","lattice","lvcsr","pyro","python","pytorch","pytorch-binding","resnet","speech","speech-recognition","ss-vae","transcription"],"created_at":"2024-08-07T23:02:00.108Z","updated_at":"2024-11-27T03:31:08.252Z","avatar_url":"https://github.com/jinserk.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# ASR with PyTorch\n\nThis repository contains experimental code for speech recognition using [PyTorch](https://github.com/pytorch/pytorch) and [Kaldi](https://github.com/kaldi-asr/kaldi).\nWe focus on building better acoustic models that produce phoneme sequences rather than on end-to-end transcription.\nFor this purpose, the Kaldi latgen decoder is integrated as a PyTorch CppExtension.\n\nThe code was tested with Python 3.7 and PyTorch 1.0.0rc1. 
The code uses [f-strings](https://www.python.org/dev/peps/pep-0498/) extensively, so you must use Python 3.6 or later.\n\n## Performance\n\n| model | train dataset | dev dataset | test dataset | LER | WER |\n|-------|---------------|-------------|--------------|-----|-----|\n| decoder baseline\u003csup id=\"a1\"\u003e[1](#f1)\u003c/sup\u003e | - | - | swbd rt03 | - | 1.74% |\n| deepspeech_var | aspire + swbd train | swbd eval2000 | swbd rt03 | 33.73% | 37.75% |\n| las | aspire + swbd train | swbd eval2000 | swbd rt03 |       |      |\n\n\n\u003csub\u003e\u003csup id=\"f1\"\u003e1. This result is obtained by feeding the phone label sequences (one-hot vectors) directly into the decoder input.\nIt covers only utterances shorter than 20 seconds, picks a random pronunciation from the lexicon for words that have multiple pronunciations, and\ninserts sil phones with probability 0.2 between words and with probability 0.8 at the beginning and end of each utterance.\nPlease see [here](https://github.com/jinserk/pytorch-asr/blob/master/asr/models/trainer.py#L459) with `target_test=True`. 
[\u0026#9166;](#a1)\u003c/sup\u003e\u003c/sub\u003e\n\n## Installation\n\n**Prerequisites:**\n* Python 3.6+\n* [PyTorch 1.0.0+](https://github.com/pytorch/pytorch/pytorch.git)\n* [Kaldi 5.3+](https://github.com/kaldi-asr/kaldi.git)\n* [TNT](https://github.com/pytorch/tnt.git)\n\nWe recommend [pyenv](https://github.com/pyenv/pyenv).\nDo not forget to set `pyenv local \u003cpython-version\u003e` in the local repo if you're using pyenv.\n\nTo avoid the `-fPIC` related compile error, you have to configure Kaldi with the `--shared` option when you install it.\n\nInstall the packages this project depends on:\n```\n$ sudo apt install sox libsox-dev\n```\n\nDownload:\n```\n$ git clone https://github.com/jinserk/pytorch-asr.git\n```\n\nInstall the required Python modules:\n```\n$ cd pytorch-asr\n$ pip install -r requirements.txt\n```\n\nIf `torchaudio` fails to install on a CentOS machine, add the following to your `~/.bashrc`.\n```\nexport CPLUS_INCLUDE_PATH=/usr/include/sox:$CPLUS_INCLUDE_PATH\n```\nDon't forget to run `$ source ~/.bashrc` before you try to install the requirements again.\n\nModify the Kaldi path in `_path.py`:\n```\n$ cd asr/kaldi\n$ vi _path.py\n\nKALDI_ROOT = \u003ckaldi-installation-path\u003e\n```\n\nBuild the PyTorch binding of the Kaldi decoder:\n```\n$ python setup.py install\n```\nThis step takes a while because it downloads Kaldi's official ASpIRE chain model and post-processes it.\nIf you want to use your own language model or graphs, modify `asr/kaldi/scripts/mkgraph.sh` according to your settings.\n**The binding install method has been changed to use PyTorch's CppExtension, instead of ffi.\nThis will install a package named `torch_asr._latgen_lib`.**\n\n\n## Training\n\nPytorch-asr aims to be a framework that supports multiple acoustic models. You have to specify one of the models to train or predict.\nCurrently, only the `deepspeech_ctc` model is kept up to date with the frequently updated training and prediction modules. 
Try this model first.\nWe will update the other models to the new interface soon. Sorry for the inconvenience.\n\nIf you are training for the first time, you need to preprocess the dataset.\nCurrently we use the contents of the `data` directory in Kaldi's recipe directories, which contain the preprocessed corpus data.\nYou need to run the preparation script in each Kaldi recipe before doing the following.\nWe currently support Kaldi's `aspire`, `swbd`, and `tedlium` recipes. You will need LDC's corpora to use the `aspire` and `swbd` datasets.\nFirst, set the `RECIPE_PATH` variable in `asr/datasets/*.py` according to the location of your Kaldi setup.\n```\n$ python prepare.py aspire \u003cdata-path\u003e\n```\n\nStart a new training run with:\n```\n$ python train.py \u003cmodel-name\u003e --use-cuda\n```\nCheck the `--help` option to see which parameters are available for the model.\n\nIf you want to resume training from a saved model file:\n```\n$ python train.py \u003cmodel-name\u003e --use-cuda --continue-from \u003cmodel-file\u003e\n```\n\nYou can use the `--visdom` option to monitor the loss during training.\nPlease make sure that a visdom process is already running before you start training with the `--visdom` option.\nThe `--tensorboard` option is outdated since the TensorboardX package doesn't support the latest PyTorch.\n\nYou can also use the `--slack` option to redirect logs to a Slack DM.\nTo use this, first set up a Slack workspace and add the \"Bots\" app to it.\nYou must obtain the bot's token and your user id from the Slack settings.\nThen set the environment variables `SLACK_API_TOKEN` and `SLACK_API_USER` to the token and the user id, respectively.\n\n\n## Prediction\n\nYou can run prediction on samples with a trained model file:\n```\n$ python predict.py \u003cmodel-name\u003e --continue-from \u003cmodel-file\u003e \u003ctarget-wav-file1\u003e \u003ctarget-wav-file2\u003e ...\n```\n\n## Acknowledgement\n\nSome models are imported from the following projects. 
We appreciate all their work; all rights to that code belong to its authors.\n\n* DeepSpeech : https://github.com/SeanNaren/deepspeech.pytorch.git\n* ResNet : https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py\n* DenseNet : https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinserk%2Fpytorch-asr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjinserk%2Fpytorch-asr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinserk%2Fpytorch-asr/lists"}