{"id":17464302,"url":"https://github.com/daanzu/kaldi_ag_training","last_synced_at":"2025-04-19T18:59:10.747Z","repository":{"id":47760331,"uuid":"391650171","full_name":"daanzu/kaldi_ag_training","owner":"daanzu","description":"Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.","archived":false,"fork":false,"pushed_at":"2022-01-24T12:53:31.000Z","size":152,"stargazers_count":20,"open_issues_count":2,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-29T12:02:25.778Z","etag":null,"topics":["custom","fine-tuning","kaldi","kaldi-asr","personal","speech","speech-recognition","speech-to-text","training"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daanzu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-01T14:30:06.000Z","updated_at":"2024-09-07T20:20:44.000Z","dependencies_parsed_at":"2022-08-23T22:31:26.350Z","dependency_job_id":null,"html_url":"https://github.com/daanzu/kaldi_ag_training","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fkaldi_ag_training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fkaldi_ag_training/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fkaldi_ag_training/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanzu%2Fkaldi_ag_training/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daanzu","download_url":"https://codeload.github.com/daanzu/kaldi_ag_training/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249239295,"owners_count":21235835,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["custom","fine-tuning","kaldi","kaldi-asr","personal","speech","speech-recognition","speech-to-text","training"],"created_at":"2024-10-18T10:45:17.850Z","updated_at":"2025-04-16T12:30:55.436Z","avatar_url":"https://github.com/daanzu.png","language":"Shell","funding_links":["https://github.com/sponsors/daanzu","https://www.patreon.com/daanzu","https://paypal.me/daanzu"],"categories":[],"sub_categories":[],"readme":"# Kaldi AG Training Setup\n\n[![Donate](https://img.shields.io/badge/donate-GitHub-pink.svg)](https://github.com/sponsors/daanzu)\n[![Donate](https://img.shields.io/badge/donate-Patreon-orange.svg)](https://www.patreon.com/daanzu)\n[![Donate](https://img.shields.io/badge/donate-PayPal-green.svg)](https://paypal.me/daanzu)\n\nDocker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with [kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar).\n\n## Usage\n\nAll commands are run in the Docker container as follows. Training on the CPU should work, just much more slowly. To do so, remove the `--runtime=nvidia` flag and use the image `daanzu/kaldi_ag_training:2020-11-28` instead of the GPU image. 
You can run Docker directly with the following parameter structure, or as a shortcut, use the `run_docker.sh` script (and edit it to suit your needs and configuration).\n\n```bash\ndocker run -it --rm -v $(pwd):/mnt/input -w /mnt/input --user \"$(id -u):$(id -g)\" \\\n    --runtime=nvidia daanzu/kaldi_ag_training_gpu:2020-11-28 \\\n    [command and args...]\n```\n\nExample commands:\n\n```bash\n# Download and prepare base model (needed for either finetuning or personal model training)\nwget https://github.com/daanzu/kaldi_ag_training/releases/download/v0.1.0/kaldi_model_daanzu_20200905_1ep-mediumlm-base.zip\nunzip kaldi_model_daanzu_20200905_1ep-mediumlm-base.zip\n\n# Prepare training dataset files\npython3 convert_tsv_to_scp.py yourdata.tsv [optional output directory]\n\n# Pick only one of the following:\n# Run finetune training, with default settings\nbash run_docker.sh bash run.finetune.sh kaldi_model_daanzu_20200905_1ep-mediumlm-base dataset\n# Run completely personal training, with default settings\nbash run_docker.sh bash run.personal.sh kaldi_model_daanzu_20200905_1ep-mediumlm-base dataset\n\n# When training completes, export trained model\npython3 export_trained_model.py {finetune,personal} [optional output directory]\n# Finally run the following in your kaldi-active-grammar python environment (will take as much as an hour and several GB of RAM)\npython3 -m kaldi_active_grammar compile_agf_dictation_graph -v -m [model_dir]\n\n# Test a new or old model\npython3 test_model.py testdata.tsv [model_dir]\n```\n\n### Notes\n\n* To run either training, you must have a base model to use as a template. (For finetuning this is also the starting point of the model; for personal it is only a source of basic info.) You can use [this base model](https://github.com/daanzu/kaldi_ag_training/releases/download/v0.1.0/kaldi_model_daanzu_20200905_1ep-mediumlm-base.zip) from this project's release page. 
Download the zip file and extract it to the root directory of this repo, so the directory `kaldi_model_daanzu_20200905_1ep-mediumlm-base` is here.\n\n* Kaldi requires the training data metadata to be in the SCP format, which is an annoying multi-file format. To convert the standard KaldiAG TSV format to SCP, you can run `python3 convert_tsv_to_scp.py yourdata.tsv dataset` to output SCP format in a new directory `dataset`. You can run these commands within the Docker container, or directly using your own python environment.\n    * Even better, run `python3 convert_tsv_to_scp.py -l kaldi_model_daanzu_20200905_1ep-mediumlm-base/dict/lexicon.txt yourdata.tsv dataset` to filter out utterances containing out-of-vocabulary words. OOV words are not currently well supported by these training scripts.\n\n* The audio data should be 16-bit Signed Integer PCM 1-channel 16kHz WAV files. Note that it needs to be accessible within the Docker container, so it can't be behind a symlink that points outside this repo directory, which is shared with the Docker container.\n\n* There are some directory names you should avoid using in this repo directory, because the scripts will create \u0026 use them during training. Avoid: `conf`, `data`, `exp`, `extractor`, `mfcc`, `steps`, `tree_sp`, `utils`.\n\n* Training may use a lot of storage. You may want to locate this directory somewhere with ample room available.\n\n* The training commands (`run.*.sh`) accept many optional parameters. More info later.\n\n    * `--stage n` : Skip to given stage.\n    * `--num-utts-subset 3000` : You may need this parameter to prevent an error at the beginning of nnet training if your training data contains many short (command-like) utterances. 
(3000 is a perhaps overly careful suggestion; 300 is the default value.)\n\n* I decided to try to treat the Docker image as evergreen, and keep things that are liable to change a lot, like scripts, in the git repo instead.\n\n* The training dataset input `.tsv` file consists of tab-separated fields as follows: `wav_filename ignored ignored ignored text_transcript`\n\n## Related Repositories\n\n* [daanzu/speech-training-recorder](https://github.com/daanzu/speech-training-recorder): Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.\n* [daanzu/kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar): Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time.\n\n## License\n\nThis project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the [LICENSE file](LICENSE) for details. If this license is problematic for you, please contact me.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanzu%2Fkaldi_ag_training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaanzu%2Fkaldi_ag_training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanzu%2Fkaldi_ag_training/lists"}