{"id":13564137,"url":"https://github.com/thu-spmi/CAT","last_synced_at":"2025-04-03T21:30:36.668Z","repository":{"id":40944247,"uuid":"223221415","full_name":"thu-spmi/CAT","owner":"thu-spmi","description":"A CRF-based ASR Toolkit","archived":false,"fork":false,"pushed_at":"2024-08-13T13:48:10.000Z","size":51121,"stargazers_count":325,"open_issues_count":12,"forks_count":74,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-11-04T17:47:10.270Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thu-spmi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-21T16:51:41.000Z","updated_at":"2024-10-04T04:58:38.000Z","dependencies_parsed_at":"2023-01-29T11:15:57.368Z","dependency_job_id":"1d9ec78f-c4ba-47b1-a959-afd1f2561906","html_url":"https://github.com/thu-spmi/CAT","commit_stats":{"total_commits":243,"total_committers":22,"mean_commits":"11.045454545454545","dds":0.6131687242798354,"last_synced_commit":"618a15f70780200cdc42eed3f69f6ce1d61a4e61"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-spmi%2FCAT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-spmi%2FCAT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-spmi%2FCAT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-spmi%2FCAT/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thu-spmi","download_url":"https://codeload.github.com/thu-spmi/CAT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247082825,"owners_count":20880725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:27.052Z","updated_at":"2025-04-03T21:30:31.659Z","avatar_url":"https://github.com/thu-spmi.png","language":"Python","funding_links":[],"categories":["Python","语音识别"],"sub_categories":["网络服务_其他"],"readme":"\u003cdiv align=\"center\"\u003e\u003cimg src=\"./assets/logo.png\" width=200\u003e\u003c/div\u003e\n\n# CAT: CRF-based ASR Toolkit\n**CAT provides a complete workflow for CRF-based data-efficient end-to-end speech recognition.**\n\n- [Overview](#overview)\n- [Features](#features)\n- [Installation](#installation)\n- [Getting started](#getting-started)\n- [ASR results](#asr-results)\n- [Further reading](#further-reading)\n\n## Overview\n\nCAT aims to combine the advantages of both the hybrid and the E2E ASR approaches to achieve data efficiency, by judiciously weighing the pros and cons of modularity versus a unified neural network, and of separate optimization versus joint optimization. CAT advocates global normalization modeling and discriminative training in the framework of [Conditional Random Field](https://en.wikipedia.org/wiki/Conditional_random_field) (CRF), currently with a [Connectionist Temporal Classification](https://mediatum.ub.tum.de/doc/1292048/file.pdf) (CTC) inspired state topology.\n\n\n## Features\n\n1. 
CAT contains a full-fledged CUDA/C/C++ implementation of the CTC-CRF loss function with bindings to PyTorch.\n\n2. One-stop CTC/CTC-CRF/RNN-T/LM training \u0026 inference. See the [templates](egs/TEMPLATE).\n\n3. Flexible configuration with JSON. Check the [guideline for configuration](docs/configure_guide.md).\n\n4. Scalable and extensible. It is easy to extend CAT to train on large-scale speech data and to add new models and tasks.\n\nSee [What's New](docs/whatsnew.md) for recently added functionalities and features!\n\n## Installation\n\n1. Dependencies\n\n   - A CUDA-compatible device, with the NVIDIA driver installed and the CUDA libraries available.\n   - PyTorch: `\u003e=1.9.0` is required. [Installation guide from PyTorch](https://pytorch.org/get-started/locally/#start-locally)\n   - [Kaldi](https://github.com/kaldi-asr/kaldi) **\\[optional, but recommended\\]**: used for speech data preparation and some FST-related operations. It is optional for most of the basic functions and required only when you want to conduct [CTC-CRF](egs/TEMPLATE/exp/asr-ctc-crf) training.\n      \n      Besides Kaldi, you could use `torchaudio` for feature extraction. Take a look at [data.sh](egs/aishell/local/data.sh) for how to prepare data with `torchaudio`.\n\n2. 
Clone and install CAT\n\n   ```bash\n   git clone https://github.com/thu-spmi/CAT.git \u0026\u0026 cd CAT\n   # Show the installation help message\n   ./install.sh -h\n   # Install with default configurations\n   #./install.sh\n   ```\n\n## Getting started\n\nTo get started with this project, please refer to [TEMPLATE](egs/TEMPLATE/README.md) for a tutorial.\n\n\n## ASR results\n\n| dataset                                                                                                                | evaluation sets         | performance  |\n| ---------------------------------------------------------------------------------------------------------------------- | ----------------------- | ------------ |\n| [AISHELL-1](egs/aishell#result)                                                                                        | dev / test              | 3.93 / 4.22  |\n| [Commonvoice German](https://github.com/thu-spmi/CAT/blob/v2/egs/commonvoice/RESULT.md#conformertransformer-rescoring) | test                    | 9.8          |\n| [Librispeech](egs/libri#result)                                                                                        | test-clean / test-other | 1.94 / 4.39  |\n| [Switchboard](https://github.com/thu-spmi/CAT/blob/v2/egs/swbd/RESULT.md#conformertransformer-rescoring)               | switchboard / callhome  | 6.9 / 14.5   |\n| [THCHS30](https://github.com/thu-spmi/CAT/blob/v2/egs/thchs30/RESULT.md#vgg-blstm)                                     | test                    | 6.01         |\n| [Wenetspeech](egs/wenetspeech#result)                                                                                  | test-net / test-meeting | 9.32 / 14.66 |\n| [WSJ](egs/wsj/RESULT.md)                                                                                               | eval92 / dev93          | 2.77 / 5.68  |\n\n## Further reading\n\n- [Some tips about the usage of third party tools](docs/guide_for_third_party_tools.md)\n- [Tutorial on 
building your first CAT project (yesno)](docs/yesno_tutorial_ch.md)\n- [Step-by-step workflow for CAT-v2](docs/toolkitworkflow.md)\n\n\n## Citation\n\n```\n@inproceedings{xiang2019crf,\n  title={CRF-based single-stage acoustic modeling with CTC topology},\n  author={Xiang, Hongyu and Ou, Zhijian},\n  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},\n  pages={5676--5680},\n  year={2019},\n  organization={IEEE}\n}\n\n@inproceedings{an2020cat,\n  title={CAT: A CTC-CRF based ASR toolkit bridging the hybrid and the end-to-end approaches towards data efficiency and low latency},\n  author={An, Keyu and Xiang, Hongyu and Ou, Zhijian},\n  booktitle={INTERSPEECH},\n  pages={566--570},\n  year={2020}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-spmi%2FCAT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-spmi%2FCAT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-spmi%2FCAT/lists"}