Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thu-spmi/CAT
A CRF-based ASR Toolkit
- Host: GitHub
- URL: https://github.com/thu-spmi/CAT
- Owner: thu-spmi
- License: apache-2.0
- Created: 2019-11-21T16:51:41.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-23T03:11:45.000Z (6 months ago)
- Last Synced: 2024-07-23T05:48:39.929Z (6 months ago)
- Language: Python
- Size: 47.3 MB
- Stars: 317
- Watchers: 21
- Forks: 74
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - thu-spmi/CAT
README
# CAT: CRF-based ASR Toolkit
**CAT provides a complete workflow for CRF-based data-efficient end-to-end speech recognition.**

- [Overview](#overview)
- [Features](#features)
- [Installation](#installation)
- [Getting started](#getting-started)
- [ASR results](#asr-results)
- [Further reading](#further-reading)

## Overview
CAT aims to combine the advantages of the hybrid and the end-to-end (E2E) ASR approaches to achieve data efficiency, by carefully weighing modularity against a unified neural network, and separate optimization against joint optimization. CAT advocates global normalization modeling and discriminative training in the framework of [Conditional Random Field](https://en.wikipedia.org/wiki/Conditional_random_field) (CRF), currently with a [Connectionist Temporal Classification](https://mediatum.ub.tum.de/doc/1292048/file.pdf) (CTC) inspired state topology.
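A brute-force toy can make global normalization concrete. The sketch below assumes, in the spirit of the CTC-CRF paper cited at the end of this README, that a path's potential is the sum of its frame-level log scores plus a language-model log score on the collapsed label sequence, and it normalizes over all paths at once rather than frame by frame. It enumerates every path, so it only works for a handful of frames, and it is not meant to reflect how CAT's CUDA/C/C++ loss computes the same quantity. All function names, scores, and the toy LM are hypothetical.

```python
import itertools
import math

BLANK = "<b>"  # toy blank symbol; any reserved token works


def logsumexp(xs):
    """Stable log(sum(exp(x))) over a list; returns -inf for an empty list."""
    xs = list(xs)
    if not xs:
        return -math.inf
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_collapse(path):
    """CTC mapping B(pi): merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def path_potential(path, frame_logp, lm_logp):
    """phi(pi, x): sum of frame-level log scores plus an LM log score on B(pi)."""
    acoustic = sum(frame_logp[t][sym] for t, sym in enumerate(path))
    return acoustic + lm_logp(ctc_collapse(path))


def crf_log_prob(labels, frame_logp, lm_logp):
    """log p(labels | x) with one normalizer Z(x) shared by all label sequences."""
    symbols = list(frame_logp[0])
    paths = list(itertools.product(symbols, repeat=len(frame_logp)))
    scores = [path_potential(p, frame_logp, lm_logp) for p in paths]
    log_z = logsumexp(scores)  # global normalization: every path of length T
    target = tuple(labels)
    numerator = [s for p, s in zip(paths, scores) if ctc_collapse(p) == target]
    return logsumexp(numerator) - log_z


if __name__ == "__main__":
    # Toy scores for 3 frames over {blank, a, b}; the values are made up.
    frame_logp = [
        {BLANK: math.log(0.5), "a": math.log(0.4), "b": math.log(0.1)},
        {BLANK: math.log(0.2), "a": math.log(0.6), "b": math.log(0.2)},
        {BLANK: math.log(0.6), "a": math.log(0.1), "b": math.log(0.3)},
    ]

    def toy_lm(labels):
        # Hypothetical LM: a fixed log-penalty per emitted label.
        return -0.5 * len(labels)

    for labels in [(), ("a",), ("a", "b")]:
        print(labels, math.exp(crf_log_prob(labels, frame_logp, toy_lm)))
```

Because `log_z` sums over every path, raising the score of one label sequence necessarily lowers the others. With a flat LM (`lm_logp` returning 0) and per-frame distributions that each sum to one, the normalizer equals 1 and the result reduces to ordinary CTC.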
## Features
1. CAT contains a full-fledged CUDA/C/C++ implementation of the CTC-CRF loss function with PyTorch bindings.
2. One-stop CTC/CTC-CRF/RNN-T/LM training & inference. See the [templates](egs/TEMPLATE).
3. Flexible configuration with JSON. Check the [guideline for configuration](docs/configure_guide.md).
4. Scalable and extensible. CAT can easily be extended to train on tens of thousands of hours of speech data and to add new models and tasks.
See [What's New](docs/whatsnew.md) for recently added features!
## Installation
1. Dependencies
- A CUDA-compatible device, with the NVIDIA driver installed and the CUDA libraries available.
- PyTorch: `>=1.9.0` is required. [Installation guide from PyTorch](https://pytorch.org/get-started/locally/#start-locally)
- [Kaldi](https://github.com/kaldi-asr/kaldi) **\[optional, but recommended\]**: used for speech data preparation and some FST-related operations. It is not needed for most basic functions and is required only for [CTC-CRF](egs/TEMPLATE/exp/asr-ctc-crf) training.
  Besides Kaldi, you can also use `torchaudio` for feature extraction. See [data.sh](egs/aishell/local/data.sh) for how to prepare data with `torchaudio`.

2. Clone and install CAT
```bash
git clone https://github.com/thu-spmi/CAT.git && cd CAT
# Show the installation help message
./install.sh -h
# Install with default configurations
#./install.sh
```

## Getting started
To get started with this project, please refer to [TEMPLATE](egs/TEMPLATE/README.md) for a tutorial.
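Before running the template recipes, a quick sanity check of the Installation requirements can save time. The sketch below is not part of CAT and its function names are hypothetical: it compares the installed PyTorch version against the `>=1.9.0` requirement and asks PyTorch (via `torch.cuda.is_available()`) whether a CUDA device is visible.

```python
import re


def parse_version(version):
    """Turn a version string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", str(version))
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch number (e.g. '1.9') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=(1, 9, 0)):
    """True if `version` is at least `minimum` (PyTorch >= 1.9.0 per Installation)."""
    return parse_version(version) >= minimum


def check_environment():
    """Report whether PyTorch imports, is recent enough, and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "version_ok": False, "cuda": False}
    return {
        "torch": str(torch.__version__),
        "version_ok": meets_minimum(torch.__version__),
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_environment())
```

This covers only the PyTorch and CUDA items; whether Kaldi is needed depends on the recipe, as noted under Installation.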
## ASR results
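The figures in the table below are error rates in percent (word or character error rate, depending on the dataset; the linked result pages give the details). As a reminder of what such a figure measures, the sketch below computes a corpus-level error rate from Levenshtein alignments. It is a generic illustration with hypothetical function names, not the scoring script used to produce these results, and text normalization choices (casing, punctuation, character tokenization) can shift the numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit-cost sub/del/ins)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute, free when tokens match
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference tokens * 100.

    unit="word" splits on whitespace (WER); unit="char" compares characters
    with spaces removed (CER).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")

    def split_chars(s):
        return list(s.replace(" ", ""))

    if unit == "word":
        tokenize = str.split
    elif unit == "char":
        tokenize = split_chars
    else:
        raise ValueError(f"unknown unit: {unit!r}")

    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * edits / total
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones; this is the usual corpus-level convention rather than an average of per-utterance rates.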
| dataset | evaluation sets | error rate (%) |
| ---------------------------------------------------------------------------------------------------------------------- | ----------------------- | ------------ |
| [AISHELL-1](egs/aishell#result) | dev / test | 3.93 / 4.22 |
| [Commonvoice German](https://github.com/thu-spmi/CAT/blob/v2/egs/commonvoice/RESULT.md#conformertransformer-rescoring) | test | 9.8 |
| [Librispeech](egs/libri#result) | test-clean / test-other | 1.94 / 4.39 |
| [Switchboard](https://github.com/thu-spmi/CAT/blob/v2/egs/swbd/RESULT.md#conformertransformer-rescoring) | switchboard / callhome | 6.9 / 14.5 |
| [THCHS30](https://github.com/thu-spmi/CAT/blob/v2/egs/thchs30/RESULT.md#vgg-blstm) | test | 6.01 |
| [Wenetspeech](egs/wenetspeech#result) | test-net / test-meeting | 9.32 / 14.66 |
| [WSJ](egs/wsj/RESULT.md) | eval92 / dev93 | 2.77 / 5.68 |

## Further reading
- [Some tips about the usage of third party tools](docs/guide_for_third_party_tools.md)
- [Tutorial on building your first CAT project (yesno)](docs/yesno_tutorial_ch.md)
- [Step-by-step workflow for CAT-v2](docs/toolkitworkflow.md)

## Citation
```
@inproceedings{xiang2019crf,
title={CRF-based single-stage acoustic modeling with CTC topology},
author={Xiang, Hongyu and Ou, Zhijian},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={5676--5680},
year={2019},
organization={IEEE}
}

@inproceedings{an2020cat,
title={CAT: A CTC-CRF based ASR toolkit bridging the hybrid and the end-to-end approaches towards data efficiency and low latency},
author={An, Keyu and Xiang, Hongyu and Ou, Zhijian},
booktitle={INTERSPEECH},
pages={566--570},
year={2020}
}
```