Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vectominist/miniasr
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
Last synced: 10 days ago
- Host: GitHub
- URL: https://github.com/vectominist/miniasr
- Owner: vectominist
- License: MIT
- Created: 2021-07-14T09:33:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-06T20:57:31.000Z (almost 2 years ago)
- Last Synced: 2023-05-25T03:00:45.244Z (over 1 year ago)
- Topics: asr, ctc, fairseq, hubert, minimal, pytorch, s3prl, speech-recognition, speech-representation, wav2vec2
- Language: Jupyter Notebook
- Homepage:
- Size: 342 KB
- Stars: 36
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MiniASR
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
## Intro
### Why Mini?
* **Minimal Training** ⏱
  Self-supervised pre-trained models + minimal fine-tuning.
* **Simple and Flexible** ⚙️
  Easy to understand and customize.
* **Colab Compatible** 🧪
  Train your model directly on Google Colab.

### ASR Pipeline
* Preprocessing (`run_preprocess.py`)
  * Find all audio files and transcriptions.
  * Generate vocabularies (character/word/subword/code-switched).
* Training (`run_asr.py`)
  * Dataset (`miniasr/data/dataset.py`)
  * Tokenizer for text data (`miniasr/data/text.py`)
  * DataLoader (`miniasr/data/dataloader.py`)
  * Model (`miniasr/model/base_asr.py`)
    * Feature extractor
    * Data augmentation
    * End-to-end CTC ASR
* Testing (`run_asr.py`)
  * CTC greedy/beam decoding
  * Performance measures: error rates, RTF, latency

## Instructions
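As a note on the testing stage above: CTC greedy (best-path) decoding reduces to taking the per-frame argmax, collapsing repeated labels, and dropping blanks. A minimal, framework-free sketch (the function name and the blank-index convention are illustrative; MiniASR's actual decoding is handled internally by the model):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks."""
    hyp, prev = [], None
    for tok in frame_ids:
        if tok != prev and tok != blank:
            hyp.append(tok)
        prev = tok
    return hyp

# Per-frame argmax ids from an acoustic model (0 = blank):
print(ctc_greedy_decode([1, 1, 0, 3, 3, 0, 2]))  # [1, 3, 2]
```

Beam decoding (optionally with a language model, e.g. via flashlight) explores multiple label sequences instead of committing to the single best path per frame.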
### Requirements
* Python 3.6+
* Install sox on your OS
* Install latest [s3prl](https://github.com/s3prl/s3prl) (at least `v0.4`)
```bash
git clone https://github.com/s3prl/s3prl.git
cd s3prl
pip install -e ./
cd ..
```
* Install via pip:
```bash
pip install -e ./
```
Additional libraries:
* [flashlight](https://github.com/flashlight/flashlight): to decode with LM and beam search.
### Pre-trained ASR
You can directly use pre-trained ASR models in your applications. (under construction 🚧)
```python
import torch

from miniasr.utils import load_from_checkpoint
from miniasr.data.audio import load_waveform

# Option 1: Loading from a checkpoint
model, args, tokenizer = load_from_checkpoint('path/to/ckpt', 'cuda')

# Option 2: Loading from torch.hub (TODO)
model = torch.hub.load('vectominist/MiniASR', 'ctc_eng').to('cuda')

# Load waveforms and recognize!
waves = [load_waveform('path/to/waveform').to('cuda')]
hyps = model.recognize(waves)
```

### Preprocessing
* For already implemented corpora, please see `egs/`.
* To customize your own dataset, please see `miniasr/preprocess`.

`miniasr-preprocess` options:
```
--corpus              Corpus name.
--path                Path to dataset.
--set                 Which subsets to process.
--out                 Output directory.
--gen-vocab           Specify whether to generate vocabulary files.
--char-vocab-size     Character vocabulary size.
--word-vocab-size     Word vocabulary size.
--subword-vocab-size  Subword vocabulary size.
--gen-subword         Specify whether to generate subword vocabulary.
--subword-mode        Subword training mode ({unigram,bpe}).
--char-coverage       Character coverage.
--seed                Set random seed.
--njobs               Number of workers.
--log-file            Logging file.
--log-level           Logging level ({DEBUG,INFO,WARNING,ERROR,CRITICAL}).
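
# Example invocation (corpus name and paths below are illustrative, not from the repo):
#   miniasr-preprocess --corpus librispeech --path data/LibriSpeech \
#       --set train-clean-100 --out data/processed --gen-vocab --char-vocab-size 40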
```

### Training & Testing
See examples in `egs/`.

`miniasr-asr` options:
```
--config      Training configuration file (.yaml).
--test        Specify testing mode.
--ckpt        Checkpoint for testing.
--test-name   Name for the testing results.
--cpu         Use CPU only.
--seed        Set random seed.
--njobs       Number of workers.
--log-file    Logging file.
--log-level   Logging level ({DEBUG,INFO,WARNING,ERROR,CRITICAL}).
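
# Example invocations (config and checkpoint paths below are illustrative):
#   miniasr-asr --config egs/librispeech/config/ctc_train.yaml
#   miniasr-asr --config egs/librispeech/config/ctc_train.yaml --test --ckpt path/to/ckpt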
```

## TODO List
* `torch.hub` support
* Releasing pre-trained ASR models

## Reference Papers
* [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.cs.toronto.edu/~graves/icml_2006.pdf), Graves et al.
* [Neural Machine Translation of Rare Words with Subword Units](https://aclanthology.org/P16-1162/), Sennrich et al.
* [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447), Hsu et al.
* [SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779), Park et al.

## Reference Repos
* [PyTorch](https://github.com/pytorch/pytorch)
* [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
* [S3PRL](https://github.com/s3prl/s3prl)
* [Fairseq](https://github.com/pytorch/fairseq)
* [Flashlight](https://github.com/flashlight/flashlight)
* [SentencePiece](https://github.com/google/sentencepiece)
* [End-to-end-ASR-Pytorch](https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch)

## Citation
```
@misc{chang2021miniasr,
  title={{MiniASR}},
  author={Chang, Heng-Jui},
  year={2021},
  url={https://github.com/vectominist/MiniASR}
}
```