https://github.com/luluw8071/automatic-speech-recognition-with-pytorch
End-to-End Automatic Speech Recognition on PyTorch with CTC Decoder and Ken LM
- Host: GitHub
- URL: https://github.com/luluw8071/automatic-speech-recognition-with-pytorch
- Owner: LuluW8071
- License: gpl-3.0
- Created: 2023-07-30T16:18:56.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-21T15:25:31.000Z (24 days ago)
- Last Synced: 2024-10-22T02:15:28.038Z (23 days ago)
- Topics: asr-model, cnn-lstm-models, ctc-decode, cuda-support, deep-neural-networks, kenlm, python, pytorch, pytorch-lightning
- Language: Python
- Homepage:
- Size: 155 KB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# End-to-End Automatic Speech Recognition
![Code in Progress](https://img.shields.io/badge/status-completed-green.svg) ![License](https://img.shields.io/github/license/LuluW8071/Automatic-Speech-Recognition-with-PyTorch) ![Open Issues](https://img.shields.io/github/issues/LuluW8071/Automatic-Speech-Recognition-with-PyTorch) ![Closed Issues](https://img.shields.io/github/issues-closed/LuluW8071/Automatic-Speech-Recognition-with-PyTorch) ![Open PRs](https://img.shields.io/github/issues-pr/LuluW8071/Automatic-Speech-Recognition-with-PyTorch) ![Repo Size](https://img.shields.io/github/repo-size/LuluW8071/Deep-Speech-2) ![Last Commit](https://img.shields.io/github/last-commit/LuluW8071/Automatic-Speech-Recognition-with-PyTorch)
This project implements a small-scale speech recognition system built from a Residual Convolutional Neural Network (CNN) and BiGRU acoustic model, a Connectionist Temporal Classification (CTC) decoder, and a KenLM language model for improved accuracy.
## Model Architecture
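To illustrate what a CTC decoder does with the acoustic model's per-frame output, here is a minimal sketch of greedy CTC decoding in plain Python. It is a simplified stand-in, not this project's decoder: it ignores the language model entirely, and the vocabulary, blank index, and toy scores are assumptions made up for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol
# Hypothetical character vocabulary; the project's real label set may differ.
VOCAB = ["_", " ", "a", "c", "t"]


def argmax(row):
    """Index of the largest score in one frame."""
    return max(range(len(row)), key=row.__getitem__)


def ctc_greedy_decode(frame_scores, vocab=VOCAB, blank=BLANK):
    """Take the best label per frame, merge repeats, then drop blanks."""
    best_path = [argmax(frame) for frame in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(vocab[i] for i in collapsed if i != blank)


# Toy per-frame scores standing in for acoustic model output.
frames = [
    [0.1, 0.0, 0.1, 0.7, 0.1],  # c
    [0.1, 0.0, 0.1, 0.7, 0.1],  # c (repeat, merged with the frame above)
    [0.8, 0.0, 0.1, 0.0, 0.1],  # blank
    [0.1, 0.0, 0.8, 0.0, 0.1],  # a
    [0.1, 0.0, 0.8, 0.0, 0.1],  # a (repeat)
    [0.1, 0.0, 0.1, 0.0, 0.8],  # t
]

print(ctc_greedy_decode(frames))  # cat
```

The blank symbol is what lets CTC emit doubled letters: a path such as `a, blank, a` decodes to `aa`, while `a, a` collapses to a single `a`.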
## Installation
1. Clone the repository:
```bash
git clone --recursive https://github.com/LuluW8071/Automatic-Speech-Recognition-with-PyTorch.git
```

2. Install **[PyTorch](https://pytorch.org/)** and the required dependencies in a virtual environment:
```bash
pip install -r requirements.txt
```

Ensure you have `PyTorch` and `Lightning AI` installed.
## Train Model
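Training expects a `.env` file holding a Comet ML API key and project name. As a rough pre-flight check, the sketch below parses a `.env`-style file with the standard library and reports which required entries are missing. The key names `API_KEY` and `PROJECT_NAME` are assumptions for illustration; check the training code for the names it actually reads.

```python
import tempfile
from pathlib import Path

# Assumed variable names, for illustration only; check the training code
# for the names it actually reads.
REQUIRED_KEYS = ("API_KEY", "PROJECT_NAME")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Demo on a throwaway file so the sketch runs without a real .env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        'API_KEY="abc123"\n# PROJECT_NAME not set yet\n', encoding="utf-8"
    )
    demo_missing = missing_keys(parse_env_file(env_path))

print(demo_missing)  # ['PROJECT_NAME']
```
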
> [!IMPORTANT]
> Before training, make sure you have placed your __Comet ML API key__ and __project name__ in the environment variable file `.env`.

```bash
py train.py
```

Customize the training parameters by passing arguments to `train.py`. Refer to the table below for the available hyperparameters and training configuration options.
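For readers who want to see how options like these are typically declared, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is an illustration, not the contents of `train.py`; the function name `build_parser` is hypothetical, and the short flag `-w` for `--num_workers` is taken from the example command in this section.

```python
import argparse


def build_parser():
    """Declare the documented training options (illustrative, not train.py itself)."""
    parser = argparse.ArgumentParser(description="Train the ASR model (sketch)")
    parser.add_argument("-g", "--gpus", type=int, default=1,
                        help="Number of GPUs per node")
    parser.add_argument("-w", "--num_workers", type=int, default=8,
                        help="Number of CPU workers")
    parser.add_argument("-db", "--dist_backend", type=str,
                        default="ddp_find_unused_parameters_true",
                        help="Distributed backend to use for training")
    parser.add_argument("--epochs", type=int, default=50,
                        help="Number of total epochs to run")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Size of the batch")
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-5,
                        help="Learning rate")
    parser.add_argument("--checkpoint_path", type=str, default=None,
                        help="Checkpoint path to resume training from")
    parser.add_argument("--precision", type=str, default="16-mixed",
                        help="Precision of the training")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["-g", "4", "--epochs", "10", "-lr", "2e-5"])
print(args.gpus, args.epochs, args.learning_rate, args.batch_size)
```

Options that are not passed keep their defaults, so `args.batch_size` stays at 32 in the call above.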
| Args | Description | Default Value |
|------------------------|-----------------------------------------------------------------------|--------------------|
| `-g, --gpus` | Number of GPUs per node | 1 |
| `-w, --num_workers` | Number of CPU workers | 8 |
| `-db, --dist_backend` | Distributed backend to use for training | ddp_find_unused_parameters_true |
| `--epochs` | Number of total epochs to run | 50 |
| `--batch_size` | Size of the batch | 32 |
| `-lr, --learning_rate` | Learning rate | 1e-5 (0.00001) |
| `--checkpoint_path` | Checkpoint path to resume training from | None |
| `--precision` | Precision of the training | 16-mixed |

```bash
# -g: number of GPUs per node for parallel GPU training
# -w: number of CPU workers for parallel data loading
# --epochs: number of total epochs to run; --batch_size: size of the batch
# -lr: learning rate; --precision: precision of the training
py train.py \
    -g 4 \
    -w 8 \
    --epochs 10 \
    --batch_size 64 \
    -lr 2e-5 \
    --precision 16-mixed
```

> [!NOTE]
> To __resume training__ from a saved checkpoint, use:

```bash
py train.py --checkpoint_path path_to_checkpoint.ckpt
```

## Additional Resources
For pre-trained models and other resources, refer to the provided links.
[Click here to download the pre-trained model](https://mega.nz/folder/Lnxj3YCJ#Na6Nc1m4nz6jiSWTatfKJQ)

---

This guide should help you set up and use the speech recognition system. If you encounter any issues or have questions, feel free to reach out!