https://github.com/lucadellalib/ts-asr
Target speaker automatic speech recognition (TS-ASR)
- Host: GitHub
- URL: https://github.com/lucadellalib/ts-asr
- Owner: lucadellalib
- Created: 2023-07-30T05:03:57.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2023-10-14T19:39:02.000Z (over 1 year ago)
- Last Synced: 2024-11-07T17:32:23.032Z (7 months ago)
- Topics: asr, conformer, pytorch, rnn, speech-recognition, speechbrain, transducer
- Language: Python
- Homepage:
- Size: 301 MB
- Stars: 9
- Watchers: 2
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
# Target Speaker Automatic Speech Recognition
This [SpeechBrain](https://speechbrain.github.io) recipe includes scripts to train end-to-end transducer-based target speaker automatic
speech recognition (TS-ASR) systems as proposed in [Streaming Target-Speaker ASR with Neural Transducer](https://arxiv.org/abs/2209.04175).

---------------------------------------------------------------------------------------------------------
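The central idea in the referenced paper is that the recognizer is conditioned on an embedding of the target speaker, so the transducer transcribes only that speaker's words in the mixture. As a toy NumPy sketch of one simple conditioning scheme (additive fusion; shapes, names, and the fusion choice are illustrative, not the recipe's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture encoder output: (time, feature) frames, e.g. from a Conformer encoder.
T, D = 50, 8
mixture_frames = rng.normal(size=(T, D))

# Embedding of the target speaker, derived from an enrolment utterance.
spk_embedding = rng.normal(size=(D,))

# Additive conditioning: bias every frame toward the target speaker.
# The embedding broadcasts over the time axis.
conditioned = mixture_frames + spk_embedding

print(conditioned.shape)  # (50, 8)
```

The conditioned frames would then feed the transducer's joint network as usual; the paper explores more elaborate fusion points, but the shape bookkeeping is the same.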
## ⚡ Datasets
### LibriSpeechMix
Generate the LibriSpeechMix data in `<path-to-data-folder>` following the
[official readme](https://github.com/NaoyukiKanda/LibriSpeechMix/blob/main/README.md).

---------------------------------------------------------------------------------------------------------
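The exact manifest schema is defined in the official LibriSpeechMix readme; as a rough sketch of how such a mixture manifest turns into per-speaker training examples, assuming a JSONL file where each line lists the source utterances, their mixing delays, and one transcript per speaker (field names here are illustrative, not guaranteed to match the official format):

```python
import json

# One hypothetical manifest line describing a 2-speaker mixture.
line = json.dumps({
    "id": "dev-clean-2mix-0001",
    "wavs": ["1272/128104/1272-128104-0000.flac",
             "1462/170142/1462-170142-0001.flac"],
    "delays": [0.0, 1.5],  # offset in seconds of each source within the mixture
    "texts": ["FIRST SPEAKER TRANSCRIPT", "SECOND SPEAKER TRANSCRIPT"],
})

entry = json.loads(line)

# TS-ASR yields one training example per target speaker in the mixture:
# the same mixed audio paired with that speaker's transcript.
examples = [
    {"mixture_id": entry["id"], "speaker_index": i, "text": text}
    for i, text in enumerate(entry["texts"])
]
print(len(examples))  # one example per speaker in the mixture
```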
## 🛠️️ Installation
Clone the repository, navigate to `<path-to-repository>`, open a terminal and run:
```bash
pip install -e vendor/speechbrain
pip install -r requirements.txt
```

---------------------------------------------------------------------------------------------------------
## ▶️ Quickstart
Navigate to `<path-to-repository>`, open a terminal and run:

```bash
python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
```

To use multiple GPUs on the same node, run:
```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```

To use multiple GPUs on multiple nodes, for each node with rank `0, ..., <num-nodes> - 1`, run:
```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <master-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```

Helper functions and scripts for plotting and analyzing the results can be found in `utils.py` and `tools`.
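Under the hood, `torch.distributed.launch` spawns one worker process per GPU and identifies each one through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`), which the training script reads to set up the process group. A minimal sketch of that rank arithmetic, with illustrative numbers standing in for the launcher's arguments:

```python
import os

# Values the launcher would be invoked with (illustrative numbers).
nproc_per_node = 8   # GPUs per node (--nproc_per_node)
nnodes = 2           # total nodes (--nnodes)
node_rank = 1        # rank of this node (--node_rank)

# The launcher assigns each local process a globally unique RANK;
# WORLD_SIZE is the total number of processes across all nodes.
world_size = nproc_per_node * nnodes
for local_rank in range(nproc_per_node):
    global_rank = node_rank * nproc_per_node + local_rank
    os.environ["RANK"] = str(global_rank)
    os.environ["LOCAL_RANK"] = str(local_rank)
    os.environ["WORLD_SIZE"] = str(world_size)

print(world_size)   # 16
print(global_rank)  # 15, the last process on node 1
```

All workers must agree on `--master_addr`/`--master_port` (the rendezvous point), which is why those flags are repeated identically on every node.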
**NOTE**: the vendored version of SpeechBrain included in this repository contains several hotfixes (e.g. for distributed training,
gradient clipping, gradient accumulation, and causality) and additional features (e.g. distributed evaluation).

### Examples
```bash
nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &
```

---------------------------------------------------------------------------------------------------------
## 📧 Contact
[[email protected]](mailto:[email protected])
---------------------------------------------------------------------------------------------------------