# Target Speaker Automatic Speech Recognition

[![Python version: 3.6 | 3.7 | 3.8 | 3.9 | 3.10 | 3.11](https://img.shields.io/badge/python-3.6%20|%203.7%20|%203.8%20|%203.9%20|%203.10%20|%203.11-blue)](https://www.python.org/downloads/)

This [SpeechBrain](https://speechbrain.github.io) recipe includes scripts to train end-to-end transducer-based target speaker automatic
speech recognition (TS-ASR) systems as proposed in [Streaming Target-Speaker ASR with Neural Transducer](https://arxiv.org/abs/2209.04175).
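In a transducer-based TS-ASR system, the ASR encoder is additionally conditioned on an embedding of the target speaker (extracted from an enrollment utterance), so that only that speaker's words are transcribed from the overlapped mixture. The sketch below illustrates one common fusion strategy, adding a projected speaker embedding to the frame-level features; the layer sizes, the fusion point, and the use of a plain Transformer encoder (instead of the recipe's Conformer) are illustrative assumptions, not this recipe's exact architecture.

```python
import torch
import torch.nn as nn


class SpeakerConditionedEncoder(nn.Module):
    """Toy TS-ASR encoder: frame features are biased by a target-speaker embedding."""

    def __init__(self, feat_dim=80, spk_dim=192, d_model=256, num_layers=2):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, d_model)   # project acoustic features
        self.spk_proj = nn.Linear(spk_dim, d_model)   # project speaker embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats, spk_emb):
        # feats: (batch, time, feat_dim), spk_emb: (batch, spk_dim)
        # Broadcast the speaker embedding over time and add it to every frame.
        x = self.in_proj(feats) + self.spk_proj(spk_emb).unsqueeze(1)
        return self.encoder(x)


if __name__ == "__main__":
    enc = SpeakerConditionedEncoder()
    feats = torch.randn(2, 100, 80)    # batch of log-Mel features
    spk_emb = torch.randn(2, 192)      # enrollment speaker embeddings
    print(enc(feats, spk_emb).shape)   # torch.Size([2, 100, 256])
```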

---------------------------------------------------------------------------------------------------------

## ⚡ Datasets

### LibriSpeechMix

Generate the LibriSpeechMix data in `<path-to-data-folder>` following the
[official readme](https://github.com/NaoyukiKanda/LibriSpeechMix/blob/main/README.md).
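Each LibriSpeechMix example is built by summing LibriSpeech utterances with a start delay, paired with an enrollment utterance per speaker. The official scripts handle the actual generation; the snippet below is only a minimal illustration of the delayed-sum mixing idea (the `soundfile` dependency and the function name are assumptions, not part of the official tooling).

```python
import numpy as np
import soundfile as sf  # assumed available; the official scripts handle real generation


def mix_two_utterances(path1, path2, delay_seconds, out_path):
    """Sum two LibriSpeech utterances, starting the second one `delay_seconds`
    after the first, and write the single-channel mixture to `out_path`."""
    wav1, sr1 = sf.read(path1, dtype="float32")
    wav2, sr2 = sf.read(path2, dtype="float32")
    assert sr1 == sr2, "sampling rates must match"
    offset = int(delay_seconds * sr1)
    mixture = np.zeros(max(len(wav1), offset + len(wav2)), dtype=np.float32)
    mixture[: len(wav1)] += wav1
    mixture[offset : offset + len(wav2)] += wav2
    sf.write(out_path, mixture, sr1)
```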

---------------------------------------------------------------------------------------------------------

## 🛠️️ Installation

Clone the repository, navigate to `<path-to-repository>`, open a terminal, and run:

```bash
pip install -e vendor/speechbrain
pip install -r requirements.txt
```
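To verify that the editable install picked up the vendored SpeechBrain, a quick import check (illustrative, not part of the recipe) is enough:

```python
# Minimal sanity check: confirm PyTorch and the vendored SpeechBrain import correctly.
import torch
import speechbrain

print("SpeechBrain:", speechbrain.__version__)
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```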

---------------------------------------------------------------------------------------------------------

## ▶️ Quickstart

Navigate to `<path-to-repository>`, open a terminal, and run:

```bash
python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
```

To use multiple GPUs on the same node, run:

```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```

To use multiple GPUs on multiple nodes, for each node with rank `0, ..., <num-nodes> - 1` run:

```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <master-node-ip> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```
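With `torch.distributed.launch`, one training process is spawned per GPU on each node, and the launcher exports the rank and world-size environment variables; the `--distributed_launch` flag then makes SpeechBrain initialize distributed data-parallel training from those variables. `--master_addr` must point to the node with rank 0, and the chosen `--master_port` must be free on that node.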

Helper functions and scripts for plotting and analyzing the results can be found in `utils.py` and in the `tools` directory.

**NOTE**: the vendored version of SpeechBrain included in this repository (`vendor/speechbrain`) contains several hotfixes (e.g. for distributed training,
gradient clipping, gradient accumulation, and causality) and additional features (e.g. distributed evaluation).

### Examples

```bash
nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &
```
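This command trains the Conformer transducer from scratch on LibriSpeechMix for 100 epochs using 8 GPUs on a single node; `nohup ... &` keeps training running in the background after the terminal session is closed (output goes to `nohup.out` by default).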

---------------------------------------------------------------------------------------------------------

## 📧 Contact

[[email protected]](mailto:[email protected])

---------------------------------------------------------------------------------------------------------