https://github.com/lucadellalib/ts-asr
Target speaker automatic speech recognition (TS-ASR)
asr conformer pytorch rnn speech-recognition speechbrain transducer
- Host: GitHub
- URL: https://github.com/lucadellalib/ts-asr
- Owner: lucadellalib
- Created: 2023-07-30T05:03:57.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-10-14T19:39:02.000Z (over 2 years ago)
- Last Synced: 2025-04-02T16:53:40.507Z (about 1 year ago)
- Topics: asr, conformer, pytorch, rnn, speech-recognition, speechbrain, transducer
- Language: Python
- Homepage:
- Size: 301 MB
- Stars: 11
- Watchers: 2
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Target Speaker Automatic Speech Recognition
This [SpeechBrain](https://speechbrain.github.io) recipe includes scripts to train end-to-end transducer-based target speaker automatic
speech recognition (TS-ASR) systems as proposed in [Streaming Target-Speaker ASR with Neural Transducer](https://arxiv.org/abs/2209.04175).
---------------------------------------------------------------------------------------------------------
## ⚡ Datasets
### LibriSpeechMix
Generate the LibriSpeechMix data in a directory of your choice (the path later passed via `--data_folder`) following the
[official readme](https://github.com/NaoyukiKanda/LibriSpeechMix/blob/main/README.md).
---------------------------------------------------------------------------------------------------------
## 🛠️ Installation
Clone the repository, navigate to its root directory, open a terminal and run:
```bash
pip install -e vendor/speechbrain
pip install -r requirements.txt
```
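Because the recipe relies on the vendored SpeechBrain rather than a release from PyPI, it can be worth confirming which copy Python actually resolves after installation. The following is a minimal sketch, not part of the repository: the helper names `package_location` and `is_inside` are hypothetical, and it assumes the script is run from the repository root so that `vendor/speechbrain` is a valid relative path.

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_inside(path, root):
    """Return True if `path` is `root` itself or lies somewhere below it."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    vendored = Path("vendor/speechbrain")
    location = package_location("speechbrain")
    if location is None:
        print("speechbrain is not installed in this environment")
    elif is_inside(location, vendored):
        print(f"Using the vendored SpeechBrain at {location}")
    else:
        print(f"WARNING: speechbrain resolves to {location}, not {vendored.resolve()}")
```

If the warning is printed, another SpeechBrain installation is probably shadowing the editable one; uninstalling it and re-running `pip install -e vendor/speechbrain` is the usual remedy.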
---------------------------------------------------------------------------------------------------------
## ▶️ Quickstart
Navigate to the directory containing the training scripts, open a terminal and run:
```bash
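# Replace the angle-bracket placeholders with your own values
python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>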
```
To use multiple GPUs on the same node, run:
```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```
To use multiple GPUs on multiple nodes, run the following on each node, where the node rank ranges over `0, ..., <num-nodes> - 1`:
```bash
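python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <master-address> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch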
```
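Since the multi-node command above differs between nodes only in the node rank, it can be convenient to generate it programmatically. The sketch below is illustrative and not part of the repository: `launch_command` and `global_rank` are hypothetical helpers, the master address `10.0.0.1` is a made-up value, and the rank formula reflects how the classic `torch.distributed.launch` numbers workers (node by node), which is assumed to hold for the PyTorch version in use.

```python
import shlex


def global_rank(node_rank, local_rank, nproc_per_node):
    """Global rank of a worker when ranks are assigned node by node."""
    return node_rank * nproc_per_node + local_rank


def launch_command(script_args, nproc_per_node, nnodes=1, node_rank=0,
                   master_addr="127.0.0.1", master_port=5555):
    """Build the launcher invocation for one node as a list of arguments."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes - 1}], got {node_rank}")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        "--master_addr", master_addr,
        "--master_port", str(master_port),
        *script_args,
        "--distributed_launch",  # flag expected by the training scripts
    ]


if __name__ == "__main__":
    # Script and hyperparameter file taken from the example in this README.
    script_args = [
        "train_librispeechmix_scratch.py",
        "hparams/LibriSpeechMix/conformer-t_scratch.yaml",
        "--data_folder", "datasets/LibriSpeechMix",
    ]
    for rank in range(2):  # two nodes, eight GPUs each (hypothetical setup)
        cmd = launch_command(script_args, nproc_per_node=8, nnodes=2,
                             node_rank=rank, master_addr="10.0.0.1")
        print(f"# node {rank}: global ranks "
              f"{global_rank(rank, 0, 8)}..{global_rank(rank, 7, 8)}")
        print(shlex.join(cmd))
```

Returning the command as a list rather than a string keeps it directly usable with `subprocess.run` and avoids quoting issues when paths contain spaces.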
Helper functions and scripts for plotting and analyzing the results can be found in `utils.py` and `tools`.
**NOTE**: the vendored version of SpeechBrain inside this repository includes several hotfixes (for distributed training,
gradient clipping, gradient accumulation, causality, etc.) and additional features (e.g. distributed evaluation).
### Examples
```bash
nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &
```
---------------------------------------------------------------------------------------------------------
## 📧 Contact
[luca.dellalib@gmail.com](mailto:luca.dellalib@gmail.com)
---------------------------------------------------------------------------------------------------------