{"id":18753539,"url":"https://github.com/lucadellalib/ts-asr","last_synced_at":"2025-09-01T12:31:34.059Z","repository":{"id":192335489,"uuid":"672440364","full_name":"lucadellalib/ts-asr","owner":"lucadellalib","description":"Target speaker automatic speech recognition (TS-ASR)","archived":false,"fork":false,"pushed_at":"2023-10-14T19:39:02.000Z","size":315581,"stargazers_count":11,"open_issues_count":2,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-02T16:53:40.507Z","etag":null,"topics":["asr","conformer","pytorch","rnn","speech-recognition","speechbrain","transducer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucadellalib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-30T05:03:57.000Z","updated_at":"2024-12-04T06:10:47.000Z","dependencies_parsed_at":"2023-09-04T09:21:09.369Z","dependency_job_id":"3a4cce01-2d04-4bf7-8492-fa9283615937","html_url":"https://github.com/lucadellalib/ts-asr","commit_stats":null,"previous_names":["lucadellalib/ts-asr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lucadellalib/ts-asr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fts-asr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fts-asr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fts-asr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fts-asr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/owners/lucadellalib","download_url":"https://codeload.github.com/lucadellalib/ts-asr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fts-asr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273122203,"owners_count":25049540,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","conformer","pytorch","rnn","speech-recognition","speechbrain","transducer"],"created_at":"2024-11-07T17:26:10.463Z","updated_at":"2025-09-01T12:31:30.938Z","avatar_url":"https://github.com/lucadellalib.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Target Speaker Automatic Speech Recognition\n\n[![Python version: 3.6 | 3.7 | 3.8 | 3.9 | 3.10 | 3.11](https://img.shields.io/badge/python-3.6%20|%203.7%20|%203.8%20|%203.9%20|%203.10%20|%203.11-blue)](https://www.python.org/downloads/)\n\nThis [SpeechBrain](https://speechbrain.github.io) recipe includes scripts to train end-to-end transducer-based target speaker automatic\nspeech recognition (TS-ASR) systems as proposed in [Streaming Target-Speaker ASR with Neural 
Transducer](https://arxiv.org/abs/2209.04175).\n\n---------------------------------------------------------------------------------------------------------\n\n## ⚡ Datasets\n\n### LibriSpeechMix\n\nGenerate the LibriSpeechMix data in `\u003cpath-to-data-folder\u003e` following the\n[official readme](https://github.com/NaoyukiKanda/LibriSpeechMix/blob/main/README.md).\n\n---------------------------------------------------------------------------------------------------------\n\n## 🛠️️ Installation\n\nClone the repository, navigate to `\u003cpath-to-repository\u003e`, open a terminal and run:\n\n```bash\npip install -e vendor/speechbrain\npip install -r requirements.txt\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## ▶️ Quickstart\n\nNavigate to `\u003cpath-to-repository\u003e`, open a terminal and run:\n\n```bash\npython train_\u003cdataset\u003e_\u003cvariant\u003e.py hparams/\u003cdataset\u003e/\u003cconfig\u003e.yaml --data_folder \u003cpath-to-data-folder\u003e\n```\n\nTo use multiple GPUs on the same node, run:\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=\u003cnum-gpus\u003e \\\ntrain_\u003cdataset\u003e_\u003cvariant\u003e.py hparams/\u003cdataset\u003e/\u003cconfig\u003e.yaml --data_folder \u003cpath-to-data-folder\u003e --distributed_launch\n```\n\nTo use multiple GPUs on multiple nodes, for each node with rank `0, ..., \u003cnum-nodes\u003e - 1` run:\n\n```bash\npython -m torch.distributed.launch --nproc_per_node=\u003cnum-gpus-per-node\u003e \\\n--nnodes=\u003cnum-nodes\u003e --node_rank=\u003cnode-rank\u003e --master_addr \u003crank-0-ip-addr\u003e --master_port 5555 \\\ntrain_\u003cdataset\u003e_\u003cvariant\u003e.py hparams/\u003cdataset\u003e/\u003cconfig\u003e.yaml --data_folder \u003cpath-to-data-folder\u003e --distributed_launch\n```\n\nHelper functions and scripts for plotting and analyzing the results can be found in `utils.py` and `tools`.\n\n**NOTE**: 
the vendored version of SpeechBrain inside this repository includes several hotfixes (e.g. distributed training,\ngradient clipping, gradient accumulation, and causality) and additional features (e.g. distributed evaluation).\n\n### Examples\n\n```bash\nnohup python -m torch.distributed.launch --nproc_per_node=8 \\\ntrain_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \\\n--data_folder datasets/LibriSpeechMix --num_epochs 100 \\\n--distributed_launch \u0026\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## 📧 Contact\n\n[luca.dellalib@gmail.com](mailto:luca.dellalib@gmail.com)\n\n---------------------------------------------------------------------------------------------------------\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Fts-asr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucadellalib%2Fts-asr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Fts-asr/lists"}