{"id":27114199,"url":"https://github.com/linagora-labs/asr_benchmark","last_synced_at":"2025-04-07T03:56:34.779Z","repository":{"id":285718195,"uuid":"790275912","full_name":"linagora-labs/asr_benchmark","owner":"linagora-labs","description":"Toolkit to benchmark various speech recognition APIs (NeMo, Whisper...) and visualize the results","archived":false,"fork":false,"pushed_at":"2025-04-02T09:14:10.000Z","size":673,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T10:25:13.505Z","etag":null,"topics":["asr","benchmark","nemo","speech-recognition","whisper"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linagora-labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-22T15:29:53.000Z","updated_at":"2025-04-02T09:14:14.000Z","dependencies_parsed_at":"2025-04-02T10:25:18.908Z","dependency_job_id":"6556a669-e72d-4efb-b35a-99fdaa2a0ad3","html_url":"https://github.com/linagora-labs/asr_benchmark","commit_stats":null,"previous_names":["linagora-labs/asr_benchmark_fr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linagora-labs%2Fasr_benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linagora-labs%2Fasr_benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linagora-labs%2Fasr_benchmark/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linagora-labs%2Fasr_benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linagora-labs","download_url":"https://codeload.github.com/linagora-labs/asr_benchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247589823,"owners_count":20963022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","benchmark","nemo","speech-recognition","whisper"],"created_at":"2025-04-07T03:56:34.114Z","updated_at":"2025-04-07T03:56:34.769Z","avatar_url":"https://github.com/linagora-labs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ASR Benchmark\n\nToolkit to benchmark various speech recognition APIs (NeMo, Whisper...) and visualize the results. Supported models are mostly french. It can compute WER, RTF (or latencies when streaming) and measure hardware usage.\n\n## How to bench\n\nJust run:\n\n```\npython benchmarker.py CONFIG_FILE\n```\n\n### Data\n\nThe input data file (manifest) is a jsonl file (one json per line). Each line must have these fields:\n- \"audio_filepath\", the path to the audio file\n- \"text\", the text associated with the segment\n\nThey can also have:\n- \"offset\", the start of the segment in the audio (if not specified, it is equal to 0)\n- \"duration\", the duration of the segment (if not specified, the whole audio is used)\n- \"name\" or \"dataset\", the name of the dataset\n\n### Examples\n\nExamples are provided in the `examples` folder. 
There is an audio file to test with, along with a benchmark config file and a notebook for generating plots.\n\n## Requirements\n\nYou need to install the package so that benchmarker.py can find the source code:\n```\npip install -e .\n```\nYou also need to install the [ssak](https://github.com/linagora-labs/ssak) repo. You can run:\n```\npip install git+https://github.com/linagora-labs/ssak\n```\nThen, depending on what you want to bench, you will need to install other packages such as faster-whisper, nemo (nemo_toolkit['asr']), whisper and transformers.\n\n\n\n## Tools\n\nSome tools are available in the `tools` folder:\n- add_silence.py: a script for adding white noise to audio files\n- subsample_data.py: for selecting a subset of specified datasets\n\nDon't hesitate to submit your tools (for converting datasets to the jsonl format for example). I used scripts from ssak to do it, but the datasets were in Kaldi format.\n\n## Backends (interfaces)\n\nThe currently available backends:\n- HTTP-API (\"http-api\")\n- LinTo-STT (\"linto-stt\"): for using whisper, kaldi or nemo models. 
Can be streaming (can compute latencies) or offline\n- Whisper (\"openai\")\n- Faster Whisper (\"faster-whisper\")\n- Transformers (\"transformers\"): works for Whisper\n- Transformers Intel (\"intel-transformers\"): for using the Intel extension\n- Transformers Facebook (\"transformers-facebook\"): for the MMS model\n- Transformers Bofenghuang (\"transformers-bofenghuang\"): for the French fine-tuned wav2vec\n- NeMo (\"nemo\")\n\n\nIf the available interfaces don't let you bench the model you want, you can easily add a new one by following these steps:\n- Create a new class that inherits from `asr_benchmark.benchmark.interfaces.Model`\n- Implement the various functions (load, transcribe, ...)\n- Add your backend in `asr_benchmark.benchmark.backend_to_model`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinagora-labs%2Fasr_benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinagora-labs%2Fasr_benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinagora-labs%2Fasr_benchmark/lists"}