{"id":17504366,"url":"https://github.com/revdotcom/reverb","last_synced_at":"2025-05-15T13:06:59.952Z","repository":{"id":258388053,"uuid":"858357620","full_name":"revdotcom/reverb","owner":"revdotcom","description":"Open source inference code for Rev's model","archived":false,"fork":false,"pushed_at":"2025-04-22T17:45:00.000Z","size":519,"stargazers_count":399,"open_issues_count":15,"forks_count":25,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-04-22T18:50:16.970Z","etag":null,"topics":["asr","asr-model","canary","deeplearning","diarization","docker","huggingface","neural-network","open-source","opensource","pyannote","rev","revai","speaker-diarization","speech-recognition","speech-to-text","speechrecognition","wenet","whisper"],"latest_commit_sha":null,"homepage":"https://www.rev.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/revdotcom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-16T18:52:00.000Z","updated_at":"2025-04-17T04:26:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"22aaedcf-c897-4340-85bc-0116d127f688","html_url":"https://github.com/revdotcom/reverb","commit_stats":{"total_commits":20,"total_committers":6,"mean_commits":"3.3333333333333335","dds":0.6,"last_synced_commit":"89b7d9153d487c3825636b2200f624a6b50df4af"},"previous_names":["revdotcom/reverb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revdotcom%2Freverb","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/revdotcom%2Freverb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revdotcom%2Freverb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/revdotcom%2Freverb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/revdotcom","download_url":"https://codeload.github.com/revdotcom/reverb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254346624,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","asr-model","canary","deeplearning","diarization","docker","huggingface","neural-network","open-source","opensource","pyannote","rev","revai","speaker-diarization","speech-recognition","speech-to-text","speechrecognition","wenet","whisper"],"created_at":"2024-10-20T00:15:15.766Z","updated_at":"2025-05-15T13:06:59.934Z","avatar_url":"https://github.com/revdotcom.png","language":"Python","funding_links":[],"categories":["Open-source Large Speech/Audio Models\u003ca id=\"model\"\u003e\u003c/a\u003e"],"sub_categories":["Others\u003ca id=\"paper11\"\u003e\u003c/a\u003e"],"readme":"![Rev Logo](resources/logo_purple.png#gh-light-mode-only)\n![Rev Logo](resources/logo_white.png#gh-dark-mode-only)\n# Reverb\nOpen source inference and evaluation code for Rev's state-of-the-art speech recognition and diarization models. 
The speech recognition (ASR) code uses the [WeNet](https://github.com/wenet-e2e/wenet) framework and the speech diarization code uses the [Pyannote](https://github.com/pyannote/pyannote-audio) framework. More detailed model descriptions can be found in our [blog](https://www.rev.com/blog/speech-to-text-technology/introducing-reverb-open-source-asr-diarization) and the models can be downloaded from [huggingface](https://huggingface.co/Revai).\n\n## Table of Contents\n- [ASR](#asr)\n- [Diarization](#diarization)\n- [Installation](#installation)\n  - [Docker Image](#docker-image)\n- [Hosting the Model](#hosting-the-model)\n- [License](#license)\n\n### ASR\nSpeech-to-text code based on the WeNet framework. See [the ASR folder](https://github.com/revdotcom/reverb/tree/main/asr) for more details and usage instructions.\n\nLong-form speech recognition WER results:\n| Model            | Earnings21 | Earnings22 | Rev16 |\n|------------------|------------|------------|-------|\n| Reverb ASR   |       9.68 |      13.68 | 10.30 |\n| Whisper Large-v3 |      14.26 |      19.05 | 10.86 |\n| Canary-1B        |      14.40 |      19.01 | 13.82 |\n\n### Diarization\nSpeaker diarization code based on the Pyannote framework. 
See [the diarization folder](https://github.com/revdotcom/reverb/tree/main/diarization) for more details and usage instructions.\n\nLong-form WDER results, in combination with Rev's ASR:\n| Model            | Earnings21 |  Rev16 |\n|------------------|------------|-------|\n| Pyannote3.0  |    0.051    |   0.090   |\n| Reverb Diarization V1 |      0.047 |   0.077 |\n| Reverb Diarization V2 |      0.046 |   0.078 |\n\n# Getting Started \u003ca name=\"getting-started\"\u003e\u003c/a\u003e\n\u003e[!IMPORTANT]\n\u003eThese instructions require that you set up:\n\u003e * A HuggingFace access token and a CLI login.\n\u003e   * Click the following links for more information on [HuggingFace access tokens](https://huggingface.co/docs/hub/security-tokens#user-access-tokens) and setting up your [cli login](https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login).\n\u003e * Git LFS\n\u003e   * Simply run `git lfs install` from your terminal.\n\nCheck out the READMEs within each subdirectory for more information on the [ASR](asr/README.md) or [diarization](diarization/README.md) models.\n\n## Python Setup \u003ca name=\"python-setup\"\u003e\u003c/a\u003e\nThis codebase is compatible with Python 3.10+. To get started, run:\n```bash\npip install .\n```\nThis installs the `reverb` package, a modified version of the [wenet python package](https://github.com/wenet-e2e/wenet/tree/main?tab=readme-ov-file#install-python-package), into your Python environment. 
To use `reverb`'s code, make sure you **do not** have another wenet installation in your environment, as the two can conflict.\n\n\u003e [!TIP]\n\u003e While we suggest using our CLI or Python package to download the reverb model, you can also download it manually by running:\n\u003e ```bash\n\u003e git lfs install\n\u003e git clone https://huggingface.co/Revai/reverb-asr\n\u003e ```\n\n### Command Line Usage\nThe following command can be used to transcribe audio files:\n```bash\nreverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results\n```\nYou can also specify how \"verbatim\" the transcription should be:\n```bash\nreverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --verbatimicity 0.2\n```\nYou can also change the decoding mode:\n```bash\nreverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --modes ctc_prefix_beam_search\n```\nFor a full list of arguments, run:\n```bash\nreverb --help\n```\nor check out our [script](asr/wenet/bin/recognize_wav.py).\n\n### Python Usage\nReverb can also be used from within Python:\n```python\nimport wenet\nreverb = wenet.load_model(\"reverb_asr_v1\")\noutput = reverb.transcribe(\"audio.mp3\")\nprint(output)\n```\nThe `load_model` function will automatically download the reverb model from HuggingFace.\nIf you instead have a local copy of the model, either downloaded from our HuggingFace page or finetuned yourself, pass `load_model` the path to the directory containing the `.pt` checkpoint, `config.yaml`, and extra files.\n```python\nimport wenet\nreverb = wenet.load_model(\"/local/reverb-asr\")\noutput = reverb.transcribe(\"audio.mp3\")\nprint(output)\n```\nIf you'd prefer CTM output instead of text, specify the format in the `transcribe` call.\n```python\nimport wenet\nreverb = wenet.load_model(\"reverb_asr_v1\")\n# Specifying the \"format\" will change the output\noutput = reverb.transcribe(\"audio.mp3\", 
format=\"ctm\")\nprint(output)\n```\nAll arguments available to the `reverb` command line are also parameters that can be passed to the `transcribe` call.\n```python\nimport wenet\nreverb = wenet.load_model(\"reverb_asr_v1\")\n# Any `reverb` command line argument can be passed as a keyword argument\noutput = reverb.transcribe(\"audio.mp3\", verbatimicity=0.5, beam_size=2, ctc_weight=0.6)\nprint(output)\n```\n\n### Docker Image\nAlternatively, you can use Docker to run ASR and/or diarization without installing dependencies (including the model files)\ndirectly on your system. First, make sure Docker is installed on your system. If you wish to run\non an NVIDIA GPU, additional setup may be required.\nThen, run the following command to build the Docker image:\n```bash\ndocker build -t reverb . --build-arg HUGGINGFACE_ACCESS_TOKEN=${YOUR_HUGGINGFACE_ACCESS_TOKEN}\n```\n\nTo start an interactive shell in the container, run:\n```bash\nsudo docker run --entrypoint \"/bin/bash\" --gpus all --rm -it reverb\n```\n\n# Hosting the Model\nIf your use case requires deploying these models at a larger scale while maintaining strict\nsecurity requirements, consider using our other release: https://github.com/revdotcom/reverb-self-hosted.\nThis setup will give you full control over the deployment of our models on your own infrastructure\nwithout the need for internet connectivity or cloud dependencies.\n\n# License\nThe license in this repository applies *only to the code, not the models*. See LICENSE for details. 
For model licenses, check out their pages on HuggingFace.\n\n# Citations\nIf you make use of this model, please cite this paper\n```\n@article{bhandari2024reverb,\n  title={Reverb: Open-Source ASR and Diarization from Rev},\n  author={Bhandari, Nishchal and Chen, Danny and del Río Fernández, Miguel Ángel and Delworth, Natalie and Fox, Jennifer Drexler and Jetté, Miguel and McNamara, Quinten and Miller, Corey and Novotný, Ondřej and Profant, Ján and Qin, Nan and Ratajczak, Martin and Robichaud, Jean-Philippe},\n  journal={arXiv preprint arXiv:2410.03930},\n  year={2024}\n}\n```\n\n# Contributors\nNishchal Bhandari, Danny Chen, Miguel Del Rio, Natalie Delworth, Jennifer Drexler Fox, Miguel Jette, Quinn McNamara, Corey Miller, Ondrej Novotny, Jan Profant, Nan Qin, Martin Ratajczak, and Jean-Philippe Robichaud.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevdotcom%2Freverb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frevdotcom%2Freverb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frevdotcom%2Freverb/lists"}