https://github.com/cyberagentailab/mbr-for-asr
Code for Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition
automatic-speech-recognition
- Host: GitHub
- URL: https://github.com/cyberagentailab/mbr-for-asr
- Owner: CyberAgentAILab
- License: mit
- Created: 2025-10-21T08:44:53.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-23T02:14:59.000Z (8 months ago)
- Last Synced: 2025-10-23T04:12:29.254Z (8 months ago)
- Topics: automatic-speech-recognition
- Language: Python
- Homepage: https://arxiv.org/abs/2510.19471
- Size: 264 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Minimum Bayes Risk Decoding for Automatic Speech Recognition
This repository contains the experiment code for [Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition](https://arxiv.org/abs/2510.19471).

### Setup
The codebase is developed and tested on Ubuntu 22.04.
It has not been rigorously tested on other platforms.
A Dockerfile is provided to reproduce the experiment environment.
The following commands build the Docker image used for the experiments.
```
git clone git@github.com:CyberAgentAILab/mbr-for-asr.git
cd mbr-for-asr
docker build . -t mbrasr:latest
```
The codebase can likely run on a native macOS environment by installing dependencies directly with astral-uv instead of using Docker, though this hasn't been rigorously tested. We officially support Ubuntu. PRs to improve macOS compatibility are welcome.
### Experiment
The experiments are run inside the Docker container.
To run the code, set the environment variable HF_READ_TOKEN to your [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens).
```
docker run -it -e HF_READ_TOKEN={YOUR_HUGGINGFACE_TOKEN} mbrasr:latest
```
Inside the Docker container, you can run experiments using the scripts in the experiments/ directory.
The following command (run inside the container) generates samples for MBR decoding.
```
./experiments/sample.sh -d {DOMAIN} -m {MODEL} -s {NSAMPLES}
```
By default, it runs on the LibriSpeech domain using [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) with 4 samples, so that it finishes quickly on a CPU. For larger models, we recommend using a GPU.
The generated samples are stored in the samples/ directory.
Then, the following command runs MBR decoding on the sampled outputs.
```
./experiments/run_mbr.sh -d {DOMAIN} -m {MODEL} -s {NSAMPLES} -v {EVALUATION_METRIC}
```
The script also evaluates the decoded output in the same run.
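
For readers new to the technique, the following is a minimal, self-contained sketch of sample-based MBR decoding. It is not the repository's implementation: the function names (`word_edit_distance`, `wer`, `mbr_decode`) and the toy samples are assumptions made for illustration. It picks the sample with the highest average utility against all samples, using negative WER as the utility.

```python
from typing import Callable, List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # token of ref missing in hyp
                            curr[j - 1] + 1,      # extra token in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of hypothesis measured against reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


def mbr_decode(samples: List[str], utility: Callable[[str, str], float]) -> str:
    """Return the sample with the highest mean utility over all samples.

    Every sample serves both as a candidate and as a pseudo-reference.
    """
    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, ref) for ref in samples) / len(samples)

    return max(samples, key=expected_utility)


# Toy samples (illustrative only, not produced by any model).
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat sat on the mat",
    "a cat sat on the hat",
]

# Utility = negative WER, so that higher is better.
best = mbr_decode(samples, lambda hyp, ref: -wer(ref, hyp))
print(best)
```

In this toy case the selected transcript is the one that agrees most with the other samples. The cost is quadratic in the number of samples, since every pair is scored.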
This codebase supports various evaluation metrics, including WER, CER, BLEU, ROUGE, and METEOR.
You can also add your own metric by following the interface defined in [mbr/utility/utility_class.py](mbr/utility/utility_class.py).
The evaluation results are stored in the results/ directory.
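
As a rough illustration of what a custom utility has to provide, the sketch below defines a character-error-rate utility. The class name `NegativeCER` and its methods `similarity` and `score_matrix` are hypothetical and do not mirror the interface in mbr/utility/utility_class.py; consult that file for the actual method names and signatures.

```python
from typing import List


def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, character by character."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class NegativeCER:
    """Hypothetical utility: negative character error rate (higher is better)."""

    def similarity(self, hypothesis: str, reference: str) -> float:
        if not reference:
            return 0.0 if not hypothesis else -1.0
        return -char_edit_distance(reference, hypothesis) / len(reference)

    def score_matrix(self, samples: List[str]) -> List[List[float]]:
        # Entry [i][j] scores sample i as the hypothesis against sample j
        # as the pseudo-reference.
        return [[self.similarity(h, r) for r in samples] for h in samples]


utility = NegativeCER()
samples = ["hello world", "hello word", "hallo world"]  # toy data
matrix = utility.score_matrix(samples)

# MBR selects the row with the highest mean score.
row_means = [sum(row) / len(row) for row in matrix]
best_index = max(range(len(samples)), key=row_means.__getitem__)
print(samples[best_index])
```

Negating the error rate keeps the convention that a larger utility means a better hypothesis, so error-based metrics like WER and CER can share one selection rule with similarity-based metrics like BLEU.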
### Demo
A Gradio app for comparing beam search and MBR decoding is provided in the demo/ directory.
By default, it transcribes English speech in the audio. You can change the task, language, and ASR model by editing [demo/app.py](demo/app.py).
The app is useful for qualitatively comparing the two decoding algorithms.
To run the app, execute the following commands.
```
cd demo
pip install -r requirements.txt
python3 app.py
```
### LICENSE
The codebase is released under the [MIT License](LICENSE), except for the implementation of MetricX in [mbr/utility/metricx.py](mbr/utility/metricx.py), which is owned by Google and distributed under the Apache 2.0 license.
### Reference
[Yuu Jinnai. 2025. Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition. arXiv preprint arXiv:2510.19471.](https://arxiv.org/abs/2510.19471)
### Contact
For any questions, feel free to raise an issue or contact me at jinnai_yu@cyberagent.co.jp.