https://github.com/kurianbenoy/malayalam_asr_benchmarking
A study to benchmark whisper based ASRs in Malayalam
asr benchmarking speech transformers-library whisper
- Host: GitHub
- URL: https://github.com/kurianbenoy/malayalam_asr_benchmarking
- Owner: kurianbenoy
- License: mit
- Created: 2023-03-04T08:32:33.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-19T11:26:35.000Z (9 months ago)
- Last Synced: 2024-03-19T12:30:23.737Z (9 months ago)
- Topics: asr, benchmarking, speech, transformers-library, whisper
- Language: Jupyter Notebook
- Homepage: https://kurianbenoy.github.io/malayalam_asr_benchmarking/
- Size: 928 KB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# malayalam_asr_benchmarking
## Objective of the project
> [!NOTE]
>
> A study to benchmark ASR models in Malayalam. So far, the project has
> benchmarked Malayalam ASR models based on Whisper and
> faster-whisper.

## Benchmarked Datasets

So far, we have mainly benchmarked on two datasets:

1. Common Voice 11 Dataset

   I have benchmarked on Mozilla’s [Common Voice 11 Malayalam
   subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ml/train).
   The benchmarking results can be found in [this
   dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_common_voice_benchmarking).

2. Malayalam Speech Corpus

   I have benchmarked on SMC’s [Malayalam Speech Corpus
   dataset](https://msc.smc.org.in/). The benchmarking results can be found
   in [this
   dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_msc_benchmarking/tree/main).

## Install

``` sh
pip install malayalam_asr_benchmarking
```

or from the GitHub repository:

``` sh
# Ensure git is installed; if not, install it first (e.g. on Ubuntu: apt install git)
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
```

or locally:

``` sh
# Ensure git is installed; if not, install it first (e.g. on Ubuntu: apt install git)
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
```

## Setting up your development environment

I am developing this project with nbdev. Please take some time to read
up on nbdev: how it works, its
[directives](https://nbdev.fast.ai/explanations/directives.html), and so on.
The [walk-throughs](https://nbdev.fast.ai/tutorials/tutorial.html) and
[tutorials](https://nbdev.fast.ai/tutorials/) on the [nbdev
website](https://nbdev.fast.ai/) are a good place to start.

### Step 1: Install Quarto

`nbdev_install_quarto`

[Other options are described in Quarto's getting started
guide](https://quarto.org/docs/get-started/).

### Step 2: Install hooks

`nbdev_install_hooks`

### Step 3: Install our library

`pip install -e '.[dev]'`

## How to use
#### Evaluate Whisper-based Malayalam ASR models
``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

# Lists handed to the evaluation function for WER, CER, model size, and timing values
werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```

``` python
from malayalam_asr_benchmarking.msc import evaluate_whisper_model_msc

# Lists handed to the evaluation function for WER, CER, model size, and timing values
werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_whisper_model_msc("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```

#### Evaluate faster-whisper based models

``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice

# Lists handed to the evaluation function for WER, CER, model size, and timing values
werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_faster_whisper_model_common_voice("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
```

``` python
from malayalam_asr_benchmarking.msc import evaluate_faster_whisper_model_msc

# Lists handed to the evaluation function for WER, CER, model size, and timing values
werlist = []
cerlist = []
modelsizelist = []
timelist = []

evaluate_faster_whisper_model_msc("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
```
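
The list names `werlist` and `cerlist` suggest that the evaluation reports word error rate (WER) and character error rate (CER). This README does not show how the package computes them, so the following is only a minimal, dependency-free sketch of the standard definitions: edit distance divided by reference length. The function names and the English sample strings are hypothetical and are not part of `malayalam_asr_benchmarking`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical sample pair: one substituted word and one dropped word.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Note that this sketch compares Python `str` elements, which are Unicode code points. For Malayalam text a single visible character can span several code points, so a CER computed this way may differ from one based on grapheme clusters or on normalized text.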