https://github.com/kurianbenoy/malayalam_asr_benchmarking

A study to benchmark whisper based ASRs in Malayalam

asr benchmarking speech transformers-library whisper

Host: GitHub
URL: https://github.com/kurianbenoy/malayalam_asr_benchmarking
Owner: kurianbenoy
License: mit
Created: 2023-03-04T08:32:33.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-03-19T11:26:35.000Z (9 months ago)
Last Synced: 2024-03-19T12:30:23.737Z (9 months ago)
Topics: asr, benchmarking, speech, transformers-library, whisper
Language: Jupyter Notebook
Homepage: https://kurianbenoy.github.io/malayalam_asr_benchmarking/
Size: 928 KB
Stars: 8
Watchers: 3
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# malayalam_asr_benchmarking

## Objective of the project

> [!NOTE]
>
> A study to benchmark ASR systems in Malayalam. So far, the project has
> benchmarked Malayalam ASR models based on Whisper and faster-whisper.

## Benchmarked Datasets

So far, benchmarking has been done mainly on two datasets:

1.  Common Voice 11 Dataset

    I have benchmarked on Mozilla’s [Common Voice 11 Malayalam subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/viewer/ml/train).
    The benchmarking results can be found in [this dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_common_voice_benchmarking).

2.  Malayalam Speech Corpus

    I have benchmarked on SMC’s [Malayalam Speech Corpus dataset](https://msc.smc.org.in/).
    The benchmarking results can be found in [this dataset](https://huggingface.co/datasets/kurianbenoy/malayalam_msc_benchmarking/tree/main).

## Install

``` sh
pip install malayalam_asr_benchmarking
```

Or from the GitHub repository:

``` sh
# Requires git. On Ubuntu, install it with: apt install git
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
```

Or locally, from a clone:

``` sh
# Requires git. On Ubuntu, install it with: apt install git
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
```
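
Whichever install route is used, a quick check that the package is importable can save time before starting a long benchmark run. The sketch below uses only the Python standard library; the helper name `describe_install` is hypothetical and not part of this project.

``` python
from importlib import metadata, util

PACKAGE = "malayalam_asr_benchmarking"


def describe_install(package: str = PACKAGE) -> str:
    """Return a one-line summary of whether `package` can be imported."""
    # find_spec returns None when no importable top-level module is found.
    if util.find_spec(package) is None:
        return f"{package}: not installed"
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable (for example from a source checkout or the standard
        # library) but without distribution metadata.
        version = "unknown version"
    return f"{package}: installed ({version})"


print(describe_install())
```

If this reports `not installed`, make sure the `pip install` command ran in the same Python environment as the one running the check.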

## Setting up your development environment

I am developing this project with nbdev. Please take some time to read
up on nbdev (how it works,
[directives](https://nbdev.fast.ai/explanations/directives.html), etc.)
by checking out [the walk-throughs](https://nbdev.fast.ai/tutorials/tutorial.html)
and [tutorials](https://nbdev.fast.ai/tutorials/) on the
[nbdev website](https://nbdev.fast.ai/).

### Step 1: Install Quarto

`nbdev_install_quarto`

[Other options are mentioned in getting started with Quarto](https://quarto.org/docs/get-started/)

### Step 2: Install hooks

`nbdev_install_hooks`

### Step 3: Install our library

`pip install -e '.[dev]'`

## How to use

#### Evaluate Whisper-based Malayalam ASR models

``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```

``` python
from malayalam_asr_benchmarking.msc import evaluate_whisper_model_msc

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_whisper_model_msc("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
```
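
The names `werlist` and `cerlist` suggest that the evaluation collects word error rate (WER) and character error rate (CER), the two standard ASR metrics. As a rough illustration of what those numbers mean, here is a minimal pure-Python sketch of the textbook definitions: edit distance divided by reference length. This is an assumption-laden sketch, not the project's implementation; the project may use a metrics library and apply text normalisation first. The function names `edit_distance`, `wer`, and `cer` are hypothetical.

``` python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word and one dropped word out of six reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Note that `cer` here counts Unicode code points. For Malayalam, a single visible character can span several code points, so values from this sketch may differ from tools that count grapheme clusters.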

#### Evaluate faster-whisper-based models

``` python
from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_faster_whisper_model_common_voice("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
```

``` python
from malayalam_asr_benchmarking.msc import evaluate_faster_whisper_model_msc

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_faster_whisper_model_msc("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
```
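
All four evaluation functions take the same four lists. Assuming each call appends one entry per evaluated model to every list (an assumption based on the call signature, not something confirmed here), the results of several runs could be combined into one table with a small helper. The function `summarize_results` and its row keys are hypothetical and not part of this library.

``` python
from typing import Dict, List, Sequence


def summarize_results(
    model_names: Sequence[str],
    werlist: Sequence[float],
    cerlist: Sequence[float],
    modelsizelist: Sequence[object],
    timelist: Sequence[float],
) -> List[Dict[str, object]]:
    """Zip the per-model result lists into rows, sorted by WER (lowest first)."""
    columns = (model_names, werlist, cerlist, modelsizelist, timelist)
    # Every list should hold exactly one entry per evaluated model.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("expected one entry per model in every list")
    rows = [
        {"model": name, "wer": w, "cer": c, "model_size": size, "time": t}
        for name, w, c, size, t in zip(*columns)
    ]
    return sorted(rows, key=lambda row: row["wer"])
```

To compare several models, reuse the same four lists across calls, keep a list of the model names in call order, and pass all five lists to `summarize_results` once the evaluations finish.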