https://github.com/shuyib/zindi_mcv_swahilli

How I used Seamless m4t large to get to the top 5 of the mozilla common voice competition hosted on Zindi

asr-model hackathon mozilla-common-voice seamlessm4t stt swahili voice-recognition zindi-hackathon



Host: GitHub
URL: https://github.com/shuyib/zindi_mcv_swahilli
Owner: Shuyib
License: apache-2.0
Created: 2023-12-12T06:55:43.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-12-27T10:56:33.000Z (over 1 year ago)
Last Synced: 2025-01-27T09:41:24.756Z (4 months ago)
Topics: asr-model, hackathon, mozilla-common-voice, seamlessm4t, stt, swahili, voice-recognition, zindi-hackathon
Language: Python
Homepage:
Size: 14.6 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# zindi_mcv_swahilli

public word error rate: **0.114294524**   

private word error rate: **0.112809661**   
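For readers new to the metric: word error rate (WER) is the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the model output, divided by the number of words in the reference. Below is a minimal sketch of that calculation. It is not the competition's scoring code, and the function name `word_error_rate` and the sample sentences are made up for illustration. It also skips any text normalization a real scorer might apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution plus one deletion
# against a five-word reference.
example = word_error_rate(
    "habari ya asubuhi rafiki yangu",
    "habari za asubuhi rafiki",
)
print(example)
```

On this scale, a score of about 0.11 means roughly one word in nine differs from the reference.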

How I used SeamlessM4T large to reach the top 5 of the [mozilla common voice competition](https://zindi.africa/competitions/mozilla-foundation-mozilla-common-voice-hackathon-i-nairobi) hosted on Zindi. I downloaded only the `test.tar.gz` archive, extracted it, and resampled all the audio to 16 kHz. Some of the audio was muffled and of poor quality as is, due to the sampling rates that were set. The script I used for the conversion is `prepare_files.sh`. To reproduce the results, follow the instructions to install [seamless m4t large](https://github.com/facebookresearch/seamless_communication). I ran inference on each audio file with `python asr.py`; the output was saved to **asr_results.csv** and then converted to the submission format Zindi requires with `python clean_submission.py`.
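The resampling step can be sketched in Python as follows. This is not the contents of `prepare_files.sh`, which is not shown here. It assumes `ffmpeg` is installed and on the `PATH`, that mono WAV output is acceptable, and the helper names `build_resample_command` and `resample_all` and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path

TARGET_SAMPLE_RATE = 16_000  # 16 kHz, the rate used for inference


def build_resample_command(src: Path, dst_dir: Path) -> list[str]:
    """Build an ffmpeg command that resamples one clip to 16 kHz mono WAV."""
    dst = dst_dir / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                            # overwrite the output if it exists
        "-i", str(src),                  # input clip
        "-ar", str(TARGET_SAMPLE_RATE),  # output sample rate in Hz
        "-ac", "1",                      # mono output (an assumption)
        str(dst),
    ]


def resample_all(src_dir: Path, dst_dir: Path, pattern: str = "*.mp3") -> int:
    """Resample every clip matching `pattern`; return how many were converted."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        subprocess.run(build_resample_command(src, dst_dir), check=True)
        count += 1
    return count
```

Calling `resample_all(Path("test"), Path("test_16k"))` would convert every matching clip in a hypothetical `test` directory. Building the command separately from running it makes it easy to inspect before any audio is touched.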

## You can do all this in one step

```bash
make run
```

## Lesson

Review the Hugging Face leaderboard for ASR models and look for the one that is both fastest and most accurate.

[leaderboard](https://huggingface.co/models?other=hf-asr-leaderboard)

Facebook/Meta has a lot of speech-to-text models. Look for one that is built specifically for speech to text. The models that primarily do one thing seem to perform best.
