# Fine-tuning Wav2Vec2 for Tamil Speech Recognition

This repository contains the Jupyter Notebook and resources for fine-tuning the Wav2Vec2 model for Tamil speech recognition using the Hugging Face Transformers library.

## Table of Contents

- [Introduction](#introduction)
- [Requirements](#requirements)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)
- [Results](#results)
- [Acknowledgments](#acknowledgments)

## Introduction

Wav2Vec2 is a state-of-the-art model for automatic speech recognition (ASR). This project aims to adapt Wav2Vec2 for the Tamil language, leveraging available datasets to improve performance in recognizing spoken Tamil.

## Requirements

To run this project, ensure you have the following installed:

- Python 3.7 or higher
- Jupyter Notebook
- PyTorch
- Transformers
- Datasets
- Librosa
- Soundfile
- [CUDA](https://developer.nvidia.com/cuda-downloads) (for GPU-accelerated training)

You can install the required packages using the following command:
```bash
pip install -r requirements.txt
```
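You can also run a quick sanity check (not part of the repository) to confirm that the core libraries import and that PyTorch can see a CUDA device:
```python
# Optional environment check: confirms the main dependencies are installed
# and reports whether a CUDA GPU is visible to PyTorch.
import torch
import transformers
import datasets

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("CUDA available:", torch.cuda.is_available())
```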

## Dataset
We use a Tamil speech dataset for fine-tuning the model. The dataset consists of Tamil audio files along with their transcriptions. Ensure you download the dataset and place it in an accessible directory.
Refer to `datapreprocessing.py` for the preprocessing steps.
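
The snippet below is a minimal sketch of this kind of preparation, assuming the data is listed in a CSV file with `path` and `transcription` columns (an assumption; adjust the file name and column names to your layout). Each clip is resampled to the 16 kHz rate that Wav2Vec2 expects:
```python
# Illustrative preprocessing sketch; "tamil_dataset.csv" and its columns are
# assumptions, not files shipped with this repository.
import librosa
from datasets import Dataset

dataset = Dataset.from_csv("tamil_dataset.csv")  # columns: path, transcription

def load_audio(batch):
    # Load the clip and resample to 16 kHz, the rate Wav2Vec2 was trained on.
    speech, _ = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech
    batch["sampling_rate"] = 16_000
    return batch

dataset = dataset.map(load_audio)
```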

## Training
To fine-tune the Wav2Vec2 model, open the [Jupyter Notebook](https://github.com/sugarcane-mk/finetuning_wav2vec2/blob/main/Finetune_wav2vec2_xlsr_tamil.ipynb) and follow the instructions provided within the notebook to execute the training process.
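
For orientation, the sketch below shows the core objects a typical Wav2Vec2 CTC fine-tuning setup builds with the Transformers library before training, assuming the `facebook/wav2vec2-large-xlsr-53` checkpoint referenced under Acknowledgments. The vocabulary file, hyperparameters, and output directory are illustrative assumptions rather than values taken from the notebook:
```python
# Illustrative fine-tuning setup; "vocab.json", the hyperparameters, and the
# output directory are assumptions, not the notebook's exact values.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
    TrainingArguments,
)

# Character-level tokenizer over a Tamil vocabulary built from the transcriptions.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the pre-trained multilingual checkpoint; the CTC head is initialised
# from scratch and sized to the Tamil vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen

training_args = TrainingArguments(
    output_dir="wav2vec2-xlsr-tamil",   # assumed output directory
    per_device_train_batch_size=8,
    num_train_epochs=30,
    learning_rate=3e-4,
    fp16=True,                          # requires a CUDA GPU
    save_steps=500,
)
# A Trainer would then wrap these together with a CTC padding collator and the
# preprocessed train/eval splits before calling trainer.train().
```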

## Inference
After training, you can perform inference using the code snippets provided in the Jupyter Notebook. Be sure to replace the paths with those of your own audio files.
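
As a rough guide, a minimal inference sketch looks like the following; the checkpoint directory and audio file name are placeholders to replace with your fine-tuned model and your own recording:
```python
# Illustrative inference sketch; "wav2vec2-xlsr-tamil" and "sample_tamil.wav"
# are placeholder paths, not files from this repository.
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("wav2vec2-xlsr-tamil")
model = Wav2Vec2ForCTC.from_pretrained("wav2vec2-xlsr-tamil")
model.eval()

# Load and resample the audio to 16 kHz before feeding it to the model.
speech, _ = librosa.load("sample_tamil.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```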

## Results
The performance of the model can be evaluated using standard metrics such as Word Error Rate (WER). The notebook contains sections on evaluating the model's performance. WER can be computed with the `jiwer` package, for example:
```bash
pip install jiwer
```
```python
import jiwer

# Example transcriptions; replace with your reference transcription and the
# model's predicted output.
reference = "God is great"
hypothesis = "good is great"

# Compute WER
wer = jiwer.wer(reference, hypothesis)
print(f"Word Error Rate (WER): {wer:.2f}")
```
## Acknowledgments
For further reference, please visit: [Fairseq Wav2Vec2](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)