Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using Transformers
asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h
- Host: GitHub
- URL: https://github.com/sugarcane-mk/finetuning_wav2vec2
- Owner: sugarcane-mk
- Created: 2024-10-14T08:29:12.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-11-07T05:41:53.000Z (3 months ago)
- Last Synced: 2024-11-24T19:13:01.413Z (3 months ago)
- Topics: asr, asr-model, cuda, facebook, fairseq, fine-tuning, finetuning, huggingface, librosa, python, torch, transformers, wav2vec2, wav2vec2-large-960h
- Language: Jupyter Notebook
- Homepage:
- Size: 42 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Fine-tuning Wav2Vec2 for Tamil Speech Recognition
This repository contains the Jupyter Notebook and resources for fine-tuning the Wav2Vec2 model for Tamil speech recognition using the Hugging Face Transformers library.
## Table of Contents
- [Introduction](#introduction)
- [Requirements](#requirements)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)
- [Results](#results)
- [Acknowledgments](#acknowledgments)
## Introduction
Wav2Vec2 is a state-of-the-art model for automatic speech recognition (ASR). This project aims to adapt Wav2Vec2 for the Tamil language, leveraging available datasets to improve performance in recognizing spoken Tamil.
## Requirements
To run this project, ensure you have the following installed:
- Python 3.7 or higher
- Jupyter Notebook
- PyTorch
- Transformers
- Datasets
- Librosa
- Soundfile
- [CUDA](https://developer.nvidia.com/cuda-downloads)
You can install the required packages using the following command:
```bash
pip install -r requirements.txt
```
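After installing the requirements, it is worth confirming that PyTorch can actually see your GPU, since fine-tuning wav2vec2-large on CPU is impractically slow. The short check below is a minimal sketch and is not part of the original notebook:
```python
import torch

# Fine-tuning wav2vec2-large really needs a GPU; confirm CUDA is visible to PyTorch.
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available; training would fall back to the (very slow) CPU path.")
```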
## Dataset
We use a Tamil speech dataset for fine-tuning the model. The dataset consists of audio files in Tamil along with their transcriptions. Please ensure you download the dataset and place it in an accessible directory.
For the preprocessing steps, refer to datapreprocessing.py; a minimal loading sketch is shown below.
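As an illustration of the kind of preprocessing involved, the sketch below loads a single audio file and resamples it to the 16 kHz rate that Wav2Vec2 expects. The file path and transcript are placeholders; the repository's actual logic lives in datapreprocessing.py.
```python
import librosa

# Wav2Vec2 models are pretrained on 16 kHz audio, so resample every clip to 16 kHz.
TARGET_SR = 16_000

def load_audio(path: str):
    """Load a wav file and return a 16 kHz mono float32 array."""
    speech, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    return speech

# Placeholder example; replace with a real (path, transcript) pair from your dataset.
sample = {
    "audio": load_audio("data/audio/sample_0001.wav"),
    "transcription": "replace with the Tamil transcript for this clip",
}
print(len(sample["audio"]) / TARGET_SR, "seconds of audio")
```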
## Training
To fine-tune the Wav2Vec2 model, open the [Jupyter Notebook](https://github.com/sugarcane-mk/finetuning_wav2vec2/blob/main/Finetune_wav2vec2_xlsr_tamil.ipynb) and follow the instructions provided within the notebook to execute the training process.
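For orientation, here is a condensed sketch of what a Transformers-based fine-tuning setup for this model typically looks like. It is not the notebook verbatim: it assumes a Wav2Vec2Processor with a Tamil CTC vocabulary has already been built and saved to ./processor, that `train_ds` / `eval_ds` are preprocessed Hugging Face Datasets with `input_values` and `labels` columns, and the hyperparameters are illustrative only.
```python
from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC, Wav2Vec2Processor


@dataclass
class DataCollatorCTCWithPadding:
    """Pad audio inputs and label sequences to the longest item in each batch."""
    processor: Wav2Vec2Processor

    def __call__(self, features: List[Dict]) -> Dict[str, torch.Tensor]:
        input_features = [{"input_values": f["input_values"]} for f in features]
        label_features = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.pad(input_features, padding=True, return_tensors="pt")
        labels_batch = self.processor.pad(labels=label_features, padding=True, return_tensors="pt")
        # Replace label padding with -100 so it is ignored by the CTC loss.
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )
        return batch


processor = Wav2Vec2Processor.from_pretrained("./processor")  # assumed to exist
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature extractor frozen

training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-tamil",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=3e-4,
    num_train_epochs=30,
    fp16=True,  # requires CUDA
    save_steps=400,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=DataCollatorCTCWithPadding(processor=processor),
    train_dataset=train_ds,  # assumed preprocessed datasets
    eval_dataset=eval_ds,
    tokenizer=processor.feature_extractor,  # so the feature extractor is saved with checkpoints
)
trainer.train()
```
The vocabulary construction, exact hyperparameters, and metric wiring in the notebook may differ, so treat this as a map of the moving parts rather than a drop-in replacement.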
## Inference
After training, you can perform inference using the code snippets provided in the Jupyter Notebook. Be sure to replace the paths with your own audio files.
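A minimal inference sketch along those lines is shown below, assuming the fine-tuned model and processor were saved to ./wav2vec2-large-xlsr-tamil; the directory and audio file name are placeholders.
```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_DIR = "./wav2vec2-large-xlsr-tamil"  # wherever your fine-tuned checkpoint was saved

processor = Wav2Vec2Processor.from_pretrained(MODEL_DIR)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_DIR).eval()

# Load and resample the audio to 16 kHz, as expected by the model.
speech, _ = librosa.load("my_tamil_clip.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token per frame, then collapse repeats.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```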
## Results
The performance of the model can be evaluated using standard metrics such as Word Error Rate (WER). The notebook contains sections on evaluating the model's performance. WER can be computed with the jiwer package:
```bash
pip install jiwer
```
```python
import jiwer

# Example strings; replace with your reference transcript and the model's output.
reference = "God is great"
hypothesis = "good is great"

# Compute WER
wer = jiwer.wer(reference, hypothesis)
print(f"Word Error Rate (WER): {wer:.2f}")
```
## Acknowledgments
For further reference, please visit [Fairseq Wav2Vec2](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).