https://github.com/sanket-poojary-03/fine-tuning-whisper
Fine-tuning the Whisper-Small model on a Hinglish audio dataset
- Host: GitHub
- URL: https://github.com/sanket-poojary-03/fine-tuning-whisper
- Owner: sanket-poojary-03
- Created: 2024-07-30T09:27:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-30T18:01:14.000Z (9 months ago)
- Last Synced: 2025-01-30T18:05:49.306Z (9 months ago)
- Topics: audio-dataset, audio-to-text, deep-learning, fine-tuning, huggingface-transformers, python, speech-recognition, speech-to-text, whisper, whisper-ai
- Language: Python
- Homepage:
- Size: 41 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fine-tuning the Open-Source Whisper (Speech-to-Text) Model
**Problem Statement 3** of the **DARPG Hackathon 2024** involved evaluating and optimizing an open-source speech-to-text model to accurately transcribe feedback calls related to citizen grievances into English text.
Since ground-truth transcriptions were not provided, the Whisper model was used to generate a transcription for each audio file. These transcriptions were stored in a `metadata.csv` file and, after preprocessing, used to fine-tune the Whisper-Small model.
Here’s a YouTube video explaining our project: [Watch our project explanation on YouTube](https://youtu.be/qPTS3mdLkAY?si=xgYwI-QeYI0aC2Km)
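As an illustration of that labelling step, below is a minimal sketch of how transcriptions could be generated with the Hugging Face `transformers` pipeline and written to `metadata.csv`. The paths, file extension, and choice of the pipeline API are assumptions for the sketch; the script actually used during the hackathon may differ.

```python
# Sketch: generate pseudo-label transcriptions for unlabeled audio with Whisper-Small.
# Paths and file extension are assumptions; adapt them to your dataset layout.
import csv
from pathlib import Path

from transformers import pipeline

# Whisper-Small checkpoint from the Hugging Face Hub; chunk long audio into 30 s windows
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,
)

audio_dir = Path("audio_dataset/data")  # folder holding the raw audio files
rows = []
for audio_file in sorted(audio_dir.glob("*.wav")):
    result = asr(str(audio_file))
    rows.append({"audio_path": audio_file.name, "transcription": result["text"]})

# metadata.csv pairs each file name with its generated transcription
with open("audio_dataset/metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["audio_path", "transcription"])
    writer.writeheader()
    writer.writerows(rows)
```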
## Dataset Preparation
Prepare your Audio folder in the following format:
```
audio_dataset/
├── metadata.csv
└── data/
```
`metadata.csv` contains the name of each audio file (`audio_path`) and its corresponding transcription (`transcription`).
The `data/` folder contains all the audio files.
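The loading code is not shown in this README, but one way to turn this layout into a Hugging Face `Dataset` (assuming the `audio_path` / `transcription` column names above and the 16 kHz sampling rate Whisper expects) is:

```python
# Sketch: turn metadata.csv + data/ into a Hugging Face Dataset.
# Column names follow the layout described above; paths are assumptions.
import pandas as pd
from datasets import Dataset, Audio

df = pd.read_csv("audio_dataset/metadata.csv")
# Prepend the data/ folder so each entry points at the actual file on disk
df["audio_path"] = "audio_dataset/data/" + df["audio_path"]

dataset = Dataset.from_pandas(df)
# Decode audio lazily and resample to 16 kHz, the rate Whisper expects
dataset = dataset.cast_column("audio_path", Audio(sampling_rate=16_000))

print(dataset[0]["audio_path"]["array"].shape, dataset[0]["transcription"])
```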
## Hackathon Workflow
These are all the steps we followed during the hackathon.
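The training script itself is not reproduced on this page. As a rough sketch, fine-tuning `whisper-small` on the dataset built above typically follows the standard Hugging Face Seq2Seq recipe shown below; the language setting, hyperparameters, and output directory are assumptions, not the values used in this repository.

```python
# Sketch of the standard Hugging Face fine-tuning loop for Whisper-Small.
# Reuses the `dataset` object from the loading sketch above.
# Hyperparameters and the language setting are assumptions, not the repo's values.
import torch
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="hindi", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def prepare(batch):
    # Log-Mel input features from the 16 kHz waveform, label ids from the text
    audio = batch["audio_path"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset.column_names)

def collate(features):
    # Pad audio features and label ids to the longest item in the batch
    input_features = [{"input_features": f["input_features"]} for f in features]
    batch = processor.feature_extractor.pad(input_features, return_tensors="pt")
    labels = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
    )
    # Replace padding with -100 so it is ignored by the loss
    batch["labels"] = labels["input_ids"].masked_fill(
        labels["attention_mask"].ne(1), -100
    )
    return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-hinglish",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1000,
    fp16=torch.cuda.is_available(),
)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=dataset, data_collator=collate
)
trainer.train()
```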
## Deployment
We have deployed this model on **Hugging Face Spaces** for easy access and usage. You can try it out here:
🤗 [WHISPER-SPEECH-TO-TEXT-MODEL-FOR-DARPG](https://huggingface.co/spaces/sanket003/WHISPER-SPEECH-TO-TEXT-MODEL-FOR-DARPG)
## Using the Model Locally
To use the model locally, run the `run_model.py` script, which provides a Gradio interface for interacting with it.
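`run_model.py` itself is not reproduced here; a minimal sketch of such a Gradio app might look like the following, where the checkpoint path `whisper-small-hinglish` is a placeholder for the repository's actual fine-tuned model.

```python
# Sketch of a run_model.py-style Gradio app.
# The checkpoint path is a placeholder; point it at the fine-tuned model.
import gradio as gr
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="whisper-small-hinglish",  # local checkpoint directory or Hub model ID
    chunk_length_s=30,
)

def transcribe(audio_path):
    # Gradio passes the uploaded or recorded audio as a file path on disk
    return asr(audio_path)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
    outputs=gr.Textbox(label="Transcription"),
    title="Whisper Speech-to-Text",
)

if __name__ == "__main__":
    demo.launch()
```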