Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/i4ds/whisper-finetune
This repository contains code for fine-tuning the Whisper speech-to-text model.
https://github.com/i4ds/whisper-finetune
fine-tuning nlp speech-to-text whisper
Last synced: 2 months ago
JSON representation
This repository contains code for fine-tuning the Whisper speech-to-text model.
- Host: GitHub
- URL: https://github.com/i4ds/whisper-finetune
- Owner: i4Ds
- License: mit
- Created: 2024-07-16T08:41:44.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-16T13:43:39.000Z (4 months ago)
- Last Synced: 2024-10-09T19:05:45.252Z (2 months ago)
- Topics: fine-tuning, nlp, speech-to-text, whisper
- Language: Jupyter Notebook
- Homepage:
- Size: 36.1 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Whisper-Finetune
[![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![GitHub issues](https://img.shields.io/github/issues/i4ds/whisper-finetune.svg)](https://github.com/i4ds/whisper-finetune/issues)This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:
- Timestamp training
- Prompt training
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation
- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for experiment tracking and model versioning## Installation
1. Clone the repository:
```bash
git clone https://github.com/i4ds/whisper-finetune.git
cd whisper-finetune
```2. Create and activate a virtual environment (strongly recommended) with Python 3.9.* and a Rust compiler available.
3. Install the package in editable mode:
```bash
pip install -e .
```## Data
Please have a look at https://github.com/i4Ds/whisper-prep. The data is passed as a [🤗 Datasets](https://huggingface.co/docs/datasets/en/index) to the script.## Usage
1. Create a configuration file (see examples in `configs/*.yaml`)
2. Run the fine-tuning script:
```bash
python src/whisper_finetune/scripts/finetune.py --config configs/large-cv-srg-sg-corpus.yaml
```## Deployment
We suggest to use [faster-whisper](https://github.com/SYSTRAN/faster-whisper). To convert your fine-tuned model, you can use the script located at `src/whisper_finetune/scripts/convert_c2t.py`.Further improvement of quality can be archieved by serving the requests with [whisperx](https://github.com/m-bain/whisperX).
## Configuration
Modify the YAML files in the `configs/` directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
## Thank you
The starting point of this repository was the excellent repository by [Jumon](https://github.com/jumon) at https://github.com/jumon/whisper-finetuning
## Contributing
We welcome contributions! Please feel free to submit a Pull Request.
## Support
If you encounter any problems, please file an issue along with a detailed description.
## Maintainer
- Vincenzo Timmel ([email protected])
## Developers
- Vincenzo Timmel ([email protected])
- Claudio Paonessa ([email protected])## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.