https://github.com/i4ds/whisper-finetune
This repository contains code for fine-tuning the Whisper speech-to-text model.
- Host: GitHub
- URL: https://github.com/i4ds/whisper-finetune
- Owner: i4Ds
- License: mit
- Created: 2024-07-16T08:41:44.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-20T09:52:40.000Z (over 1 year ago)
- Last Synced: 2025-01-30T18:06:16.114Z (over 1 year ago)
- Topics: fine-tuning, nlp, speech-to-text, whisper
- Language: Jupyter Notebook
- Homepage:
- Size: 36.1 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Whisper-Finetune
This repository contains code for fine-tuning the Whisper speech-to-text model. It uses Weights & Biases (wandb) to log metrics and store models. Key features include:
- **Multi-Dataset Validation** 🆕 - Evaluate on multiple validation sets simultaneously with macro averaging
- **Comprehensive Metrics** 🆕 - WER, CER, NLL, log-probability, entropy, and calibration (ECE)
- **Production-Ready Tests** 🆕 - Fast unit tests with pytest
- Timestamp training
- Prompt training
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation
- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for experiment tracking and model versioning
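The calibration metric in the list above (ECE) can be illustrated with a minimal sketch. It assumes the common equal-width binning definition of expected calibration error. The function name, the default bin count, and the per-token confidence inputs are illustrative assumptions and may differ from how this repository computes the metric.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error with equal-width confidence bins.

    `confidences` are predicted probabilities in [0, 1] (hypothetically one
    per token); `correct` says whether each prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if hit else 0.0
        count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for c_sum, h_sum, n in zip(conf_sum, hit_sum, count):
        if n == 0:
            continue
        accuracy = h_sum / n
        mean_confidence = c_sum / n
        # Each bin contributes its |accuracy - confidence| gap,
        # weighted by the share of predictions it holds.
        ece += (n / total) * abs(accuracy - mean_confidence)
    return ece


if __name__ == "__main__":
    ece = expected_calibration_error(
        [0.9, 0.9, 0.6, 0.6], [True, True, True, False]
    )
    print(f"ECE: {ece:.3f}")
```

A lower value means the model's confidence tracks its accuracy more closely; a perfectly calibrated model scores 0.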
## What's New
### Multi-Dataset Validation System
Evaluate your model on multiple validation datasets (e.g., clean speech, noisy environments, different microphones) with comprehensive metrics beyond WER:
- **6 metrics per dataset**: WER, CER, NLL, log-prob, entropy, ECE
- **Macro averaging**: Unweighted mean across datasets (each dataset contributes equally)
- **Per-utterance tracking**: Detailed metrics for in-depth analysis
- **Smart checkpointing**: All models are saved locally; uploading to W&B is a manual step, which avoids clutter
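Macro averaging, as described in the list above, can be sketched as follows. The example computes a corpus-level WER per dataset and then takes the unweighted mean. The dataset names, the helper functions, and the choice of corpus-level WER within each dataset are assumptions made for illustration, not the repository's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) pairs."""
    errors = 0
    words = 0
    for ref, hyp in pairs:
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    return errors / words if words else 0.0


def macro_average(per_dataset: Dict[str, float]) -> float:
    """Unweighted mean: every dataset counts equally, whatever its size."""
    return sum(per_dataset.values()) / len(per_dataset)


if __name__ == "__main__":
    # Hypothetical validation sets.
    datasets = {
        "clean": [("the cat sat", "the cat sat"),
                  ("hello world", "hello world")],
        "noisy": [("the cat sat on the mat", "the cat on mat")],
    }
    wer = {name: word_error_rate(pairs) for name, pairs in datasets.items()}
    for name, value in wer.items():
        print(f"{name}: WER = {value:.3f}")
    print(f"macro WER = {macro_average(wer):.3f}")
```

Because the mean is unweighted, a small but difficult dataset influences the macro score as much as a large, easy one. A pooled (micro) average would instead be dominated by the largest dataset.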
## Installation
1. Clone the repository:
```bash
git clone https://github.com/i4ds/whisper-finetune.git
cd whisper-finetune
```
2. Create and activate a virtual environment (strongly recommended) with Python 3.11 or higher.
3. Install the package in editable mode:
```bash
pip install -e .
```
Or, using uv (very strongly recommended):
```bash
uv pip install -e .
```
## Data
Please have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the script as a [🤗 Datasets](https://huggingface.co/docs/datasets/en/index) dataset.
## Usage
1. Create a configuration file (see `configs/example_config.yaml` for a fully documented example)
2. Run the fine-tuning script:
```bash
python src/whisper_finetune/scripts/finetune.py --config configs/example_config.yaml
```
## Testing
Run the test suite to ensure everything is working:
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with verbose output and coverage
pytest -v --cov=whisper_finetune
```
See [`tests/README.md`](tests/README.md) for more details.
## Deployment
We suggest using [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for deployment. To convert your fine-tuned model, use the script at `src/whisper_finetune/scripts/convert_c2t.py`.
Quality can be improved further by serving requests with [WhisperX](https://github.com/m-bain/whisperX).
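Once a model has been converted, transcription with faster-whisper might look like the sketch below. It assumes the conversion produced a model directory that faster-whisper can load. The helper name, the CPU/int8 settings, and the paths in the usage note are placeholders chosen for illustration, not values taken from this repository.

```python
from typing import List


def transcribe_file(model_dir: str, audio_path: str) -> List[str]:
    """Transcribe one audio file with a converted model (illustrative helper)."""
    # Imported lazily so this module can be imported even when
    # faster-whisper is not installed.
    from faster_whisper import WhisperModel

    # device and compute_type are example settings; adjust to your hardware.
    model = WhisperModel(model_dir, device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus an info object;
    # decoding only happens while the generator is consumed.
    segments, _info = model.transcribe(audio_path, beam_size=5)
    return [segment.text.strip() for segment in segments]
```

A call such as `transcribe_file("path/to/converted-model", "path/to/audio.wav")` would return one string per decoded segment; both paths are hypothetical.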
## Configuration
Modify the YAML files in the `configs/` directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
## Thank you
This repository started from the excellent [whisper-finetuning](https://github.com/jumon/whisper-finetuning) repository by [Jumon](https://github.com/jumon).
## Contributing
We welcome contributions! Please feel free to submit a Pull Request.
## Support
If you encounter any problems, please file an issue along with a detailed description.
## Maintainer
- Vincenzo Timmel (vincenzo.timmel@fhnw.ch)
## Developers
- Vincenzo Timmel (vincenzo.timmel@fhnw.ch)
- Claudio Paonessa (info@noxenum.io)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.