{"id":15136497,"url":"https://github.com/i4ds/whisper-finetune","last_synced_at":"2025-10-28T11:16:28.525Z","repository":{"id":248667542,"uuid":"829344589","full_name":"i4Ds/whisper-finetune","owner":"i4Ds","description":"This repository contains code for fine-tuning the Whisper speech-to-text model.","archived":false,"fork":false,"pushed_at":"2024-11-20T09:52:40.000Z","size":37902,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T18:06:16.114Z","etag":null,"topics":["fine-tuning","nlp","speech-to-text","whisper"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"swiss-german-speech-to-text/whisper-finetune","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/i4Ds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-16T08:41:44.000Z","updated_at":"2024-12-31T08:41:27.000Z","dependencies_parsed_at":"2024-08-16T15:04:50.297Z","dependency_job_id":"2520bc90-01ff-4e05-9059-802d024ffc67","html_url":"https://github.com/i4Ds/whisper-finetune","commit_stats":{"total_commits":161,"total_committers":5,"mean_commits":32.2,"dds":0.4782608695652174,"last_synced_commit":"692f740e35f5448ffaad2947716e451babebbba7"},"previous_names":["i4ds/whisper-finetune"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i4Ds%2Fwhisper-finetune","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i4Ds%2Fwhisper-finetune/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/i4Ds%2Fwhisper-finetune/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/i4Ds%2Fwhisper-finetune/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/i4Ds","download_url":"https://codeload.github.com/i4Ds/whisper-finetune/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237821575,"owners_count":19371787,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fine-tuning","nlp","speech-to-text","whisper"],"created_at":"2024-09-26T06:22:12.266Z","updated_at":"2025-10-23T11:31:34.060Z","avatar_url":"https://github.com/i4Ds.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper-Finetune\n\n[![MIT License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![GitHub issues](https://img.shields.io/github/issues/i4ds/whisper-finetune.svg)](https://github.com/i4ds/whisper-finetune/issues)\n\nThis repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights \u0026 Biases (wandb) for logging metrics and storing models. 
Key features include:\n\n- **Multi-Dataset Validation** 🆕 - Evaluate on multiple validation sets simultaneously with macro averaging\n- **Comprehensive Metrics** 🆕 - WER, CER, NLL, log-probability, entropy, and calibration (ECE)\n- **Production-Ready Tests** 🆕 - Fast unit tests with pytest\n- Timestamp training\n- Prompt training\n- Stochastic depth implementation for improved model generalization\n- Correct implementation of SpecAugment for robust audio data augmentation\n- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions\n- Integration with Weights \u0026 Biases (wandb) for experiment tracking and model versioning\n\n## What's New\n\n### Multi-Dataset Validation System\nEvaluate your model on multiple validation datasets (e.g., clean speech, noisy environments, different microphones) with comprehensive metrics beyond WER:\n\n- **6 metrics per dataset**: WER, CER, NLL, log-prob, entropy, ECE\n- **Macro averaging**: Unweighted mean across datasets (each dataset contributes equally)\n- **Per-utterance tracking**: Detailed metrics for in-depth analysis\n- **Smart checkpointing**: All models saved locally, manual W\u0026B upload to avoid clutter\n\n## Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/i4ds/whisper-finetune.git\n   cd whisper-finetune\n   ```\n\n2. Create and activate a virtual environment (strongly recommended) with Python 3.11 or higher.\n\n3. Install the package in editable mode:\n   ```bash\n   pip install -e .\n   ```\n   \n   Or using UV (very strongly recommended):\n   ```bash\n   uv pip install -e .\n   ```\n\n## Data\nPlease have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the script as a [🤗 Datasets](https://huggingface.co/docs/datasets/en/index) dataset.\n\n## Usage\n\n1. Create a configuration file (see `configs/example_config.yaml` for a fully documented example)\n\n2. 
Run the fine-tuning script:\n   ```bash\n   python src/whisper_finetune/scripts/finetune.py --config configs/example_config.yaml\n   ```\n\n## Testing\n\nRun the test suite to ensure everything is working:\n\n```bash\n# Install dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Run with verbose output and coverage\npytest -v --cov=whisper_finetune\n```\n\nSee [`tests/README.md`](tests/README.md) for more details.\n\n## Deployment\nWe suggest using [faster-whisper](https://github.com/SYSTRAN/faster-whisper). To convert your fine-tuned model, you can use the script located at `src/whisper_finetune/scripts/convert_c2t.py`.\n\nQuality can be improved further by serving requests with [whisperx](https://github.com/m-bain/whisperX).\n\n## Configuration\n\nModify the YAML files in the `configs/` directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.\n\n## Thank you\n\nThe starting point of this repository was the excellent repository by [Jumon](https://github.com/jumon) at https://github.com/jumon/whisper-finetuning\n\n## Contributing\n\nWe welcome contributions! Please feel free to submit a Pull Request.\n\n## Support\n\nIf you encounter any problems, please file an issue along with a detailed description.\n\n## Maintainer\n\n- Vincenzo Timmel (vincenzo.timmel@fhnw.ch)\n\n## Developers\n\n- Vincenzo Timmel (vincenzo.timmel@fhnw.ch)\n- Claudio Paonessa (info@noxenum.io)\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fi4ds%2Fwhisper-finetune","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fi4ds%2Fwhisper-finetune","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fi4ds%2Fwhisper-finetune/lists"}