# Whisper Tune

![Code in Progress](https://img.shields.io/badge/status-in_progress-red.svg) ![License](https://img.shields.io/github/license/LuluW8071/Whisper-Tune) ![Open Issues](https://img.shields.io/github/issues/LuluW8071/Whisper-Tune) ![Closed Issues](https://img.shields.io/github/issues-closed/LuluW8071/Whisper-Tune) ![Open PRs](https://img.shields.io/github/issues-pr/LuluW8071/Whisper-Tune) ![Repo Size](https://img.shields.io/github/repo-size/LuluW8071/Whisper-Tune) ![Last Commit](https://img.shields.io/github/last-commit/LuluW8071/Whisper-Tune)

This repository contains a script to fine-tune OpenAI's Whisper model for automatic speech recognition (ASR). The script supports training on custom datasets recorded with [__Mimic Recording Studio__](https://github.com/MycroftAI/mimic-recording-studio) and works with multiple Whisper model sizes (`tiny`, `base`, `small`, `medium`, `large`, `large-v2`, `large-v3`, `large-v3-turbo`, or other pre-trained Whisper checkpoints). Batch size, gradient accumulation, learning rate, warmup steps, and number of epochs can all be adjusted through command-line arguments.

![Whisper](https://images.ctfassets.net/kftzwdyauwt9/d9c13138-366f-49d3-a1a563abddc1/8acfb590df46923b021026207ff1a438/asr-summary-of-model-architecture-desktop.svg)

## Features

- Fine-tune official OpenAI Whisper models or custom checkpoints derived from them.
- Dataset handling with custom JSON files.
- Command-line interface (CLI) for adjusting model and training parameters.
- Support for __Comet ML__ and __TensorBoard__ for logging.
- Automatic checkpoint saving and pushing of the best model to the Hugging Face Hub.

## Requirements

To install the required dependencies, you can use the following command:

```bash
pip install -r requirements.txt
```

## Environment Variables

Ensure you have a `.env` file in the project root that contains your [__Comet ML__](https://www.comet.com/) API key for logging:

```
COMET_API_KEY = "your_comet_api_key"
```

Training logs are pushed to Comet ML for experiment tracking.
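
For reference, here is a minimal sketch of how the key could be loaded and used to start a Comet experiment; the `project_name` below is a placeholder, and the training script may initialize logging differently:

```python
import os

import comet_ml
from dotenv import load_dotenv  # python-dotenv

# Load COMET_API_KEY from the .env file in the project root.
load_dotenv()

# Start an experiment; "whisper-tune" is a placeholder project name.
experiment = comet_ml.Experiment(
    api_key=os.getenv("COMET_API_KEY"),
    project_name="whisper-tune",
)
experiment.log_parameter("whisper_model", "base")
```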

## Usage

| Argument | Description | Default Value |
|----------------------------------|---------------------------------------------------------------------------------------------------|-----------------|
| `--train_json` | Path to the training dataset in JSON format. | N/A |
| `--test_json` | Path to the test dataset in JSON format. | N/A |
| `--whisper_model`, `-model` | Choose from `tiny`, `base`, `small`, `medium`, `large`, `large-v2`, `large-v3`, `large-v3-turbo`, or provide a custom Whisper model name. | `base` |
| `--batch_size` | The batch size for training and evaluation. | `16` |
| `--gradient_accumulation_steps`, `-grad_steps` | Number of gradient accumulation steps. | `1` |
| `--learning_rate`, `-lr` | Learning rate for training. | `2e-5` |
| `--warmup_steps` | Number of warmup steps for the learning rate scheduler. | `500` |
| `--epochs`, `-e` | Number of epochs to train for. | `10` |
| `--num_workers`, `-w` | Number of CPU workers. | `2` |

The table above lists the available arguments and their default values. An example training run:

```bash
python train.py \
--train_json train_augmented.json \
--test_json test.json \
--whisper_model tiny \
--batch_size 8 \
--grad_steps 1 \
--lr 2e-5 \
--warmup_steps 200 \
--epochs 5 \
-w 4
```

## Pushing to Hugging Face Hub 🤗

The script is designed to __automatically push the best trained model to the Hugging Face Hub__. Make sure you have set up your Hugging Face credentials properly.
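
If you are not already authenticated, one common approach is to log in with `huggingface_hub` before training (a general sketch, not a step specific to this script; the `HF_TOKEN` variable name is just an example):

```python
import os

from huggingface_hub import login

# Authenticate with a write-access token so the best model can be pushed.
# HF_TOKEN is an example environment variable name; any valid token works.
login(token=os.environ["HF_TOKEN"])
```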

## Example Dataset Format

Your dataset JSON files should contain one object per line (JSON Lines), like this:

```json
{"path": "path/to/audio/file1.wav", "text": "The transcription of the audio."}
{"path": "path/to/audio/file2.wav", "text": "Another transcription."}
```

> Ensure that each `path` correctly references an existing audio file.
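
As a quick sanity check before training, a sketch like the following can verify that every referenced audio file exists (the filename `train.json` is just an example):

```python
import json
from pathlib import Path

# Example manifest name; substitute your own --train_json / --test_json files.
manifest = "train.json"

with open(manifest, encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        record = json.loads(line)
        if not Path(record["path"]).is_file():
            print(f"Line {line_no}: missing audio file {record['path']}")
```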

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.