Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/phineas-pta/fine-tune-whisper-vi
Jupyter notebooks to fine-tune Whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2
- Host: GitHub
- URL: https://github.com/phineas-pta/fine-tune-whisper-vi
- Owner: phineas-pta
- License: apache-2.0
- Created: 2024-02-20T09:36:49.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-12T23:25:55.000Z (7 months ago)
- Last Synced: 2024-04-13T16:39:38.600Z (7 months ago)
- Topics: aws, docker, fine-tuning, lora, multi-gpu-training, speech-recognition, speech-to-text, vietnamese, whisper
- Language: Jupyter Notebook
- Size: 206 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# fine-tune whisper vi
Jupyter notebooks to fine-tune Whisper models on Vietnamese using Kaggle (should also work on Colab but not thoroughly tested),
using my collection of Vietnamese speech datasets: https://huggingface.co/collections/doof-ferb/vietnamese-speech-dataset-65c6af8c15c9950537862fa6
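For orientation, here is a minimal sketch of how one dataset from that collection could be loaded and prepared for Whisper fine-tuning; the dataset id and the `transcription` column name are placeholders, not necessarily what the notebooks use:

```python
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

# placeholder dataset id: substitute any dataset from the collection above
ds = load_dataset("doof-ferb/some-vi-speech-dataset", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # whisper expects 16 kHz audio

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny", language="vi", task="transcribe")

def prepare(batch):
    audio = batch["audio"]
    # log-mel spectrogram features for the encoder
    batch["input_features"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    # tokenized transcript as decoder labels ("transcription" column name is an assumption)
    batch["labels"] = processor(text=batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)
```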
*N.B.1* importing any trainer or pipeline class from `transformers` crashes the Kaggle TPU session (see huggingface/transformers#28609), so it is better to use GPU
*N.B.2* ~~the trainer class from `transformers` can automatically use multi-GPU setups like Kaggle's free T4×2 without code changes~~ by default the trainer uses naive model parallelism, which cannot keep all GPUs busy at the same time, so it is better to use distributed data parallelism (see the sketch below)
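As an illustration only (the repo's DDP notebook may set this up differently), distributed data parallelism can be launched from inside a Kaggle notebook with 🤗 Accelerate's `notebook_launcher`:

```python
from accelerate import notebook_launcher

def train_fn():
    # build the model, datasets and Seq2SeqTrainer here;
    # each spawned process gets its own GPU and its own shard of every batch
    ...

# num_processes=2 matches kaggle's free T4×2 instance
notebook_launcher(train_fn, num_processes=2)
```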
*N.B.3* use the default greedy search, because beam search triggers a spike in VRAM usage which may cause out-of-memory (the original Whisper uses 5 beams, something like `do_sample=True, num_beams=5`)
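For reference, the difference at generation time looks roughly like the sketch below; the placeholder `input_features` stands in for real log-mel features produced by the processor:

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
input_features = torch.zeros(1, 80, 3000)  # placeholder log-mel features (batch, mel bins, frames)

# greedy search (num_beams=1 is the default): lowest VRAM usage
pred_ids = model.generate(input_features, num_beams=1)

# beam search with 5 beams: every beam keeps its own decoder state,
# so VRAM usage spikes and may go out-of-memory
pred_ids = model.generate(input_features, num_beams=5)
```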
*N.B.4* when using Kaggle and resuming training, remember to enable file persistence before launching
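Resuming then uses the standard `transformers` mechanism, assuming the checkpoint folders from the previous session were persisted (`trainer` being the trainer built for the run):

```python
# picks up the latest checkpoint-* folder found in training_args.output_dir
trainer.train(resume_from_checkpoint=True)
```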
## scripts
evaluate accuracy (WER) with batched inference:
- on whisper models: [evaluate-whisper.ipynb](eval/evaluate-whisper.ipynb)
- on whisper with PEFT LoRA: [evaluate-whisper-lora.ipynb](eval/evaluate-whisper-lora.ipynb)
- on wav2vec BERT v2 models: [evaluate-w2vBERT.ipynb](eval/evaluate-w2vBERT.ipynb)

fine-tune whisper tiny with the traditional approach:
- script: [whisper-tiny-traditional.ipynb](train/whisper-tiny-traditional.ipynb)
- model with evaluated WER: https://huggingface.co/doof-ferb/whisper-tiny-vi

fine-tune whisper large with PEFT-LoRA + int8 (see the sketch below):
- script for 1 GPU: [whisper-large-lora.ipynb](train/whisper-large-lora.ipynb)
- script for multi-GPU using distributed data parallelism: [whisper-large-lora-DDP.ipynb](train/whisper-large-lora-DDP.ipynb)
- model with evaluated WER: https://huggingface.co/doof-ferb/whisper-large-peft-lora-vi

(testing - not always working) fine-tune wav2vec v2 BERT: [w2v-bert-v2.ipynb](train/w2v-bert-v2.ipynb)
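For context, the LoRA + int8 recipe looks roughly like the sketch below; the rank, alpha, dropout and target modules are illustrative assumptions, not necessarily the values used in the notebooks:

```python
from transformers import WhisperForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load whisper-large-v2 with 8-bit weights (requires bitsandbytes and a CUDA GPU)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# attach low-rank adapters to the attention projections only
lora_config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```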
docker image to run on AWS EC2: [Dockerfile](docker/Dockerfile), comes with standalone scripts
convert to `openai-whisper`, `whisper.cpp`, `faster-whisper`, ONNX, TensorRT: *not yet*
miscellaneous: convert audio data to the Hugging Face audio datasets format (see the sketch below)
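A minimal sketch of that conversion, with made-up file paths and column names:

```python
from datasets import Dataset, Audio

# hypothetical local clips and transcripts
records = {
    "audio": ["clips/0001.wav", "clips/0002.wav"],
    "transcription": ["xin chào", "tạm biệt"],
}

ds = Dataset.from_dict(records)
# decode the path column as audio, resampled to the 16 kHz whisper expects
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
# then e.g. ds.push_to_hub("username/my-vi-speech-dataset")
```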
## resources
- https://huggingface.co/blog/fine-tune-whisper
- https://huggingface.co/blog/fine-tune-w2v2-bert
- https://github.com/openai/whisper/discussions/988
- https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
- https://github.com/vasistalodagala/whisper-finetune
- https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event
- https://github.com/krylm/whisper-event-tuning
- https://www.kaggle.com/code/leonidkulyk/train-infer-mega-pack-wav2vec2-whisper-qlora
- https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
- https://alphacephei.com/nsh/2023/01/15/whisper-finetuning.html
- https://discuss.huggingface.co/t/how-to-apply-specaugment-to-a-whisper/40435/3
- https://deepgram.com/learn/whisper-v3-results