https://github.com/phineas-pta/fine-tune-whisper-vi
jupyter notebooks to fine tune whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2
aws docker fine-tuning lora multi-gpu-training speech-recognition speech-to-text vietnamese whisper
- Host: GitHub
- URL: https://github.com/phineas-pta/fine-tune-whisper-vi
- Owner: phineas-pta
- License: apache-2.0
- Created: 2024-02-20T09:36:49.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-12T23:25:55.000Z (9 months ago)
- Last Synced: 2024-04-13T16:39:38.600Z (9 months ago)
- Topics: aws, docker, fine-tuning, lora, multi-gpu-training, speech-recognition, speech-to-text, vietnamese, whisper
- Language: Jupyter Notebook
- Homepage:
- Size: 206 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# fine-tune whisper vi
jupyter notebooks to fine-tune whisper models on vietnamese using kaggle (should also work on colab, but not thoroughly tested)
using my collection of vietnamese speech datasets: https://huggingface.co/collections/doof-ferb/vietnamese-speech-dataset-65c6af8c15c9950537862fa6
*N.B.1* importing any trainer or pipeline class from `transformers` crashes the kaggle TPU session (see huggingface/transformers#28609), so better use GPU
*N.B.2* ~~trainer class from `transformers` can auto use multi-GPU like kaggle free T4×2 without code change~~ by default the trainer uses naive model parallelism, which cannot keep all GPUs busy at the same time, so better use distributed data parallelism
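
To illustrate N.B.2: under distributed data parallelism every GPU holds a full copy of the model and works on its own slice of the data, so all devices are busy at the same time. Below is a minimal pure-Python sketch of that sharding idea, assuming a strided split with wrap-around padding, similar in spirit to PyTorch's `DistributedSampler`. The helper name `shard_indices` is hypothetical and does not come from the notebooks.

```python
import math


def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Return the sample indices that process `rank` would handle.

    Hypothetical helper for illustration only (not from the notebooks).
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    if num_samples == 0:
        return []
    # round the total up so every rank gets the same number of samples
    total = math.ceil(num_samples / world_size) * world_size
    # wrap around to the start of the dataset to fill the padding slots
    padded = [i % num_samples for i in range(total)]
    # strided slice: rank 0 takes positions 0, world_size, 2*world_size, ...
    return padded[rank::world_size]


if __name__ == "__main__":
    # e.g. kaggle T4×2 means world_size=2
    for rank in range(2):
        print(rank, shard_indices(5, world_size=2, rank=rank))
```

With 5 samples and 2 processes, rank 0 gets `[0, 2, 4]` and rank 1 gets `[1, 3, 0]`: sample 0 is repeated once so both ranks run the same number of steps.
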
*N.B.3* use the default greedy search, because beam search triggers a spike in VRAM usage which may cause out-of-memory errors (original whisper uses num beams = 5, something like `do_sample=True, num_beams=5`)
*N.B.4* if using kaggle and resuming training, remember to enable file persistence before launching
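
The decoding choice in N.B.3 comes down to generation settings. The sketch below builds keyword arguments that could be passed to `model.generate(**kwargs)` in `transformers`; `num_beams` and `do_sample` are existing `generate` parameters, while the helper name `decoding_kwargs` is hypothetical, and setting `do_sample=False` for the beam-search branch is an assumption of this sketch, not a setting taken from the notebooks.

```python
def decoding_kwargs(use_beam_search: bool = False, num_beams: int = 5) -> dict:
    """Hypothetical helper: keyword arguments for `model.generate(**kwargs)`."""
    if use_beam_search:
        # beam search keeps `num_beams` candidate sequences alive per sample,
        # which multiplies the decoder state held in VRAM
        return {"num_beams": num_beams, "do_sample": False}
    # greedy search: a single hypothesis per sample, always the most likely token
    return {"num_beams": 1, "do_sample": False}


if __name__ == "__main__":
    print(decoding_kwargs())                      # greedy settings
    print(decoding_kwargs(use_beam_search=True))  # beam search settings
```
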
## scripts
evaluate accuracy (WER) with batched inference:
- on whisper models: [evaluate-whisper.ipynb](eval/evaluate-whisper.ipynb)
- on whisper with PEFT LoRA: [evaluate-whisper-lora.ipynb](eval/evaluate-whisper-lora.ipynb)
- on wav2vec BERT v2 models: [evaluate-w2vBERT.ipynb](eval/evaluate-w2vBERT.ipynb)

fine-tune whisper tiny with traditional approach:
- script: [whisper-tiny-traditional.ipynb](train/whisper-tiny-traditional.ipynb)
- model with evaluated WER: https://huggingface.co/doof-ferb/whisper-tiny-vi

fine-tune whisper large with PEFT-LoRA + int8:
- script for 1 GPU: [whisper-large-lora.ipynb](train/whisper-large-lora.ipynb)
- script for multi-GPU using distributed data parallelism: [whisper-large-lora-DDP.ipynb](train/whisper-large-lora-DDP.ipynb)
- model with evaluated WER: https://huggingface.co/doof-ferb/whisper-large-peft-lora-vi

(testing - not always working) fine-tune wav2vec v2 bert: [w2v-bert-v2.ipynb](train/w2v-bert-v2.ipynb)
docker image to run on AWS EC2: [Dockerfile](docker/Dockerfile), comes with standalone scripts
convert to `openai-whisper`, `whisper.cpp`, `faster-whisper`, ONNX, TensorRT: *not yet*
miscellaneous: convert to huggingface audio datasets format
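
The evaluation notebooks above report word error rate (WER). As a reminder of what that metric measures, here is a minimal pure-Python sketch: word-level edit distance divided by the number of reference words. It assumes plain whitespace tokenization and no text normalization; the notebooks may normalize text or rely on a library instead, so the function `word_error_rate` is a hypothetical illustration and not the code used there.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # one substitution ("chào" vs "chao") and one deletion ("nam") over 4 words
    print(word_error_rate("xin chào việt nam", "xin chao việt"))
```

For the example in the `__main__` block the result is 0.5. Note that WER can exceed 1.0 when the hypothesis contains many extra words.
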
## resources
- https://huggingface.co/blog/fine-tune-whisper
- https://huggingface.co/blog/fine-tune-w2v2-bert
- https://github.com/openai/whisper/discussions/988
- https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
- https://github.com/vasistalodagala/whisper-finetune
- https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event
- https://github.com/krylm/whisper-event-tuning
- https://www.kaggle.com/code/leonidkulyk/train-infer-mega-pack-wav2vec2-whisper-qlora
- https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
- https://alphacephei.com/nsh/2023/01/15/whisper-finetuning.html
- https://discuss.huggingface.co/t/how-to-apply-specaugment-to-a-whisper/40435/3
- https://deepgram.com/learn/whisper-v3-results