https://github.com/hyunnnchoi/tethys-speech
tf2 implementation of whisper & wav2vec2 models w/ distributed training for k8s/kubeflow
- Host: GitHub
- URL: https://github.com/hyunnnchoi/tethys-speech
- Owner: hyunnnchoi
- Created: 2025-05-20T14:04:15.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-06-02T10:29:14.000Z (5 months ago)
- Last Synced: 2025-06-09T13:04:04.500Z (4 months ago)
- Topics: deep-learning, distributed-training, kubeflow, kubernetes, machine-learning, nlp, speech-recognition, speech-to-text, tensorflow, tfjob, wav2vec2, whisper
- Language: Python
- Homepage:
- Size: 227 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Tethys-Speech
A TensorFlow-based repository for speech recognition model implementation and distributed training.
## Overview
This project provides TensorFlow implementations of two major speech recognition models, Whisper and Wav2Vec2, with full support for distributed training in Kubernetes environments. Both implementations follow the published architectures of the original models.
The jobs in this repository are specifically designed to serve as workloads for scheduler performance evaluation in distributed training environments.
## Main Models
- **Whisper**: Speech-to-text model developed by OpenAI, implemented to match the original architecture
- **Wav2Vec2**: Self-supervised speech recognition model developed by Meta, implemented to follow the original architecture specification

Both models are fully implemented in TensorFlow, providing an alternative to the original PyTorch implementations.
## Directory Structure
```
tethys-speech/
├── speech_jobs/              # Speech recognition model implementation files
│   ├── whisper_dist.py       # Whisper model and distributed training code
│   └── wav2vec2_dist.py      # Wav2Vec2 model and distributed training code
├── stable_jobs/              # Stabilized implementation files
└── sample_tfjobs/            # Kubeflow TFJob configuration files
    ├── whisper-dist.yaml
    └── wav2vec2-dist.yaml
```

## Features
- Whisper and Wav2Vec2 models precisely implemented in TensorFlow
- Full distributed training support using TensorFlow's MultiWorkerMirroredStrategy
- TFJob configurations optimized for performance evaluation of Kubernetes schedulers
- Training monitoring and automatic checkpoint saving
- Compatible with Kubeflow and Training Operator 1.7.0

## Usage
### Local Training
```bash
python speech_jobs/whisper_dist.py --batch_size 4 --num_batches 30
```

### Distributed Training (Kubeflow)
```bash
kubectl apply -f sample_tfjobs/whisper-dist.yaml
```

## Performance Metrics
The following metrics are automatically recorded during model training:
- Training loss and accuracy
- GPU and network usage
- Job Completion Time (JCT)

## Docker Image
A pre-built Docker image with all dependencies is available on DockerHub:
```
potato4332/speech-image:0.0.1-beta
```

## Dependencies
- TensorFlow 2.x
- CUDA 11.x and cuDNN 8.x
- NumPy
- TensorFlow Datasets
- Kubernetes (for distributed training)
- Kubeflow Training Operator 1.7.0

## Distributed Training
This implementation uses TensorFlow's MultiWorkerMirroredStrategy for synchronous data-parallel training across multiple nodes. It has been tested and optimized for Kubeflow's TFJob operator, specifically version 1.7.0 of the Training Operator.
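
When a TFJob runs, the Training Operator injects a `TF_CONFIG` environment variable into each replica, and `MultiWorkerMirroredStrategy` reads it to discover the cluster. The snippet below is a minimal sketch of how a training script could inspect that variable, for example to decide which replica writes checkpoints. It is not taken from this repository: the helper name `describe_tf_config` and the host names in the usage example are hypothetical, and it uses only the Python standard library.

```python
import json
import os


def describe_tf_config(env=None):
    """Summarize the cluster described by TF_CONFIG for the current replica.

    Hypothetical helper for illustration; not part of tethys-speech.
    """
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if not raw:
        # No TF_CONFIG set: treat this as a single-process local run.
        return {"num_workers": 1, "task_type": "worker",
                "task_index": 0, "is_chief": True}

    config = json.loads(raw)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type", "worker")
    task_index = int(task.get("index", 0))

    # Replicas that take part in synchronous training.
    num_workers = len(cluster.get("worker", [])) + len(cluster.get("chief", []))

    # If the cluster lists an explicit chief, that task is the chief;
    # otherwise worker 0 takes the chief role.
    if "chief" in cluster:
        is_chief = task_type == "chief"
    else:
        is_chief = task_type == "worker" and task_index == 0

    return {"num_workers": num_workers, "task_type": task_type,
            "task_index": task_index, "is_chief": is_chief}


if __name__ == "__main__":
    # Example TF_CONFIG for the second of two workers (hypothetical hosts).
    example = {"TF_CONFIG": json.dumps({
        "cluster": {"worker": ["worker-0:2222", "worker-1:2222"]},
        "task": {"type": "worker", "index": 1},
    })}
    print(describe_tf_config(example))
```

When the script is started locally without `TF_CONFIG`, the helper reports a single worker that is also the chief, which mirrors how a local run has no cluster to coordinate with.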