https://github.com/kingabzpro/simple-mlops-with-urdu-asr
A beginner-friendly project for building, testing, and deploying an Urdu ASR (Automatic Speech Recognition) model.
- Host: GitHub
- URL: https://github.com/kingabzpro/simple-mlops-with-urdu-asr
- Owner: kingabzpro
- License: apache-2.0
- Created: 2025-06-28T12:46:37.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-11T13:19:02.000Z (3 months ago)
- Last Synced: 2025-07-11T15:45:09.030Z (3 months ago)
- Language: Jupyter Notebook
- Size: 66.4 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Urdu ASR - Modern MLOps Pipeline
[License](LICENSE) · [Python 3.11+](https://www.python.org/downloads/) · [BentoML](https://bentoml.com/)

A production-ready, end-to-end MLOps pipeline for Urdu Automatic Speech Recognition built on a fine-tuned Whisper-large-v3, featuring:
- **Automated Pipeline**: Prefect orchestration for data processing, training, and deployment
- **Experiment Tracking**: MLflow integration for comprehensive model tracking
- **Modern Serving**: BentoML 2.0+ with observability and monitoring
- **Interactive UI**: Gradio interface for easy testing and demonstration
- **Observability**: Built-in metrics, logging, and health monitoring

## Features
- **Fine-tuned Whisper-large-v3** on Common Voice Urdu dataset
- **Real-time transcription** with audio duration and processing time metrics
- **Comprehensive error handling** with detailed error types and messages
- **Health monitoring** and service status endpoints
- **Scalable deployment** with configurable resources and concurrency
- **Development-friendly** with hot reloading and debugging support

## Quick Start
### Prerequisites
```bash
# Python 3.11+
python --version

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
export HF_TOKEN="your_huggingface_token"
export MLFLOW_TRACKING_URI="your_mlflow_uri" # optional
```

### Run the Complete Pipeline
```bash
# Execute the full MLOps pipeline
prefect worker start --pool default-agent-pool &
prefect deploy --name urdu-asr-pipeline
prefect deployment run asr_pipeline/urdu-asr-pipeline

# Or run locally
python prefect/pipeline.py
```

### Serve the Model
```bash
# Build and serve locally
bentoml build
bentoml serve urdu-asr:latest

# Or deploy to BentoCloud
bentoml deploy urdu-asr:latest
```

### Launch Gradio Interface
```bash
# Update endpoint in gradio_app/app.py
# Then launch the interface
python gradio_app/app.py
```

## Pipeline Steps
1. **Data Download** (`scripts/download_dataset.py`)
- Downloads Common Voice v17 Urdu dataset
- Handles authentication with HuggingFace

2. **Data Preprocessing** (`scripts/preprocess_dataset.py`)
- Filters for valid Urdu text using regex
- Resamples audio to 16kHz mono FLAC
- Parallel processing for efficiency

3. **Model Training** (`scripts/train.py`)
- Fine-tunes Whisper-large-v3 on Urdu data
- MLflow experiment tracking
- Automatic checkpoint management
- Early stopping and best model selection

4. **Model Evaluation** (`scripts/evaluate.py`)
- Calculates Word Error Rate (WER)
- Generates prediction samples

5. **Model Serving** (`bento_service/service.py`)
- BentoML service with observability
- Health checks and monitoring
- Error handling and validation

## Configuration
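Step 2 of the pipeline above filters for valid Urdu text with a regex. As an illustration, such a filter might look like the following (the exact pattern in `scripts/preprocess_dataset.py` may differ; this is a sketch, not the repo's code):

```python
import re

# Match Arabic-script characters, which cover Urdu text
# (basic Arabic block plus the Arabic Supplement block).
URDU_CHARS = re.compile(r"[\u0600-\u06FF\u0750-\u077F]")

def is_valid_urdu(text: str) -> bool:
    """True if the text contains at least one Urdu/Arabic-script character."""
    return bool(URDU_CHARS.search(text))

print(is_valid_urdu("یہ ایک جملہ ہے"))  # Urdu sentence -> True
print(is_valid_urdu("hello world"))      # Latin-only -> False
```

A real preprocessing step would likely also strip punctuation and reject sentences that mix in too many non-Urdu characters.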
### Environment Variables
```bash
# Required
export HF_TOKEN="your_token"

# Optional - Training
export MODEL_ID="openai/whisper-large-v3"
export NUM_EPOCHS="3"
export TRAIN_BATCH_SIZE="16"
export LEARNING_RATE="1e-5"

# Optional - Serving
export BENTO_ENDPOINT="http://localhost:3000/transcribe"
export MAX_FILE_SIZE_MB="25"

# Optional - MLflow
export MLFLOW_TRACKING_URI="sqlite:///mlflow.db"
```

### Resource Configuration
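The scripts presumably read the optional variables in the previous section with the values shown there as defaults; a sketch (the variable names come from this README, but treating the shown values as defaults is an assumption):

```python
import os

# Optional settings with assumed defaults; required HF_TOKEN
# has no default and should fail loudly if missing.
MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-large-v3")
NUM_EPOCHS = int(os.getenv("NUM_EPOCHS", "3"))
TRAIN_BATCH_SIZE = int(os.getenv("TRAIN_BATCH_SIZE", "16"))
LEARNING_RATE = float(os.getenv("LEARNING_RATE", "1e-5"))
MAX_FILE_SIZE_MB = float(os.getenv("MAX_FILE_SIZE_MB", "25"))
```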
The BentoML service is configured for:
- **Memory**: 2Gi
- **GPU**: 1 (if available)
- **Concurrency**: 4 requests
- **Timeout**: 30 seconds

## Monitoring & Observability
The service includes comprehensive monitoring:
- **Request Metrics**: Count, duration, success/error rates
- **Audio Metrics**: File size, duration, format validation
- **Model Metrics**: Processing time, inference performance
- **Health Checks**: Service status and model availability

Access metrics at:
- Health: `GET /health`
- Metrics: `GET /metrics` (Prometheus format)
- Docs: `GET /docs` (OpenAPI)

## Testing
```bash
# Test the service locally
curl -X POST "http://localhost:3000/transcribe" \
  -F "audio_file=@test_audio.wav"

# Health check
curl "http://localhost:3000/health"
```

## Project Structure
```
.
├── bento_service/
│   └── service.py            # Modern BentoML service
├── gradio_app/
│   ├── app.py                # Enhanced Gradio interface
│   └── requirements.txt      # UI dependencies
├── scripts/
│   ├── download_dataset.py   # Data acquisition
│   ├── preprocess_dataset.py # Data processing
│   ├── train.py              # Model training
│   └── evaluate.py           # Model evaluation
├── prefect/
│   └── pipeline.py           # Orchestration pipeline
├── bentofile.yaml            # BentoML configuration
├── requirements.txt          # Core dependencies
└── README.md
```
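For a quick sanity check of the evaluation step, Word Error Rate can be computed locally without the full pipeline. A minimal implementation (`scripts/evaluate.py` most likely uses a library such as `evaluate` or `jiwer`; that, and this helper, are illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("a b c d", "a x c d"))  # one substitution in four words -> 0.25
```

Comparing this against the WER reported by the evaluation script on a handful of prediction samples is a cheap way to catch tokenization or normalization mismatches.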