Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zakariaf/whisperwave
AI-powered audio transcription app using OpenAI Whisper, Flask, and Vue. Upload .wav files, select a language, and get accurate transcriptions. Fully Dockerized!
audio-processing docker flask python speech-to-text transcription vite vue whisper
Last synced: about 4 hours ago
- Host: GitHub
- URL: https://github.com/zakariaf/whisperwave
- Owner: zakariaf
- License: mit
- Created: 2025-02-06T11:57:07.000Z (about 19 hours ago)
- Default Branch: main
- Last Pushed: 2025-02-06T16:42:38.000Z (about 14 hours ago)
- Last Synced: 2025-02-06T17:39:21.769Z (about 13 hours ago)
- Topics: audio-processing, docker, flask, python, speech-to-text, transcription, vite, vue, whisper
- Language: Vue
- Homepage:
- Size: 240 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# WhisperWave
**AI-Powered Audio Transcription App** using OpenAI's Whisper model.
## Features
- Upload `.wav` files and transcribe them into text.
- Select a language before transcription.
- Built with **Vue + Flask + Whisper AI**.
- Fully Dockerized with **Docker Compose**.
- Uses **Whisper as a separate service** for scalability.

![WhisperWave Screenshot](screenshot.png)
---
## Getting Started
### **1. Clone the Repository**
```bash
git clone https://github.com/zakariaf/whisperwave.git
cd whisperwave
```

### **2. Run with Docker Compose**
```bash
docker-compose up --build
```

### **3. Open in Browser**
- **Frontend**: `http://localhost:5173`
- **Backend**: `http://localhost:5000`
- **Whisper API**: `http://localhost:6000`

---
## Tech Stack
- **Backend**: Flask, Whisper AI (OpenAI)
- **Frontend**: Vue, Vite, Tailwind CSS
- **Containerization**: Docker, Docker Compose
- **Machine Learning**: OpenAI's Whisper for Speech-to-Text

---
## API Endpoints (Flask)
### **Transcribe Audio**
```http
POST /transcribe
```

#### **Request (Form Data)**
- `file`: `.wav` file
- `language`: `en`, `es`, `fr`, etc.

#### **Response (JSON)**
```json
{
"transcription": "This is the transcribed text."
}
```
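
For reference, here is a minimal Python client sketch for this endpoint using the `requests` library; the backend URL assumes the default local setup above, and the file name and language are placeholders.

```python
# Minimal client sketch for the backend /transcribe endpoint.
# Assumes the default local setup (backend on http://localhost:5000);
# the file name and language below are placeholders.
import requests

with open("sample.wav", "rb") as audio:
    response = requests.post(
        "http://localhost:5000/transcribe",
        files={"file": audio},    # .wav file, sent as multipart form data
        data={"language": "en"},  # target language code
    )

response.raise_for_status()
print(response.json()["transcription"])
```

---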
## API Endpoints (Whisper Service)
Since Whisper is a separate service, the backend **calls it internally**, but you can also call it directly.

### **Direct Whisper Transcription API**
```http
POST /transcribe
```

#### **Request (JSON)**
```json
{
"file_path": "/uploads/audio.wav",
"language": "en"
}
```

#### **Response (JSON)**
```json
{
"transcription": "Hello, this is a test."
}
```
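
As a rough illustration (not the exact contents of `whisper_service/app.py`), a minimal Flask service exposing this JSON API could look like the sketch below; it assumes the `openai-whisper` package is installed and that the uploads directory is reachable at the given `file_path`.

```python
# Illustrative sketch of a Whisper transcription service with the JSON API
# shown above. Not the exact contents of whisper_service/app.py.
from flask import Flask, jsonify, request
import whisper

app = Flask(__name__)
model = whisper.load_model("base")  # see the next section for switching models

@app.route("/transcribe", methods=["POST"])
def transcribe():
    data = request.get_json()
    result = model.transcribe(data["file_path"], language=data.get("language"))
    return jsonify({"transcription": result["text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=6000)
```

---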
## **How to Modify the Whisper Model?**
If you want to use a **different model (e.g., `large` instead of `base`)**, update **`whisper_service/app.py`**:

```python
model = whisper.load_model("large")
```

Then restart:
```bash
docker-compose down
docker-compose up --build
```

---
## Why is Whisper a Separate Service?
### **1. Better Scalability**
- The Whisper service runs independently, allowing **the backend and frontend to scale separately**.
- If multiple users upload audio files, Whisper can **run in its own container without blocking the backend**.

### **2. Performance Optimization**
- Whisper is a **heavy machine learning model**. Keeping it separate ensures **Flask doesn't slow down** while transcribing audio.
- This setup allows for **future GPU acceleration**, making it faster when deployed in cloud environments.

### **3. Flexibility for Multiple Models**
- You can deploy **different Whisper models** (`base`, `large`) in separate services.
- The backend can **dynamically select which model to use**, depending on the request (see the sketch at the end of this section).

### **4. Reusability for Other Applications**
- Other apps (mobile apps, other web services) can **use the Whisper API** without needing to integrate Flask.
- The Whisper service can be deployed **independently** on cloud platforms like **AWS, GCP, or DigitalOcean**.
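
To make the model-selection idea concrete, here is a hypothetical sketch of how the backend could route requests to different Whisper services; the service URLs and the `model` parameter are assumptions, not part of the current codebase.

```python
# Hypothetical sketch: route transcription requests to different Whisper
# services depending on the requested model. Service URLs and the "model"
# parameter are assumptions, not part of the current codebase.
import requests

WHISPER_SERVICES = {
    "base": "http://whisper-base:6000/transcribe",
    "large": "http://whisper-large:6000/transcribe",
}

def transcribe(file_path: str, language: str, model: str = "base") -> str:
    url = WHISPER_SERVICES.get(model, WHISPER_SERVICES["base"])
    response = requests.post(url, json={"file_path": file_path, "language": language})
    response.raise_for_status()
    return response.json()["transcription"]
```

---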
## Future Enhancements
- Support for more audio formats.
- Improve UI/UX.
- Implement real-time transcription.
- Add **GPU acceleration** for Whisper.
- Deploy to **cloud (AWS, GCP, DigitalOcean)**.

---
## License
MIT License. Free to use and modify.

---
## **Contributing**
Want to improve **WhisperWave**? Feel free to fork the repository and submit a pull request!