Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/zakariaf/whisperwave

AI-powered audio transcription app using OpenAI Whisper, Flask, and Vue. Upload .wav files, select a language, and get accurate transcriptions. Fully Dockerized!
https://github.com/zakariaf/whisperwave

audio-processing docker flask python speech-to-text transcription vite vue whisper

Last synced: about 4 hours ago
JSON representation

AI-powered audio transcription app using OpenAI Whisper, Flask, and Vue. Upload .wav files, select a language, and get accurate transcriptions. Fully Dockerized!

Awesome Lists containing this project

README

        

# WhisperWave πŸŽ™οΈβž‘οΈπŸ“

**AI-Powered Audio Transcription App** using OpenAI’s Whisper model.

## πŸš€ Features
- Upload `.wav` files and transcribe them into text.
- Select a language before transcription.
- Built with **Vue + Flask + Whisper AI**.
- Fully Dockerized with **Docker Compose**.
- Uses **Whisper as a separate service** for scalability.

![WhisperWave Screenshot](screenshot.png)

---

## πŸ“Œ Getting Started

### πŸ”Ή **1. Clone the Repository**
```bash
git clone https://github.com/zakariaf/whisperwave.git
cd whisperwave
```

### πŸ”Ή **2. Run with Docker Compose**
```bash
docker-compose up --build
```

### πŸ”Ή **3. Open in Browser**
- **Frontend**: `http://localhost:5173`
- **Backend**: `http://localhost:5000`
- **Whisper API**: `http://localhost:6000`

---

## πŸ› οΈ Tech Stack
- **Backend**: Flask, Whisper AI (OpenAI)
- **Frontend**: Vue, Vite, Tailwind CSS
- **Containerization**: Docker, Docker Compose
- **Machine Learning**: OpenAI’s Whisper for Speech-to-Text

---

## πŸ“„ API Endpoints (Flask)

### πŸŽ™οΈ **Transcribe Audio**
```http
POST /transcribe
```

#### **Request (Form Data)**
- `file`: `.wav` file
- `language`: `en`, `es`, `fr`, etc.

#### **Response (JSON)**
```json
{
"transcription": "This is the transcribed text."
}
```

---

## πŸ“„ API Endpoints (Whisper Service)
Since Whisper is a separate service, the backend **calls it internally**, but you can also call it directly.

### πŸŽ™οΈ **Direct Whisper Transcription API**
```http
POST /transcribe
```

#### **Request (JSON)**
```json
{
"file_path": "/uploads/audio.wav",
"language": "en"
}
```

#### **Response (JSON)**
```json
{
"transcription": "Hello, this is a test."
}
```

---

## πŸ”§ **How to Modify the Whisper Model?**
If you want to use a **different model (e.g., `large` instead of `base`)**, update **`whisper_service/app.py`**:

```python
model = whisper.load_model("large")
```

Then restart:
```bash
docker-compose down
docker-compose up --build
```

---

## πŸ“Œ Why is Whisper a Separate Service? πŸ€”

### **1️⃣ Better Scalability**
- The Whisper service runs independently, allowing **the backend and frontend to scale separately**.
- If multiple users upload audio files, Whisper can **run on its own container without blocking the backend**.

### **2️⃣ Performance Optimization**
- Whisper is a **heavy machine learning model**. Keeping it separate ensures **Flask doesn't slow down** while transcribing audio.
- This setup allows for **future GPU acceleration**, making it faster when deployed in cloud environments.

### **3️⃣ Flexibility for Multiple Models**
- You can deploy **different Whisper models** (`base`, `large`) in separate services.
- The backend can **dynamically select which model to use**, depending on the request.

### **4️⃣ Reusability for Other Applications**
- Other apps (mobile apps, other web services) can **use the Whisper API** without needing to integrate Flask.
- The Whisper service can be deployed **independently** on cloud platforms like **AWS, GCP, or DigitalOcean**.

---

## πŸ—οΈ Future Enhancements
- Support for more audio formats.
- Improve UI/UX.
- Implement real-time transcription.
- Add **GPU acceleration** for Whisper.
- Deploy to **cloud (AWS, GCP, DigitalOcean)**.

---

## πŸ“ License
MIT License. Free to use and modify.

---

## πŸ’‘ **Contributing**
Want to improve **WhisperWave**? Feel free to fork the repository and submit a pull request! πŸš€