{"id":25079797,"url":"https://github.com/zakariaf/whisperwave","last_synced_at":"2026-04-11T12:03:49.578Z","repository":{"id":276168223,"uuid":"928295634","full_name":"zakariaf/whisperwave","owner":"zakariaf","description":"AI-powered audio transcription app with dual processing modes: Local Whisper model or OpenAI API. Upload audio files (.wav, .mp3, .flac, .m4a, .ogg), choose your language, and get accurate transcriptions with optional translation to 15+ languages. Processing Performance analytics. Built with Flask and Vue, fully Dockerized","archived":false,"fork":false,"pushed_at":"2025-02-28T03:26:46.000Z","size":1736,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-01T04:38:50.262Z","etag":null,"topics":["audio-processing","docker","flask","multilanguage","openai-api","python","speech-to-text","transcription","translation","vite","vue","whisper"],"latest_commit_sha":null,"homepage":"","language":"Vue","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zakariaf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-06T11:57:07.000Z","updated_at":"2025-03-25T02:17:57.000Z","dependencies_parsed_at":"2025-02-06T17:39:28.505Z","dependency_job_id":"4a6ac1b7-5158-4ddc-903b-ceab2053e6f0","html_url":"https://github.com/zakariaf/whisperwave","commit_stats":null,"previous_names":["zakariaf/whisperwave"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zakariaf/whisperwave","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zakariaf%2Fwhisperwave","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zakariaf%2Fwhisperwave/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zakariaf%2Fwhisperwave/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zakariaf%2Fwhisperwave/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zakariaf","download_url":"https://codeload.github.com/zakariaf/whisperwave/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zakariaf%2Fwhisperwave/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261344606,"owners_count":23144898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-processing","docker","flask","multilanguage","openai-api","python","speech-to-text","transcription","translation","vite","vue","whisper"],"created_at":"2025-02-07T03:15:26.825Z","updated_at":"2026-04-11T12:03:49.481Z","avatar_url":"https://github.com/zakariaf.png","language":"Vue","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WhisperWave 🎙️➡️📝\n\n**AI-Powered Audio Transcription App** using OpenAI's Whisper model.\n\nWhisperWave gives you the **best of both worlds**: process audio files locally using the built-in Whisper model OR leverage OpenAI's API for enhanced accuracy. The choice is yours for each file you upload!\n\n## 🚀 Features\n- Upload `.wav`, `.mp3`, `.flac`, `.m4a`, or `.ogg` files and transcribe them into text.\n- Select a language before transcription.\n- Choose between local Whisper model or OpenAI API while uploading your file.\n- Both transcription methods work with all supported audio formats.\n- **Translate transcriptions** to multiple languages.\n- **Performance analytics** to compare local vs. API transcription.\n- View your transcription history and restore previous results.\n- Built with **Vue + Flask + Whisper AI**.\n- Fully Dockerized with **Docker Compose**.\n- Uses **Whisper as a separate service** for scalability.\n\n![WhisperWave Screenshot](screenshot.png)\n\n---\n\n## 📌 Getting Started\n\n### How It Works\n\n1. **Upload your audio file** (.wav, .mp3, .flac, .m4a, or .ogg)\n2. **Select your language** (English, German, or other supported languages)\n3. **Choose your transcription method**:\n   - **Local Whisper**: Process directly on your machine\n   - **OpenAI API**: Send to OpenAI for potentially more accurate results\n4. **Optionally select a target translation language**\n5. **Click \"Transcribe\"** and get your text!\n\n### 🔹 **1. Clone the Repository**\n```bash\ngit clone https://github.com/zakariaf/whisperwave.git\ncd whisperwave\n```\n\n### 🔹 **2. Configure OpenAI API Key**\nEdit the `docker-compose.yml` file and replace `your_openai_api_key_here` with your actual OpenAI API key:\n\n```yaml\nwhisper:\n  environment:\n    - OPENAI_API_KEY=your_openai_api_key_here  # Replace with your API key\n```\n\n### 🔹 **3. Run with Docker Compose**\n```bash\ndocker-compose up --build\n```\n\n### 🔹 **4. Open in Browser**\n- **Frontend**: `http://localhost:5173`\n- **Backend**: `http://localhost:5001`\n- **Whisper API**: `http://localhost:6000`\n\n---\n\n## 🛠️ Tech Stack\n- **Backend**: Flask, Whisper AI (OpenAI)\n- **Frontend**: Vue, Vite, Tailwind CSS\n- **Containerization**: Docker, Docker Compose\n- **Machine Learning**: OpenAI's Whisper for Speech-to-Text\n- **Transcription Modes**: Local Whisper model or OpenAI API\n- **Translation**: OpenAI GPT API for multilingual support\n\n---\n\n## 🎛️ **Transcription Modes**\n\n### **Local Whisper Model**\n- Processes audio files directly on your machine using the containerized Whisper model\n- No file size limitations beyond your system's resources\n- Works offline without external API dependencies\n- Great for larger files or when privacy is a concern\n\n### **OpenAI API**\n- Sends your audio file to OpenAI's servers for processing\n- Often provides more accurate transcriptions, especially for difficult audio\n- Has the following limitations:\n  - Maximum file size is 25MB\n  - Requires an internet connection\n  - Consumes OpenAI API credits\n\nIf your file exceeds the 25MB limit for the API mode:\n- The application will automatically alert you\n- You can simply switch to the local mode instead\n- Alternatively, you can compress your audio file or split it into smaller segments\n\n**The choice is yours!** You can easily select which transcription mode to use while uploading your file, giving you the flexibility to choose the best option for each situation.\n\n---\n\n## 🌍 **Translation Feature**\n\nWhisperWave includes support for translating your transcriptions to multiple languages:\n\n### **How It Works**\n1. After selecting the source language for your audio, choose a target language for translation (optional)\n2. When you transcribe your audio, the text will be automatically translated to your chosen language\n3. Both the original transcription and translation will be displayed\n\n### **Supported Languages**\n- English, Spanish, French, German, Italian, Portuguese, Dutch\n- Russian, Chinese, Japanese, Korean, Arabic, Hindi\n- Turkish, Persian/Farsi, and more!\n\n### **Translation Engine**\n- Translations are powered by OpenAI's GPT-4o mini model\n- All translations use the OpenAI API regardless of which transcription mode you select\n- An OpenAI API key is required for translation functionality\n\n---\n\n## 📊 **Analytics Feature**\n\nWhisperWave includes built-in analytics to help you compare and optimize your transcription workflow:\n\n### **Per-Transcription Analytics**\n- Processing time for each transcription\n- Model used (base for local, whisper-1 for API)\n- Transcription mode (local or API)\n- Translation processing time (if applicable)\n\n### **Comparative Analytics**\n- Average processing time by mode (local vs. API)\n- Usage count for each mode\n- Historical performance data\n- Model usage statistics\n\n### **Benefits**\n- Make data-driven decisions about which mode to use for different files\n- Track performance metrics over time\n- Compare processing speed between local and API options\n\nAnalytics data is stored locally in your browser and is available even after restarting the application.\n\n---\n\n## 📄 API Endpoints (Flask)\n\n### 🎙️ **Transcribe Audio**\n```http\nPOST /transcribe\n```\n\n#### **Request (Form Data)**\n- `file`: Audio file (.wav, .mp3, .flac, .m4a, .ogg)\n- `language`: `en`, `de`, etc.\n- `mode`: `local` or `api`\n- `target_language`: (Optional) Language code for translation\n\n#### **Response (JSON)**\n```json\n{\n  \"transcription\": \"This is the transcribed text.\",\n  \"translation\": \"Dies ist der übersetzte Text.\",\n  \"analytics\": {\n    \"processing_time\": 4.32,\n    \"mode\": \"local\",\n    \"model\": \"base\",\n    \"translation_time\": 1.25\n  }\n}\n```\n\n---\n\n## 📄 API Endpoints (Whisper Service)\nSince Whisper is a separate service, the backend **calls it internally**, but you can also call it directly.\n\n### 🎙️ **Direct Whisper Transcription API**\n```http\nPOST /transcribe\n```\n\n#### **Request (JSON)**\n```json\n{\n  \"file_path\": \"/uploads/audio.wav\",\n  \"language\": \"en\",\n  \"mode\": \"local\",\n  \"target_language\": \"fr\"\n}\n```\n\n#### **Response (JSON)**\n```json\n{\n  \"transcription\": \"Hello, this is a test.\",\n  \"translation\": \"Bonjour, c'est un test.\",\n  \"analytics\": {\n    \"processing_time\": 3.21,\n    \"mode\": \"local\",\n    \"model\": \"base\",\n    \"translation_time\": 1.05\n  }\n}\n```\n\n---\n\n## 🔧 **How to Modify the Whisper Model?**\nIf you want to use a **different model (e.g., `large` instead of `base`)**, update **`whisper_service/app.py`**:\n\n```python\nmodel = whisper.load_model(\"large\")\n```\n\nThen restart:\n```bash\ndocker-compose down\ndocker-compose up --build\n```\n\n---\n\n## 📌 Why is Whisper a Separate Service? 🤔\n\n### **1️⃣ Better Scalability**\n- The Whisper service runs independently, allowing **the backend and frontend to scale separately**.\n- If multiple users upload audio files, Whisper can **run on its own container without blocking the backend**.\n\n### **2️⃣ Performance Optimization**\n- Whisper is a **heavy machine learning model**. Keeping it separate ensures **Flask doesn't slow down** while transcribing audio.\n- This setup allows for **future GPU acceleration**, making it faster when deployed in cloud environments.\n\n### **3️⃣ Flexibility for Multiple Models**\n- You can deploy **different Whisper models** (`base`, `large`) in separate services.\n- The backend can **dynamically select which model to use**, depending on the request.\n\n### **4️⃣ Reusability for Other Applications**\n- Other apps (mobile apps, other web services) can **use the Whisper API** without needing to integrate Flask.\n- The Whisper service can be deployed **independently** on cloud platforms like **AWS, GCP, or DigitalOcean**.\n\n---\n\n## 🏗️ Future Enhancements\n- Implement real-time transcription for streaming audio\n- Add **GPU acceleration** for faster local processing\n- Add batch processing for multiple files at once\n- Create speaker diarization to identify different speakers\n- Add audio editing capabilities (trim, cut, etc.) before transcription\n- Add custom vocabulary support for domain-specific terminology\n- Implement user accounts for cloud synchronization of transcription history\n- Deploy to **cloud (AWS, GCP, DigitalOcean)** for global access\n\n---\n\n## 📝 License\nMIT License. Free to use and modify.\n\n---\n\n## 💡 **Contributing**\nWant to improve **WhisperWave**? Feel free to fork the repository and submit a pull request! 🚀","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzakariaf%2Fwhisperwave","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzakariaf%2Fwhisperwave","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzakariaf%2Fwhisperwave/lists"}