{"id":30805556,"url":"https://github.com/aaronsb/whisper-service","last_synced_at":"2025-09-06T00:58:59.490Z","repository":{"id":260030393,"uuid":"880095984","full_name":"aaronsb/whisper-service","owner":"aaronsb","description":"A Docker-powered transcription service using OpenAI's Whisper model, supporting both local compute and API modes with GPU acceleration and RESTful API endpoints.","archived":false,"fork":false,"pushed_at":"2025-07-24T16:49:47.000Z","size":148,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-24T22:20:29.164Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronsb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-29T05:22:25.000Z","updated_at":"2025-07-24T16:49:51.000Z","dependencies_parsed_at":"2024-10-29T06:21:46.642Z","dependency_job_id":"23a0d878-a8c7-4d0b-befd-c83cedf17c4e","html_url":"https://github.com/aaronsb/whisper-service","commit_stats":null,"previous_names":["aaronsb/whisper-service"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aaronsb/whisper-service","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fwhisper-service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fwhisper-service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fwhisper-service/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fwhisper-service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronsb","download_url":"https://codeload.github.com/aaronsb/whisper-service/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fwhisper-service/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273842818,"owners_count":25177921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-06T00:58:57.491Z","updated_at":"2025-09-06T00:58:59.472Z","avatar_url":"https://github.com/aaronsb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎙️ Whisper Transcription Service\n\nA Docker-powered service that transcribes audio files using OpenAI's Whisper model. This service is optimized for handling audio files of any size and offers two modes of operation:\n\n1. **Local Compute Mode**: Runs the Whisper model locally on your machine using GPU acceleration (if available)\n2. 
**API Mode**: Uses OpenAI's Whisper API for transcription, requiring fewer computational resources\n\n## ✨ Features\n\n- 🚀 Easy setup with Docker\n- 🔄 Flexible configuration with local or API-based transcription\n- 📦 No file size limits with optimized memory handling\n- 🎯 Supports multiple audio formats (.mp3, .wav, .m4a, .mp4, .ogg, .flac, .mkv)\n- 🔊 Optimized MP3 conversion for efficient API usage\n- ⚡ GPU acceleration with CUDA 12.1 support (local mode)\n- 🔄 Concurrent processing with job management\n- 🔍 Real-time job status tracking\n- 🧹 Automatic memory cleanup and optimization\n- 🔒 Secure file handling with non-root user execution\n- 🌐 RESTful API with comprehensive endpoints\n- 📝 Convenient command-line utilities\n- 📊 Streaming response for efficient handling of large transcripts\n- 🔍 Optional transcript retrieval for faster status checks\n\n## 🚀 Quick Start\n\n### Prerequisites\n\nYou'll need:\n- Docker and Docker Compose installed on your machine\n- For local mode:\n  - NVIDIA GPU with CUDA 12.1 support (optional, but recommended for better performance)\n  - NVIDIA Container Toolkit (if using GPU)\n- For API mode:\n  - OpenAI API key with access to the Whisper API\n\n### Detailed Installation Guide\n\n1. **Clone this repository**:\n```bash\ngit clone https://github.com/aaronsb/whisper-service\ncd whisper-service\n```\n\n2. **Make the configuration and build scripts executable**:\n```bash\nchmod +x configure.sh build-image.sh\n```\n\n3. **Run the configuration script** to choose between local compute or API mode:\n```bash\n./configure.sh\n```\n\n   This interactive script will:\n   - Ask you to choose between local compute or API mode\n   - If you choose API mode, prompt you for your OpenAI API key\n   - Create a `.env` file with your configuration settings\n   - Set the appropriate environment variables\n\n4. 
**Build the Docker image**:\n```bash\n./build-image.sh\n```\n\n   This script will:\n   - Read your configuration from the `.env` file\n   - Build the appropriate Docker image based on your selected mode\n   - Tag the image for use with Docker Compose\n\n5. **Start the service**:\n```bash\n# For local mode (default)\ndocker compose up -d\n\n# For API mode\ndocker compose -f docker-compose.api.yml up -d\n```\n\n6. **Verify the service is running**:\n```bash\n# Check container status\ndocker ps\n\n# Check service logs\ndocker logs -f whisper-service_whisper-api_1\n```\n\n7. **Access the service** at http://localhost:9673\n\n### API Key Configuration\n\nFor API mode, you'll need an OpenAI API key with access to the Whisper API:\n\n1. Sign up or log in to your [OpenAI account](https://platform.openai.com/)\n2. Navigate to the API keys section\n3. Create a new API key with appropriate permissions\n4. When running `./configure.sh`, select option 2 (OpenAI API) and paste your API key when prompted\n\nIf you need to change your API key later, simply run `./configure.sh` again and select the API mode option.\n\n## 🎯 Using the Service\n\n### Via Web Interface\n\n1. Open `http://localhost:9673` in your browser\n2. You'll see a simple interface listing all available endpoints\n3. 
Visit `http://localhost:9673/docs` for interactive API documentation\n\n### Via REST API\n\nThe service provides several endpoints for managing transcription jobs:\n\n#### Submit a Transcription Job\n```bash\ncurl -X POST \"http://localhost:9673/transcribe/\" \\\n     -F \"file=@path/to/your/audio.mp3\"\n```\nResponse:\n```json\n{\n    \"job_id\": \"job_1234567890_abcd\",\n    \"status\": \"queued\",\n    \"message\": \"Transcription job queued successfully\",\n    \"file_info\": {\n        \"name\": \"audio.mp3\",\n        \"size\": 1048576\n    }\n}\n```\n\n#### Check Job Status\n```bash\n# Get job status with full transcript (default)\ncurl \"http://localhost:9673/status/job_1234567890_abcd\"\n\n# Get job status without transcript (faster for large transcripts)\ncurl \"http://localhost:9673/status/job_1234567890_abcd?include_transcript=false\"\n```\n\n#### List All Jobs\n```bash\ncurl \"http://localhost:9673/jobs\"\n```\n\n#### Terminate a Job\n```bash\ncurl -X DELETE \"http://localhost:9673/jobs/job_1234567890_abcd\"\n```\n\n#### Stream Transcript (Optimized for Large Transcripts)\n```bash\n# Stream the transcript for a completed job\ncurl \"http://localhost:9673/transcript/job_1234567890_abcd\"\n```\nThis endpoint uses HTTP streaming to efficiently transfer large transcripts without memory issues.\n\n#### Check Service Health\n```bash\ncurl \"http://localhost:9673/health\"\n```\n\n### Via Command-Line Client\n\nA dedicated command-line client is available at [whisper-client](https://github.com/aaronsb/whisper-client), providing a convenient interface for transcribing files, managing jobs, and tracking progress.\n\n### Utility Scripts\n\nThe service includes two utility scripts for processing audio files:\n\n#### 1. process_audio.py\nDirect audio file processing script:\n```bash\npython3 process_audio.py input.mp3\n```\nThis will create a JSON output file with the full transcription results.\n\n#### 2. 
process_whisper.py\nUtility for extracting plain text from transcription JSON files:\n```bash\npython3 process_whisper.py --dir /path/to/transcripts\n```\nThis will process all JSON transcription files in the directory and create corresponding .txt files with just the transcribed text.\n\n## 🔧 Configuration\n\n### Choosing Between Local and API Mode\n\nThe service offers two modes of operation:\n\n1. **Local Compute Mode**:\n   - Uses the Whisper model running locally in the container\n   - Requires more computational resources\n   - Benefits from GPU acceleration if available\n   - No API key required\n   - Better for high-volume transcription or privacy-sensitive applications\n\n2. **API Mode**:\n   - Uses OpenAI's Whisper API for transcription\n   - Requires less computational resources\n   - Requires an OpenAI API key\n   - May have usage limits based on your OpenAI plan\n   - Better for lightweight deployments or when GPU is not available\n\nYou can switch between modes using the `configure.sh` script, which will:\n- Set the appropriate environment variables\n- Create a `.env` file with your configuration\n- Guide you through API key setup if needed\n\n### Performance Optimization\n\n#### Audio Format Optimization\n\nThe service automatically optimizes audio for transcription:\n- Converts all audio to MP3 format with settings optimized for speech:\n  * 16kHz sample rate (optimal for speech recognition)\n  * Mono audio (sufficient for voice)\n  * MP3 quality level 4 (good balance of quality and size)\n- Reduces bandwidth usage when sending to OpenAI API\n- Maintains consistent quality across all processing stages\n\n#### Local Mode Optimizations\n\nThe local mode includes several performance optimizations configured in the Dockerfile:\n\n```dockerfile\n# GPU Memory Optimization\nENV PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512\n\n# Tokenizer Performance\nENV TOKENIZERS_PARALLELISM=true\n```\n\nThese environment variables can be adjusted in the Dockerfile to 
optimize performance for your specific use case.\n\n#### Docker Compose Configuration\n\nThe service is configured in `docker-compose.yml` with optimized settings for handling large audio files:\n\n```yaml\nservices:\n  whisper-api:\n    # ... other settings ...\n    shm_size: '8gb'  # Shared memory size for large file processing\n    ulimits:\n      memlock: -1    # Unlimited locked-in-memory address space\n      stack: 67108864  # Stack size limit\n    command: \u003e\n      uvicorn main:app\n      --host 0.0.0.0\n      --port 9673\n      --timeout-keep-alive 300  # Keep-alive timeout in seconds\n      --workers 1              # Number of worker processes\n      --log-level info\n      --reload                # Auto-reload on code changes (development)\n```\n\nThese settings can be adjusted based on your system resources and requirements.\n\n### Security Configuration\n\nThe service runs as a non-root user for enhanced security in both modes:\n- Dedicated 'whisper' user created in container\n- All processes run with limited permissions\n- Upload and temp directories with controlled access\n\n### GPU Support (Local Mode Only)\n\nIn local mode, the service automatically detects and uses your NVIDIA GPU if available. GPU support is configured in `docker-compose.yml`.\n\n## 🔍 API Response Formats\n\n### Transcription Result\n```json\n{\n    \"text\": \"Complete transcribed text...\",\n    \"segments\": [\n        {\n            \"start\": 0.0,\n            \"end\": 2.5,\n            \"text\": \"Segment text...\"\n        }\n    ]\n}\n```\n\n### Health Check\n```json\n{\n    \"status\": \"healthy\",\n    \"model\": \"whisper-base\",\n    \"supported_formats\": [\".mp3\", \".wav\", \".m4a\", \".mp4\", \".ogg\", \".flac\", \".mkv\"],\n    \"max_file_size\": \"unlimited\",\n    \"gpu_available\": true,\n    \"active_jobs\": 1,\n    \"max_concurrent_jobs\": 3\n}\n```\n\n## 🚨 Common Issues \u0026 Solutions\n\n### General Issues\n\n1. 
**\"Error: Job queue full\"**\n   - Wait for current jobs to complete\n   - Monitor active jobs using the /jobs endpoint\n   - Consider adjusting the number of workers if system resources allow\n\n2. **Permission Issues**\n   - Ensure upload/temp directories have correct permissions\n   - Verify Docker user mapping if using custom UID/GID\n   - Check file ownership in container\n\n### Local Mode Issues\n\n1. **\"Error: GPU not available\"**\n   - Check CUDA 12.1 compatibility with your GPU\n   - Verify NVIDIA Container Toolkit is installed\n   - Try running `nvidia-smi` to confirm GPU is detected\n\n2. **Memory Issues with Large Files**\n   - Increase `shm_size` in docker-compose.yml\n   - Adjust PYTORCH_CUDA_ALLOC_CONF in Dockerfile\n   - Monitor container resources with `docker stats`\n\n3. **Service Performance**\n   - Remove `--reload` flag in production\n   - Adjust number of workers based on CPU cores\n   - Consider GPU acceleration for faster processing\n   - Tune TOKENIZERS_PARALLELISM based on workload\n\n### API Mode Issues\n\n1. **\"Error: OpenAI API key not provided\"**\n   - Run `./configure.sh` again to set up your API key\n   - Verify the API key is correctly set in the .env file\n   - Check that the API key has access to the Whisper API\n\n2. 
**\"Error: API request failed\"**\n   - Check your OpenAI account for API limits or billing issues\n   - Verify network connectivity from the container\n   - Check for any OpenAI service outages\n\n## 🔍 Understanding the Components\n\n- `main.py`: FastAPI application with job management and API endpoints\n- `process_audio.py`: Direct audio transcription utility\n- `process_whisper.py`: JSON transcript to text converter\n- `Dockerfile`: Container image definition for local mode with CUDA support\n- `Dockerfile.api`: Container image definition for API mode (lightweight)\n- `docker-compose.yml`: Service orchestration for local mode\n- `docker-compose.api.yml`: Service orchestration for API mode\n- `configure.sh`: Configuration script to choose between local and API modes\n- `build-image.sh`: Script to build the appropriate Docker image\n\n## 🤝 Contributing\n\nWe welcome contributions! Please feel free to submit issues and pull requests.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n## 📝 License\n\nThis project is licensed under the MIT License. See the `LICENSE` file for details.\n\n## 💡 Need Help?\n\n- Check the [FAQs](https://github.com/aaronsb/whisper-service/wiki/FAQ) (if available)\n- Open an [issue](https://github.com/aaronsb/whisper-service/issues)\n- Read OpenAI's [Whisper documentation](https://github.com/openai/whisper)\n\n---\n\nBuilt with ❤️ using [OpenAI Whisper](https://github.com/openai/whisper) and [FastAPI](https://fastapi.tiangolo.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fwhisper-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronsb%2Fwhisper-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fwhisper-service/lists"}