{"id":28749722,"url":"https://github.com/nico-byte/whisper-web","last_synced_at":"2026-04-26T08:34:49.948Z","repository":{"id":298762231,"uuid":"1001070012","full_name":"nico-byte/whisper-web","owner":"nico-byte","description":"The Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by OpenAI's Whisper models. It leverages state-of-the-art models like Distil-Whisper to transcribe audio input in real-time.","archived":false,"fork":false,"pushed_at":"2025-12-29T20:04:48.000Z","size":7358,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-01-02T02:33:48.635Z","etag":null,"topics":["ai","asr","automatic-speech-recognition","distil-whisper","distil-whisper-large-v3","huggingface","huggingface-transformers","server","vad","voice","web","websockets","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nico-byte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-12T19:13:07.000Z","updated_at":"2025-12-29T20:04:51.000Z","dependencies_parsed_at":"2025-06-12T20:53:10.296Z","dependency_job_id":"ba94d919-472f-43f7-93f3-3620e3eda292","html_url":"https://github.com/nico-byte/whisper-web","commit_stats":null,"previous_names":["nico-byte/whisper-web"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nico-byte/whisper-web","r
epository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nico-byte%2Fwhisper-web","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nico-byte%2Fwhisper-web/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nico-byte%2Fwhisper-web/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nico-byte%2Fwhisper-web/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nico-byte","download_url":"https://codeload.github.com/nico-byte/whisper-web/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nico-byte%2Fwhisper-web/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32290801,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T08:29:33.829Z","status":"ssl_error","status_checked_at":"2026-04-26T08:29:18.366Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","asr","automatic-speech-recognition","distil-whisper","distil-whisper-large-v3","huggingface","huggingface-transformers","server","vad","voice","web","websockets","whisper"],"created_at":"2025-06-16T21:00:35.379Z","updated_at":"2026-04-26T08:34:49.942Z","avatar_url":"https://github.com/nico-byte.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper Realtime 
Transcriber\n\n## Table of Contents\n\n- [Whisper Realtime Transcriber](#whisper-realtime-transcriber)\n  - [Table of Contents](#table-of-contents)\n  - [Project Overview](#project-overview)\n    - [Prerequisites](#prerequisites)\n    - [Installation](#installation)\n    - [Quick Start](#quick-start)\n    - [Model Installation and Management](#model-installation-and-management)\n    - [Supported Model IDs](#supported-model-ids)\n    - [Troubleshooting Model Installation](#troubleshooting-model-installation)\n    - [How it works](#how-it-works)\n    - [Documentation](#documentation)\n  - [Deployment](#deployment)\n    - [Overview](#overview)\n    - [Project Structure](#project-structure)\n  - [Environment Configuration](#environment-configuration)\n    - [Environment Variables Template](#environment-variables-template)\n    - [Makefile Tasks](#makefile-tasks)\n  - [Deployment Instructions](#deployment-instructions)\n    - [Prerequisites (Docker)](#prerequisites-docker)\n    - [Quick Start](#quick-start-1)\n    - [Individual Service Management](#individual-service-management)\n    - [Profile-Based Deployments](#profile-based-deployments)\n    - [Local Development (without Docker)](#local-development-without-docker)\n  - [Troubleshooting](#troubleshooting)\n    - [Build Issues](#build-issues)\n    - [Service Status and Connectivity](#service-status-and-connectivity)\n    - [Common Issues](#common-issues)\n  - [Monitoring and Maintenance](#monitoring-and-maintenance)\n    - [Health Checks](#health-checks)\n  - [Security Considerations](#security-considerations)\n\n## Project Overview\n\nThe Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by\nOpenAI's Whisper models. 
It leverages state-of-the-art models like Distil-Whisper to transcribe audio input\nin real-time.\n\nKey Features:\n* Real-time Transcription: Transcribes spoken words into text almost instantaneously, making it ideal for\nuse cases like live captioning, real-time subtitles, or interactive voice-driven applications.\n* Customizable Model Configurations: The system can be fine-tuned with various model configurations to\naccommodate different use cases, such as adjusting the sampling rate, block size, or selecting from\ndifferent Whisper model sizes.\n* API Integration: Exposes a RESTful API for easy integration with other services. You can send and retrieve\ntranscriptions via HTTP requests, allowing for real-time updates in web or mobile apps.\n* Multi-threaded Asynchronous Processing: Leverages asynchronous programming (via asyncio) for optimal\nperformance, allowing the transcription engine to handle high volumes of audio input while processing\ntranscription results concurrently.\n* Memory-Efficient and Scalable: Designed to work efficiently even with resource-intensive models, offering\nscalable transcription performance with lower resource consumption.\n\nBy combining cutting-edge machine learning models and real-time audio processing, this project enables fast,\naccurate, and flexible transcription solutions for various audio-driven applications.\n\n### Prerequisites\n\n- [Python 3.12](https://www.python.org) installed on the machine, either standalone or via a package manager like uv. 
We recommend using [uv](https://docs.astral.sh/uv/getting-started/installation/).\n- Optional: Microphone connected to the machine.\n\n### Installation\n\nInstall dependencies via a package manager like uv, in project root:\n\n```bash\nuv sync --all-groups --extra cpu # or --extra cu128 for torch with CUDA support\n```\n\nSet up .env and model cache dir:\n\n```bash\nsh setup.sh\n```\n\n### Quick Start\n\nAfter completing the installation, you can use the transcriber.\n\nFor microphone input, use the CLI client. For file uploads, use the Streamlit upload client.\nThere is also a Streamlit viewer client that shows all active sessions and their transcriptions on the server.\n\nStart the server and CLI/Streamlit clients via the Makefile tasks:\n\n```bash\nmake local.run.upload # for server plus streamlit upload app - doesn't support mic input\nmake local.run.cli # for server plus cli client - supports mic input\n```\n\nAlternatively, you can run the Docker images:\n\n```bash\nmake docker.up.server-cpu-streamlit # for cpu server\nmake docker.up.server-cuda-streamlit # for cuda server\n```\n\nMore on the Makefile tasks in the [Makefile Tasks](#makefile-tasks) section.\n\n### Model Installation and Management\n\nOn first startup, no models are installed by default. You can install Whisper models as follows:\n\n1. Access the Streamlit Upload App at http://localhost:8501\n2. In the model selection field, enter one of the supported model IDs listed below\n3. Click \"Create Session\" to download and initialize the model\n4. **Important**: Model downloading happens in the background without visual feedback. 
Session creation may take 2-10 minutes depending on your internet connection and the model size.\n\n#### Supported Model IDs\n\nWhisper Web supports all Whisper models available on Hugging Face; here are some recommendations:\n\n**English Models:**\n- `distil-whisper/distil-large-v3.5` - **Recommended** - Best accuracy, larger size (~1.5GB)\n- `distil-whisper/distil-medium.en` - Good balance of speed and accuracy (~760MB)\n- `distil-whisper/distil-small.en` - Fastest, smallest size (~240MB)\n\n**German Models:**\n- `primeline/distil-whisper-large-v3-german` - German-optimized model (~1.5GB)\n\n**Multilingual Models:**\n- `openai/whisper-tiny` - Ultra-fast, 39MB, supports 99 languages\n- `openai/whisper-base` - Fast, 140MB, supports 99 languages\n- `openai/whisper-small` - Balanced, 460MB, supports 99 languages\n\n#### Troubleshooting Model Installation\n\n**If session creation takes too long:**\n- Check your internet connection\n- Monitor server logs: `docker compose logs -f server-cpu`\n- Verify disk space availability in the `.models` directory\n\n**If model download fails:**\n- Ensure the model ID is spelled correctly\n- Check HuggingFace Hub connectivity: `curl -I https://huggingface.co`\n- Verify Docker container has internet access\n\n**Storage requirements:**\n- Reserve at least 2GB free disk space for model storage\n- Models are cached in `.models/` directory and persist between restarts\n- Use `du -sh .models/` to check current model storage usage\n\n### How it works\n\n- The transcriber consists of two modules: an Inputstream Generator and a Whisper Model.\n- The implementation of the Inputstream Generator is based on this [implementation](https://github.com/tobiashuttinger/openai-whisper-realtime).\n- The Inputstream Generator reads the microphone input and passes it over an event bus to the Whisper Model. 
The Whisper Model then generates the transcriptions and passes them via the event bus to the server.\n- This happens in an async event loop so that the Whisper Model can continuously generate\ntranscriptions from the audio input produced by the Inputstream Generator.\n\n### Documentation\n\nDocumentation can be found [here](https://nico-byte.github.io/whisper-web/).\n\n## Deployment\n\nThis document provides comprehensive instructions for deploying the Whisper Web transcription system using Docker containers with both CPU and CUDA GPU support.\n\n### Overview\n\nThe Whisper Web system consists of five main components:\n- **Whisper Server (CPU/CUDA)**: Core transcription service with WebSocket support, available in both CPU and CUDA variants\n- **Streamlit Upload App**: Web interface for file uploads and WebRTC streaming\n- **Streamlit Viewer App**: Read-only transcription viewer\n- **CLI Client**: Command-line client with microphone access\n- **Docker Compose Profiles**: Flexible deployment configurations for different use cases\n\n### Project Structure\n\n```\nwhisper-web/\n├── app/\n│   ├── __init__.py\n│   ├── cli.py\n│   ├── helper.py\n│   ├── server.py\n│   ├── streamlit_upload_client.py\n│   └── streamlit_viewer_client.py\n├── docker/\n│   ├── DOCKERFILE.server.cpu\n│   ├── DOCKERFILE.server.cuda\n│   ├── DOCKERFILE.streamlit_upload\n│   └── DOCKERFILE.streamlit_viewer\n├── tests/\n│   ├── test_api_pytest.py\n│   └── test_multi_client_pytest.py\n├── whisper_web/\n│   └── [core package files]\n├── .dockerignore\n├── .env\n├── .env.example\n├── .gitignore\n├── .python-version\n├── docker-compose.yml\n├── install_uv.sh\n├── LICENSE\n├── Makefile\n├── pyproject.toml\n├── pytest.ini\n├── README.md\n├── ruff.toml\n├── setup.sh\n└── uv.lock\n```\n\n## Environment Configuration\n\n### Environment Variables Template\nCreate a `.env` file in the project root with the following variables, if not done via setup.sh:\n\n```env\n# Server 
Configuration\nSERVER_TYPE=cpu  # or 'cuda' for GPU acceleration\nHOST=0.0.0.0\nCUDA_VISIBLE_DEVICES=0\nHF_HOME=./.models\n\n# Streamlit variables\nSTREAMLIT_SERVER_HEADLESS=true\nSTREAMLIT_SERVER_ADDRESS=${HOST}\nSTREAMLIT_BROWSER_GATHER_USAGE_STATS=false\nSTREAMLIT_UPLOAD_PORT=8501\nSTREAMLIT_VIEWER_PORT=8502\n\n# Network Configuration\nDOCKER_NETWORK_NAME=whisper-web\n```\n\n### Makefile Tasks\nThe project includes a comprehensive Makefile for common operations:\n\n```makefile\n# Development tasks\nmake format         # Format code with ruff\nmake lint           # Check code with ruff\nmake lintfix        # Fix linting issues\nmake pytest         # Run tests\n\n# Docker tasks\nmake docker.up.server-cpu-streamlit   # Run cpu server + streamlit apps\nmake docker.up.server-cuda-streamlit  # Run cuda server + streamlit apps\nmake docker.build.server-cpu          # Build cpu server\nmake docker.build.server-cuda         # Build cuda server\nmake docker.build.streamlit           # Build streamlit apps\nmake docker.build.all-cpu             # Build all images (cpu server)\nmake docker.build.all-cuda            # Build all images (cuda server)\n\n# Local development tasks\nmake local.server     # Run server locally\nmake local.cli        # Run CLI locally\nmake local.viewer     # Run viewer locally\nmake local.upload     # Run upload app locally\nmake local.run.cli    # Run server + CLI + streamlit viewer app\nmake local.run.upload # Run server + streamlit upload app\n```\n\n## Deployment Instructions\n\n### Prerequisites (Docker)\n- Docker 20.10+ with Docker Compose V2\n- At least 4GB RAM available (8GB+ recommended)\n- For GPU support: NVIDIA Docker runtime and CUDA-compatible GPU\n- Python 3.12 (for local development)\n- uv package manager (recommended)\n\n### Quick Start\n\n1. **Prepare the environment:**\n```bash\ncp .env.example .env\n# Edit .env file to configure your deployment\n```\n\n2. 
**Build all Docker images:**\n```bash\nmake docker.build.all-cpu # or ...all-cuda if driver is present\n```\n\n3. **Start services based on your hardware:**\n\n**CPU-only deployment with Streamlit interface:**\n```bash\nmake docker.up.server-cpu-streamlit\n```\n\n**GPU-accelerated deployment (requires NVIDIA GPU):**\n```bash\nmake docker.up.server-cuda-streamlit\n```\n\n4. **Access the applications:**\n- **Whisper Web Server**: http://localhost:8000\n- **API Documentation**: http://localhost:8000/docs\n- **Streamlit Upload App**: http://localhost:8501 \n- **Streamlit Viewer App**: http://localhost:8502\n\n### Individual Service Management\n\n```bash\n# Start only CPU server\nSERVER_TYPE=cpu docker compose --profile cpu up -d\n\n# Start CUDA server + Streamlit apps\nSERVER_TYPE=cuda docker compose --profile cuda --profile streamlit up -d\n```\n\n### Profile-Based Deployments\n\nThe docker-compose configuration uses profiles for flexible deployments:\n\n- `cpu`: CPU-only server\n- `cuda`: GPU-accelerated server  \n- `streamlit`: Streamlit web interfaces\n\n**Common deployment combinations:**\n```bash\n# Web interface with CPU server\nSERVER_TYPE=cpu docker compose --profile cpu --profile streamlit up -d\n\n# Web interface with GPU server\nSERVER_TYPE=cuda docker compose --profile cuda --profile streamlit up -d\n```\n\n### Local Development (without Docker)\n\nFor development, you can run services locally using uv:\n\n```bash\n# Install dependencies\nuv sync\n\n# Start individual services\nmake local.server    # Whisper Web server\nmake local.cli       # CLI client\nmake local.upload    # Streamlit upload app\nmake local.viewer    # Streamlit viewer app\n\n# Start combined services\nmake local.run.cli     # Server + CLI + Viewer\nmake local.run.upload  # Server + Upload app\n```\n\n## Troubleshooting\n\n### Build Issues\n\n**CUDA build failures:**\n```bash\n# Verify NVIDIA Docker runtime\ndocker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu22.04 nvidia-smi\n\n# 
Check CUDA availability\nnvidia-smi\n```\n\n### Service Status and Connectivity\n\n1. **Check service status:**\n```bash\ndocker compose ps\ncurl -f http://localhost:8000/status\n```\n\n2. **Network connectivity issues:**\n```bash\n# Test internal network connectivity\ndocker compose exec streamlit-upload curl -f http://server-cpu:8000/status\n\n# Check port bindings\ndocker compose port server-cpu 8000\n```\n\n### Common Issues\n\n1. **Models not downloading:** \n   - Check internet connection and disk space\n   - Verify `.models` directory permissions\n   - Check HuggingFace Hub connectivity\n\n2. **GPU not detected:** \n   - Verify NVIDIA Docker runtime installation\n   - Check CUDA_VISIBLE_DEVICES environment variable\n   - Ensure GPU has sufficient memory (4GB+ recommended)\n\n3. **Streamlit apps not loading:** \n   - Verify server status and connectivity\n   - Check firewall settings for ports 8501/8502\n\n4. **Port conflicts:** \n   - Modify port mappings in docker-compose.yml or .env file\n   - Check for other services using ports 8000, 8501, 8502\n\n5. **Memory issues:**\n   - Increase Docker memory limits\n   - Use smaller Whisper models (tiny, base, small)\n   - Monitor container resource usage with `docker stats`\n\n## Monitoring and Maintenance\n\n### Health Checks\nAll services include comprehensive health checks:\n\n**Server Health:**\n```bash\n# Check server status\ncurl -f http://localhost:8000/status\ncurl -f http://localhost:8000/sessions\n\n# Monitor via Docker\ndocker compose ps\ndocker compose logs -f server-cpu  # or server-cuda\n```\n\n**Streamlit App Health:**\n```bash\n# Check Streamlit health endpoints\ncurl -f http://localhost:8501/_stcore/health\ncurl -f http://localhost:8502/_stcore/health\n```\n\n**System Resource Monitoring:**\n```bash\n# Monitor container resource usage\ndocker stats\n\n# Check disk usage for models\ndu -sh .models/\ndocker system df\n```\n\n## Security Considerations\n\n1. 
**Network isolation:** Services communicate through internal Docker network\n2. **Volume mounts:** Only necessary directories are mounted\n3. **User permissions:** Client runs as non-root user when possible\n\nThis deployment provides a complete, scalable solution for the Whisper Web transcription system with proper containerization and networking.\n\n## TODOs\n  - Fix runtime error when using multilingual Whisper models:\n    RuntimeError: Tried to instantiate class '__path__._path', but it does not exist!\n    Ensure the model is correctly registered with torch::class_.\n    Also resolve input shape errors like\n    Whisper expects the mel input features to be of length 3000, but found 2500\n    by padding mel features to 3000, only important for multilingual model support.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnico-byte%2Fwhisper-web","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnico-byte%2Fwhisper-web","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnico-byte%2Fwhisper-web/lists"}