{"id":49028902,"url":"https://github.com/TheStageAI/TheWhisper","last_synced_at":"2026-04-28T23:00:49.329Z","repository":{"id":321310128,"uuid":"1079394550","full_name":"TheStageAI/TheWhisper","owner":"TheStageAI","description":"Optimized Whisper models for streaming and on-device use","archived":false,"fork":false,"pushed_at":"2026-04-23T14:01:28.000Z","size":2657,"stargazers_count":827,"open_issues_count":2,"forks_count":55,"subscribers_count":11,"default_branch":"main","last_synced_at":"2026-04-23T15:13:41.521Z","etag":null,"topics":["apple-silicon","coreml","mlx","nvidia-gpu","on-device-ai","real-time","speech-recognition","speech-to-text","streaming","transcription","translation","voice","voice-ai"],"latest_commit_sha":null,"homepage":"https://thestage.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TheStageAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-19T18:07:02.000Z","updated_at":"2026-04-23T14:01:43.000Z","dependencies_parsed_at":"2025-10-29T01:25:54.965Z","dependency_job_id":"43e55370-1e38-4921-9820-55a1d15debfa","html_url":"https://github.com/TheStageAI/TheWhisper","commit_stats":null,"previous_names":["thestageai/thewhisper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TheStageAI/TheWhisper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheStageAI%2FTheWhisper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/TheStageAI%2FTheWhisper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheStageAI%2FTheWhisper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheStageAI%2FTheWhisper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TheStageAI","download_url":"https://codeload.github.com/TheStageAI/TheWhisper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheStageAI%2FTheWhisper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32402673,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T19:38:08.556Z","status":"ssl_error","status_checked_at":"2026-04-28T19:37:55.688Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","coreml","mlx","nvidia-gpu","on-device-ai","real-time","speech-recognition","speech-to-text","streaming","transcription","translation","voice","voice-ai"],"created_at":"2026-04-19T09:00:36.931Z","updated_at":"2026-04-28T23:00:49.320Z","avatar_url":"https://github.com/TheStageAI.png","language":"Python","funding_links":[],"categories":["Developer Tools","Audio \u0026 Speech"],"sub_categories":["Python Tools"],"readme":"# TheWhisper: High-Performance Speech-to-Text\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Hugging 
Face](https://img.shields.io/badge/🤗-Hugging%20Face%20Weights-yellow)](https://huggingface.co/TheStageAI/thewhisper-large-v3-turbo/)\n[![NVIDIA](https://img.shields.io/badge/NVIDIA-GPU-green.svg)](#usage-deployment)\n[![Apple Silicon](https://img.shields.io/badge/Apple-Silicon-black.svg)](#usage-deployment)\n\n\n\u003cimg width=\"1420\" height=\"939\" alt=\"Frame 339234 (2)\" src=\"https://github.com/user-attachments/assets/e4549998-9d83-4980-bf53-cd21d40e9bce\" /\u003e\n\n\n\n## 🚀 Overview\n\nThis repository aims to share and develop the most efficient speech-to-text and text-to-speech inference solution, with a strong focus on self-hosting, cloud hosting, and on-device inference across multiple devices. \n\nFor the first release, this repository provides **open-source transcription models** with **streaming inference support** and:\n- Hugging Face open weights for Whisper models with a flexible chunk size (the original models use a fixed 30s chunk)\n- High-performance TheStage AI inference engines (NVIDIA GPU), 220 tok/s on L40s for the whisper-large-v3 model\n- CoreML engines for macOS / Apple Silicon with the lowest power consumption in the world on macOS\n- Local REST API with frontend examples using JS and Electron ([see details](electron_app/README.md))\n- Electron demo app built by TheStage AI (Certified by Apple): [TheNotes for macOS](https://cdn.thestage.ai/production/cms_file_upload/1761746543-88b5430a-5897-4348-b031-8a1101352c72/The%20Notes.pkg)\n- [Tutorial](https://app.thestage.ai/blog/Building-a-macOS-Note-Taker-app-on-Electron-with-TheWhisper?id=6) on building a local note-taking app for macOS using Electron and TheWhisper\n\nhttps://github.com/user-attachments/assets/f4d3fe7b-e2c5-42ff-a5d0-fef6afd11684\n\nIt is optimized for **low-latency**, **low power usage**, and **scalable** streaming transcription. 
Ideal for real-time captioning, live meetings, voice interfaces, and edge deployments.\n\n\u003c!-- \u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003e📖 Table of Contents\u003c/strong\u003e\u003c/summary\u003e --\u003e\n## 📖 Table of Contents\n- [✨ Features](#-features)\n- [⚡ Quick Start](#-quick-start)\n- [🛠️ Support Matrix](#%EF%B8%8F-support-matrix-and-system-requirements)\n- [💡 Usage](#%EF%B8%8F-usage-and-deployment)\n- [🖥️ Build On-Device Desktop Application for Apple](#-build-on-device-desktop-application-for-apple)\n- [📊 Benchmarks](#-benchmarks)\n- [🏢 Enterprise License Summary](#-enterprise-license-summary)\n- [🧭 Development Status](#-development-status)\n- [📝 Changelog](#-changelog-high-level)\n- [🙌 Acknowledgements](#-acknowledgements)\n\n\u003c!-- \u003c/details\u003e --\u003e\n\n---\n\n## ✨ Features\n\n- Open weights fine-tuned versions of Whisper models\n- Fine-tuned models support inference with 10s, 15s, 20s and 30s\n- CoreML engines for macOS and Apple Silicon, ~2W of power consumption, ~2GB RAM usage\n- Optimized engines for NVIDIA GPUs through TheStage AI [ElasticModels](https://docs.thestage.ai/elastic_models/docs/source/index.html) (free for small orgs)\n- Streaming implementation (NVIDIA + macOS)\n- Benchmarks: latency, memory, power, and ASR accuracy (OpenASR)\n- Simple Python API, examples and [tutorial](https://app.thestage.ai/blog/Building-a-macOS-Note-Taker-app-on-Electron-with-TheWhisper?id=6) of deployment for MacOS desktop app with Electron and ReactJS\n\n\u003cimg width=\"1547\" height=\"877\" alt=\"apple m2 whisper (4)\" src=\"https://github.com/user-attachments/assets/9404cdc0-b120-4ba1-9c65-4d42089ba623\" /\u003e\n\u003cimg width=\"1547\" height=\"877\" alt=\"nvidia l40s (2)\" src=\"https://cdn.thestage.ai/production/cms_file_upload/1770235593-c873f699-07af-497b-ac77-2f5b08e3f767/NVIDIA, H100 (2).png\" /\u003e\n\u003c!-- \u003cimg width=\"1547\" height=\"877\" alt=\"nvidia l40s (2)\" 
src=\"https://github.com/user-attachments/assets/7c318bb6-cbd6-42ce-b42f-096cd7a1070c\" /\u003e --\u003e\n\nFor comprehensive performance and quality benchmarks see [benchmark/](benchmark/README.md).\n\n---\n\n\n## 📦 Quick start\n\n### Clone the repository\n```bash\ngit clone https://github.com/TheStageAI/TheWhisper.git\ncd TheWhisper\n```\n### Install for Apple\n```bash\npip install '.[apple]'\n```\n\n### Install for Nvidia\n```bash\npip install '.[nvidia]'\n```\n\n### Install for Nvidia with TheStage AI optimized engines\n```bash\npip install 'thestage-elastic-models[nvidia]==0.1.7' --index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-production/simple --extra-index-url https://pypi.nvidia.com --extra-index-url https://pypi.org/simple\npip install '.[nvidia]'\npip install thestage\n```\n\n### Install for Jetson-Thor with TheStage AI optimized engines\n\nMake sure you have `tensorrt==10.13.3.9` installed on your Jetson and run:\n\n```bash\npip install 'thestage-elastic-models[thor]==0.1.7' --extra-index-url https://thestage.jfrog.io/artifactory/api/pypi/pypi-thestage-ai-jetson-thor/simple -i https://pypi.jetson-ai-lab.io/sbsa/cu130/+simple/ --extra-index-url https://pypi.org\npip install .\npip install thestage\n```\n\n\nThen generate an access token on [TheStage AI Platform](https://app.thestage.ai) in your profile and execute the following command:\n```bash\nthestage config set -t \u003cYOUR_API_TOKEN\u003e\n```\n-----\n\n## 🏗️ Support Matrix and System Requirements\n\n| **Feature** | **whisper-large-v3 (Nvidia)** | **whisper-large-v3 (Apple)** | **whisper-large-v3-turbo (Nvidia)** | **whisper-large-v3-turbo (Apple)** |\n| --- | --- | --- | --- | --- |\n| Streaming | ✅ | ✅ | ✅ | ✅ |\n| Accelerated | ✅ | ✅ | ✅ | ✅ |\n| Word Timestamps | ✅ | ✅ | ✅ | ✅ |\n| Multilingual | ✅ | ✅ | ✅ | ✅ |\n| 10s Chunk Mode | ✅ | ✅ | ✅ | ✅ |\n| 15s Chunk Mode | ✅ | ✅ | ✅ | ✅ |\n| 20s Chunk Mode | ✅ | ✅ | ✅ | ✅ |\n| 30s Chunk Mode | ✅ | ✅ | ✅ | ✅ |\n\n### Nvidia GPU 
Requirements\n\n- **Supported GPUs:** RTX 4090, RTX 5090, L40s, H100, A100, Jetson-Thor\n- **Operating System:** Ubuntu 20.04+\n- **Minimum RAM:** 2.5 GB (5 GB recommended for large-v3 model)\n- **CUDA Version:** 11.8 or higher\n- **Driver Version:** 520.0 or higher\n- **Python version**: 3.10-3.12\n\n### Apple Silicon Requirements\n\n- **Supported Chipsets:** M1, M1 Pro, M1 Max, M1 Ultra, M2, M2 Pro, M2 Max, M2 Ultra, M3, M3 Pro, M3 Max, M4, M4 Pro, M4 Max\n- **Operating System:** macOS 15.0 (Sequoia) or later, iOS 18.0 or later\n- **Minimum RAM:** 2 GB (4 GB recommended for large-v3 model)\n- **Python version**: 3.10-3.12\n\n---\n\n## ▶️ Usage and Deployment\n\n### Apple Usage\n\n```python\nimport torch\nfrom thestage_speechkit.apple import ASRPipeline\n\nmodel = ASRPipeline(\n    model='TheStageAI/thewhisper-large-v3-turbo',\n    # optimized model with ANNA\n    model_size='S',\n    chunk_length_s=10\n)\n\n# inference\nresult = model(\n    \"path_to_your_audio.wav\", \n    return_timestamps=\"word\"\n)\n\nprint(result[\"text\"])\n```\n\n### Apple Usage with Streaming\n\n```python\nfrom thestage_speechkit.streaming import StreamingPipeline, MicStream, FileStream, StdoutStream\n\nstreaming_pipe = StreamingPipeline(\n    model='TheStageAI/thewhisper-large-v3-turbo',\n    # Optimized model by ANNA\n    model_size='S',\n    # Window length\n    chunk_length_s=10,\n    platform='apple',\n    language='en'\n)\n\n# set stride in seconds\nmic_stream = MicStream(step_size_s=0.5)\noutput_stream = StdoutStream()\n\nwhile True:\n    chunk = mic_stream.next_chunk()\n    if chunk is not None:\n        approved_text, assumption = streaming_pipe(chunk)\n        output_stream.write(approved_text, assumption)\n    else:\n        break\n```\n\n### Nvidia Usage (HuggingFace Transformers)\n\n```python\nimport torch\nfrom thestage_speechkit.nvidia import ASRPipeline\n\nmodel = ASRPipeline(\n    model='TheStageAI/thewhisper-large-v3-turbo',\n    # allowed: 10s, 15s, 20s, 30s\n    
chunk_length_s=10,\n    # optimized TheStage AI engines\n    batch_size=32,\n    device='cuda'\n)\n\n# inference\nresult = model(\n    \"path_to_your_audio.wav\", \n    chunk_length_s=10,\n    generate_kwargs={'do_sample': False, 'use_cache': True}\n)\n\nprint(result[\"text\"])\n```\n\n### Nvidia Usage (TheStage AI engines)\n\n```python\nimport torch\nfrom thestage_speechkit.nvidia import ASRPipeline\n\nmodel = ASRPipeline(\n    model='TheStageAI/thewhisper-large-v3-turbo',\n    # allowed: 10s, 15s, 20s, 30s\n    chunk_length_s=10,\n    # optimized TheStage AI engines\n    model_size='S',\n    batch_size=32,\n    device='cuda'\n)\n\n# inference\nresult = model(\n    \"path_to_your_audio.wav\", \n    chunk_length_s=10,\n    generate_kwargs={'do_sample': False, 'use_cache': True}\n)\n\nprint(result[\"text\"])\n```\n-----\n\n## 💻 Build On-Device Desktop Application for Apple\n\nYou can build a macOS desktop app with real-time transcription. Find a simple ReactJS application here: **Link to React Frontend**\nYou can also download our app built using this backend here: [TheNotes for macOS](https://cdn.thestage.ai/production/cms_file_upload/1761693601-8ef0605f-a2e0-4bef-97c1-b61452e4f7dc/The%20Notes%20Package%20Oct%2028%202025.pkg)\n\n-----\n\n## 📊 Benchmarks\n\nTheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. 
For quality benchmarks, we used the multilingual [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model) benchmarks.\n\nFor comprehensive quality and performance benchmarks, including comparisons with other Whisper inference solutions, please refer to the [benchmark/](benchmark/README.md) directory.\n\n\u003cimg width=\"1547\" height=\"531\" alt=\"vanilla whisper (1)\" src=\"https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9\" /\u003e\n\u003cimg width=\"1547\" height=\"458\" alt=\"TheStage AI Whisper (1)\" src=\"https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65\" /\u003e\n\n\u003cimg width=\"1547\" height=\"531\" alt=\"Open ASR Leaderboard Benchmark\" src=\"https://cdn.thestage.ai/production/cms_file_upload/1770139173-58663708-4644-44a7-8225-763c33a2c95b/SOTA on Multilingual Open ASR benchmark.png\" /\u003e\n\u003cimg width=\"1547\" height=\"531\" alt=\"Multilingual Benchmark\" src=\"https://cdn.thestage.ai/production/cms_file_upload/1770139254-33d3c626-158f-42ec-a7f2-3a4bbe44b382/Open ASR Leaderboard Benchmark.png\" /\u003e\n\n---\n\n## 🏢 Enterprise License Summary\n\nTo get a commercial license to use TheStage AI optimized engines on a larger number of GPUs, please contact us here: [Service request](https://app.thestage.ai/contact)\n\n| Platform                 | Engine Type               | Status     | License                                 |\n|--------------------------|---------------------------|------------|-----------------------------------------|\n| NVIDIA GPUs (CUDA)       | PyTorch HF Transformers | ✅ Stable  | Free                                    |\n| macOS / Apple Silicon    | CoreML Engine + MLX     | ✅ Stable  | Free                                    |\n| NVIDIA GPUs (CUDA)       | TheStage AI (Optimized) | ✅ Stable  | Free ≤ 4 GPUs/year for small orgs       |\n\n----\n\n## 🧭 Development Status\n\n✅ OpenASR WER [benchmark](benchmark/README.md) for 
multiple chunk sizes\n\n✅ Performance [benchmark](benchmark/README.md) for NVIDIA\n\n✅ Support for L40S, H100, RTX 4090, RTX 5090\n\n✅ Time-stamp support on Nvidia\n\n✅ Nvidia Jetson support\n\n☐ Streaming containers for Nvidia\n\n☐ Ready-to-go containers for inference on Nvidia GPUs with OpenAI compatible API\n\n☐ Speaker diarization and speaker identification\n\n----\n\n## 🙌 Acknowledgements\n\n- **Silero VAD**: Used for voice activity detection in `thestage_speechkit/vad.py`. See [@snakers4](https://github.com/snakers4/silero-vad).\n- **OpenAI Whisper**: Original Whisper model and pretrained checkpoints. See [@openai](https://github.com/openai/whisper).\n- **Hugging Face Transformers**: Model, tokenizer, and inference utilities. See [@transformers](https://github.com/huggingface/transformers).\n- **MLX community**: MLX Whisper implementation for Apple Silicon. See [@mlx-explore](https://github.com/ml-explore/mlx-examples/tree/main/whisper).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTheStageAI%2FTheWhisper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTheStageAI%2FTheWhisper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTheStageAI%2FTheWhisper/lists"}