{"id":19436290,"url":"https://github.com/aixerum/faster-whisper","last_synced_at":"2026-05-14T18:07:30.363Z","repository":{"id":258531557,"uuid":"874963555","full_name":"AIXerum/faster-whisper","owner":"AIXerum","description":"faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.  This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.","archived":false,"fork":false,"pushed_at":"2024-10-18T19:43:31.000Z","size":2926,"stargazers_count":16,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-17T18:06:49.639Z","etag":null,"topics":["ctranslate2","gpu","transcription","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AIXerum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-18T19:43:27.000Z","updated_at":"2025-11-14T02:05:32.000Z","dependencies_parsed_at":"2024-10-19T12:56:39.752Z","dependency_job_id":null,"html_url":"https://github.com/AIXerum/faster-whisper","commit_stats":null,"previous_names":["aixerum/faster-whisper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AIXerum/faster-whisper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIXerum%2Ffaster-whisper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/AIXerum%2Ffaster-whisper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIXerum%2Ffaster-whisper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIXerum%2Ffaster-whisper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AIXerum","download_url":"https://codeload.github.com/AIXerum/faster-whisper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIXerum%2Ffaster-whisper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33037105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctranslate2","gpu","transcription","whisper"],"created_at":"2024-11-10T15:10:26.939Z","updated_at":"2026-05-14T18:07:30.346Z","avatar_url":"https://github.com/AIXerum.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CI](https://github.com/guillaumekln/faster-whisper/workflows/CI/badge.svg)](https://github.com/guillaumekln/faster-whisper/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/faster-whisper.svg)](https://badge.fury.io/py/faster-whisper)\n\n# Faster Whisper transcription with CTranslate2\n\n**faster-whisper** is a 
reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.\n\nThis implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.\n\n## Benchmark\n\n### Whisper\n\nFor reference, here's the time and memory usage that are required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:\n\n* [openai/whisper](https://github.com/openai/whisper)@[6dea21fd](https://github.com/openai/whisper/commit/6dea21fd7f7253bfe450f1e2512a0fe47ee2d258)\n* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[3b010f9](https://github.com/ggerganov/whisper.cpp/commit/3b010f9bed9a6068609e9faf52383aea792b0362)\n* [faster-whisper](https://github.com/guillaumekln/faster-whisper)@[cce6b53e](https://github.com/guillaumekln/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)\n\n### Large-v2 model on GPU\n\n| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |\n| --- | --- | --- | --- | --- | --- |\n| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |\n| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |\n| faster-whisper | int8 | 5 | 59s | 3091MB | 3117MB |\n\n*Executed with CUDA 11.7.1 on a NVIDIA Tesla V100S.*\n\n### Small model on CPU\n\n| Implementation | Precision | Beam size | Time | Max. 
memory |\n| --- | --- | --- | --- | --- |\n| openai/whisper | fp32 | 5 | 10m31s | 3101MB |\n| whisper.cpp | fp32 | 5 | 17m42s | 1581MB |\n| whisper.cpp | fp16 | 5 | 12m39s | 873MB |\n| faster-whisper | fp32 | 5 | 2m44s | 1675MB |\n| faster-whisper | int8 | 5 | 2m04s | 995MB |\n\n*Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.*\n\n\n### Distil-whisper\n\n| Implementation | Precision | Beam size | Time | Gigaspeech WER |\n| --- | --- | --- | --- | --- |\n| distil-whisper/distil-large-v2 | fp16 | 4 | - | 10.36 |\n| [faster-distil-large-v2](https://huggingface.co/Systran/faster-distil-whisper-large-v2) | fp16 | 5 | - | 10.28 |\n| distil-whisper/distil-medium.en | fp16 | 4 | - | 11.21 |\n| [faster-distil-medium.en](https://huggingface.co/Systran/faster-distil-whisper-medium.en) | fp16 | 5 | - | 11.21 |\n\n*Executed with CUDA 11.4 on a NVIDIA 3090.*\n\n\u003cdetails\u003e\n\u003csummary\u003eTesting details (click to expand)\u003c/summary\u003e\n\nFor `distil-whisper/distil-large-v2`, the WER was measured with the code sample from the [model card](https://huggingface.co/distil-whisper/distil-large-v2#evaluation). For `faster-distil-whisper`, the WER was measured with the following settings:\n\n```python\nfrom faster_whisper import WhisperModel\n\nmodel_size = \"distil-large-v2\"\n# model_size = \"distil-medium.en\"\n# Run on GPU with FP16\nmodel = WhisperModel(model_size, device=\"cuda\", compute_type=\"float16\")\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5, language=\"en\")\n```\n\u003c/details\u003e\n\n## Requirements\n\n* Python 3.8 or greater\n\nUnlike openai-whisper, FFmpeg does **not** need to be installed on the system. 
The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV) which bundles the FFmpeg libraries in its package.\n\n### GPU\n\nGPU execution requires the following NVIDIA libraries to be installed:\n\n* [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas)\n* [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn)\n\nThere are multiple ways to install these libraries. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.\n\n\u003cdetails\u003e\n\u003csummary\u003eOther installation methods (click to expand)\u003c/summary\u003e\n\n#### Use Docker\n\nThe libraries are installed in this official NVIDIA Docker image: `nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`.\n\n#### Install with `pip` (Linux only)\n\nOn Linux these libraries can be installed with `pip`. Note that `LD_LIBRARY_PATH` must be set before launching Python.\n\n```bash\npip install nvidia-cublas-cu11 nvidia-cudnn-cu11\n\nexport LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + \":\" + os.path.dirname(nvidia.cudnn.lib.__file__))'`\n```\n\n#### Download the libraries from Purfview's repository (Windows \u0026 Linux)\n\nPurfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA libraries for Windows \u0026 Linux in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs). 
Decompress the archive and place the libraries in a directory included in the `PATH`.\n\n\u003c/details\u003e\n\n## Installation\n\nThe module can be installed from [PyPI](https://pypi.org/project/faster-whisper/):\n\n```bash\npip install faster-whisper\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eOther installation methods (click to expand)\u003c/summary\u003e\n\n### Install the master branch\n\n```bash\npip install --force-reinstall \"faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz\"\n```\n\n### Install a specific commit\n\n```bash\npip install --force-reinstall \"faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz\"\n```\n\n\u003c/details\u003e\n\n## Usage\n\n### Faster-whisper\n\n```python\nfrom faster_whisper import WhisperModel\n\nmodel_size = \"large-v3\"\n\n# Run on GPU with FP16\nmodel = WhisperModel(model_size, device=\"cuda\", compute_type=\"float16\")\n\n# or run on GPU with INT8\n# model = WhisperModel(model_size, device=\"cuda\", compute_type=\"int8_float16\")\n# or run on CPU with INT8\n# model = WhisperModel(model_size, device=\"cpu\", compute_type=\"int8\")\n\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5)\n\nprint(\"Detected language '%s' with probability %f\" % (info.language, info.language_probability))\n\nfor segment in segments:\n    print(\"[%.2fs -\u003e %.2fs] %s\" % (segment.start, segment.end, segment.text))\n```\n\n**Warning:** `segments` is a *generator* so the transcription only starts when you iterate over it. 
The transcription can be run to completion by collecting the segments into a list or iterating over them in a `for` loop:\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\")\nsegments = list(segments)  # The transcription will actually run here.\n```\n\n### Faster-distil-whisper\n\nFor usage of `faster-distil-whisper`, please refer to: https://github.com/guillaumekln/faster-whisper/issues/533\n\n```python\nfrom faster_whisper import WhisperModel\n\nmodel_size = \"distil-large-v2\"\n# model_size = \"distil-medium.en\"\nmodel = WhisperModel(model_size, device=\"cuda\", compute_type=\"float16\")\nsegments, info = model.transcribe(\"audio.mp3\", beam_size=5,\n    language=\"en\", max_new_tokens=128, condition_on_previous_text=False)\n```\n\nNOTE: Empirically, `condition_on_previous_text=True` degrades the performance of `faster-distil-whisper` on long audio. Degradation on the first chunk was also observed with `initial_prompt`.\n\n### Word-level timestamps\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\", word_timestamps=True)\n\nfor segment in segments:\n    for word in segment.words:\n        print(\"[%.2fs -\u003e %.2fs] %s\" % (word.start, word.end, word.word))\n```\n\n### VAD filter\n\nThe library integrates the [Silero VAD](https://github.com/snakers4/silero-vad) model to filter out parts of the audio without speech:\n\n```python\nsegments, _ = model.transcribe(\"audio.mp3\", vad_filter=True)\n```\n\nThe default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the [source code](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py). 
They can be customized with the dictionary argument `vad_parameters`:\n\n```python\nsegments, _ = model.transcribe(\n    \"audio.mp3\",\n    vad_filter=True,\n    vad_parameters=dict(min_silence_duration_ms=500),\n)\n```\n\n### Logging\n\nThe library logging level can be configured like this:\n\n```python\nimport logging\n\nlogging.basicConfig()\nlogging.getLogger(\"faster_whisper\").setLevel(logging.DEBUG)\n```\n\n### Going further\n\nSee more model and transcription options in the [`WhisperModel`](https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/transcribe.py) class implementation.\n\n## Community integrations\n\nHere is a non-exhaustive list of open-source projects using faster-whisper. Feel free to add your project to the list!\n\n* [whisper-ctranslate2](https://github.com/Softcatala/whisper-ctranslate2) is a command-line client based on faster-whisper and compatible with the original client from openai/whisper.\n* [whisper-diarize](https://github.com/MahmoudAshraf97/whisper-diarization) is a speaker diarization tool that is based on faster-whisper and NVIDIA NeMo.\n* [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides standalone CLI executables of faster-whisper for Windows, Linux \u0026 macOS. 
\n* [asr-sd-pipeline](https://github.com/hedrergudene/asr-sd-pipeline) provides a scalable, modular, end-to-end multi-speaker speech-to-text solution implemented using AzureML pipelines.\n* [Open-Lyrics](https://github.com/zh-plus/Open-Lyrics) is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into `.lrc` files in the desired language using OpenAI-GPT.\n* [wscribe](https://github.com/geekodour/wscribe) is a flexible transcript generation tool supporting faster-whisper. It can export word-level transcripts, which can then be edited with [wscribe-editor](https://github.com/geekodour/wscribe-editor).\n* [aTrain](https://github.com/BANDAS-Center/aTrain) is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization on Windows ([Windows Store App](https://apps.microsoft.com/detail/atrain/9N15Q44SZNS2)) and Linux.\n* [Whisper-Streaming](https://github.com/ufal/whisper_streaming) implements real-time mode for offline Whisper-like speech-to-text models with faster-whisper as the most recommended back-end. It implements a streaming policy with self-adaptive latency based on the actual source complexity, and demonstrates the state of the art.\n* [WhisperLive](https://github.com/collabora/WhisperLive) is a nearly-live implementation of OpenAI's Whisper which uses faster-whisper as the backend to transcribe audio in real time.\n\n## Model conversion\n\nWhen loading a model from its size such as `WhisperModel(\"large-v3\")`, the corresponding CTranslate2 model is automatically downloaded from the [Hugging Face Hub](https://huggingface.co/Systran).\n\nWe also provide a script to convert any Whisper models compatible with the Transformers library. 
They could be the original OpenAI models or user fine-tuned models.\n\nFor example, the command below converts the [original \"large-v3\" Whisper model](https://huggingface.co/openai/whisper-large-v3) and saves the weights in FP16:\n\n```bash\npip install \"transformers[torch]\u003e=4.23\"\n\nct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2 \\\n    --copy_files tokenizer.json preprocessor_config.json --quantization float16\n```\n\n* The option `--model` accepts a model name on the Hub or a path to a model directory.\n* If the option `--copy_files tokenizer.json` is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.\n\nModels can also be converted from Python code. See the [conversion API](https://opennmt.net/CTranslate2/python/ctranslate2.converters.TransformersConverter.html).\n\n### Load a converted model\n\n1. Directly load the model from a local directory:\n```python\nimport faster_whisper\nmodel = faster_whisper.WhisperModel(\"whisper-large-v3-ct2\")\n```\n\n2. [Upload your model to the Hugging Face Hub](https://huggingface.co/docs/transformers/model_sharing#upload-with-the-web-interface) and load it from its name:\n```python\nimport faster_whisper\nmodel = faster_whisper.WhisperModel(\"username/whisper-large-v3-ct2\")\n```\n\n## Comparing performance against other implementations\n\nIf you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:\n\n* Verify that the same transcription options are used, especially the same beam size. For example, in openai/whisper, `model.transcribe` uses a default beam size of 1, whereas faster-whisper uses a default beam size of 5.\n* When running on CPU, make sure to set the same number of threads. 
Many frameworks will read the environment variable `OMP_NUM_THREADS`, which can be set when running your script:\n\n```bash\nOMP_NUM_THREADS=4 python3 my_script.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faixerum%2Ffaster-whisper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faixerum%2Ffaster-whisper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faixerum%2Ffaster-whisper/lists"}