{"id":13523412,"url":"https://github.com/Softcatala/whisper-ctranslate2","last_synced_at":"2025-04-01T00:31:33.936Z","repository":{"id":145744017,"uuid":"615270502","full_name":"Softcatala/whisper-ctranslate2","owner":"Softcatala","description":"Whisper command line client compatible with original OpenAI client based on CTranslate2.","archived":false,"fork":false,"pushed_at":"2024-10-25T20:42:13.000Z","size":1172,"stargazers_count":903,"open_issues_count":12,"forks_count":76,"subscribers_count":24,"default_branch":"main","last_synced_at":"2024-10-29T15:24:52.819Z","etag":null,"topics":["openai-","openai-whisper","speech-recognition","speech-to-text","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Softcatala.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-17T10:19:45.000Z","updated_at":"2024-10-29T01:58:47.000Z","dependencies_parsed_at":"2023-11-23T18:39:22.348Z","dependency_job_id":"5faab29a-2963-4862-a1ae-dbdde5ea0534","html_url":"https://github.com/Softcatala/whisper-ctranslate2","commit_stats":{"total_commits":192,"total_committers":7,"mean_commits":"27.428571428571427","dds":0.04166666666666663,"last_synced_commit":"2e4564cd3833dbbce6b1fe29cbcd16fcae40dbb4"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Softcatala%2Fwhisper-ctranslate2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Softcatala%2Fwhisper-ctranslate2/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Softcatala%2Fwhisper-ctranslate2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Softcatala%2Fwhisper-ctranslate2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Softcatala","download_url":"https://codeload.github.com/Softcatala/whisper-ctranslate2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222688173,"owners_count":17023297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["openai-","openai-whisper","speech-recognition","speech-to-text","whisper"],"created_at":"2024-08-01T06:00:59.825Z","updated_at":"2025-04-01T00:31:33.922Z","avatar_url":"https://github.com/Softcatala.png","language":"Python","funding_links":[],"categories":["CLI tools","Python","Table of Contents","Audio","语音合成","Serving","Lengua Catalana y NLP"],"sub_categories":["Self-hosted","AI - Natural Language Processing","Speech-to-text","资源传输下载","Large Model Serving"],"readme":"[![PyPI version](https://img.shields.io/pypi/v/whisper-ctranslate2.svg?logo=pypi\u0026logoColor=FFE873)](https://pypi.org/project/whisper-ctranslate2/)\n[![PyPI downloads](https://img.shields.io/pypi/dm/whisper-ctranslate2.svg)](https://pypistats.org/packages/whisper-ctranslate2)\n\n# Introduction\n\nWhisper command line client compatible with original [OpenAI client](https://github.com/openai/whisper) based on CTranslate2.\n\nIt uses [CTranslate2](https://github.com/OpenNMT/CTranslate2/) and 
[Faster-whisper](https://github.com/SYSTRAN/faster-whisper), a Whisper implementation that is up to 4 times faster than openai/whisper for the same accuracy while using less memory.\n\nGoals of the project:\n* Provide an easy way to use the CTranslate2 Whisper implementation\n* Ease the migration for people using OpenAI Whisper CLI\n\n# 🚀 **NEW PROJECT LAUNCHED!** 🚀\n\n**Open dubbing** is an AI dubbing system which uses machine learning models to automatically translate and synchronize audio dialogue into different languages! 🎉\n\n### **🔥 Check it out now: [*open-dubbing*](https://github.com/jordimas/open-dubbing) 🔥**\n\n\n# Installation\n\nTo install the latest stable version, just type:\n\n    pip install -U whisper-ctranslate2\n\nAlternatively, if you are interested in the latest development (non-stable) version from this repository, just type:\n\n    pip install git+https://github.com/Softcatala/whisper-ctranslate2\n\n# Using the prebuilt Docker image\n\nYou can use the prebuilt Docker image. First pull the image:\n\n    docker pull ghcr.io/softcatala/whisper-ctranslate2:latest\n\nThe Docker image includes the small, medium and large-v2 models.\n\nTo run it:\n\n    docker run --gpus \"device=0\" \\\n        -v \"$(pwd)\":/srv/files/ \\\n        -it ghcr.io/softcatala/whisper-ctranslate2:latest \\\n        /srv/files/e2e-tests/gossos.mp3 \\\n        --output_dir /srv/files/\n    \nNotes:\n* _--gpus \"device=0\"_ gives access to the GPU. If you do not have a GPU, remove this option.\n* _\"$(pwd)\":/srv/files/_ maps your current directory to /srv/files/ inside the container\n\n# CPU and GPU support\n\nGPU and CPU support are provided by [CTranslate2](https://github.com/OpenNMT/CTranslate2/).\n\nIt is compatible with x86-64 and AArch64/ARM64 CPUs and integrates multiple backends that are optimized for these platforms: Intel MKL, oneDNN, OpenBLAS, Ruy, and Apple Accelerate.\n\nGPU execution requires the NVIDIA libraries cuBLAS 11.x and cuDNN 8.x to be installed on the system. 
Please refer to the [CTranslate2 documentation](https://opennmt.net/CTranslate2/installation.html) for details.\n\nBy default, the best hardware available is selected for inference. You can use the options `--device` and `--device_index` to control the selection manually.\n    \n# Usage\n\nSame command line as OpenAI Whisper.\n\nTo transcribe:\n\n    whisper-ctranslate2 inaguracio2011.mp3 --model medium\n    \n\u003cimg alt=\"image\" src=\"https://user-images.githubusercontent.com/309265/226923541-8326c575-7f43-4bba-8235-2a4a8bdfb161.png\"\u003e\n\nTo translate:\n\n    whisper-ctranslate2 inaguracio2011.mp3 --model medium --task translate\n\n\u003cimg alt=\"image\" src=\"https://user-images.githubusercontent.com/309265/226923535-b6583536-2486-4127-b17b-c58d85cdb90f.png\"\u003e\n\nThe Whisper translate task translates the transcription from the source language to English (the only target language supported).\n\nTo list all the supported options with their help, run:\n\n    whisper-ctranslate2 --help\n\n# CTranslate2 specific options\n\nOn top of the OpenAI Whisper command line options, there are some specific options provided by CTranslate2 or whisper-ctranslate2.\n\n## Batched inference\n\nBatched inference transcribes each segment independently, which can provide an additional 2x-4x speed increase:\n\n    whisper-ctranslate2 inaguracio2011.mp3 --batched True\n    \nYou can additionally use the `--batch_size` option to specify the maximum number of parallel requests to the model for decoding.\n\nBatched inference uses the Voice Activity Detection (VAD) filter and ignores the following parameters: compression_ratio_threshold, logprob_threshold,\nno_speech_threshold, condition_on_previous_text, prompt_reset_on_temperature, prefix, hallucination_silence_threshold.\n\n## Quantization\n\nThe `--compute_type` option, which accepts _default,auto,int8,int8_float16,int16,float16,float32_ values, indicates the type of [quantization](https://opennmt.net/CTranslate2/quantization.html) to use. 
On CPU, _int8_ will give the best performance:\n\n    whisper-ctranslate2 myfile.mp3 --compute_type int8\n\n## Loading the model from a directory\n\nThe `--model_directory` option lets you specify the directory from which to load a CTranslate2 Whisper model, for example your own quantized [Whisper model](https://opennmt.net/CTranslate2/conversion.html) version or your own [fine-tuned Whisper](https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event) version. The model must be in CTranslate2 format.\n\n## Using Voice Activity Detection (VAD) filter\n\nThe `--vad_filter` option enables voice activity detection (VAD) to filter out parts of the audio without speech. This step uses the [Silero VAD model](https://github.com/snakers4/silero-vad):\n\n    whisper-ctranslate2 myfile.mp3 --vad_filter True\n\nThe VAD filter accepts multiple additional options to determine the filter behavior:\n\n    --vad_onset VALUE (float)\n\nProbabilities above this value are considered as speech.\n\n    --vad_min_speech_duration_ms VALUE (int)\n\nFinal speech chunks shorter than min_speech_duration_ms are thrown out.\n\n    --vad_max_speech_duration_s VALUE (int)\n\nMaximum duration of speech chunks in seconds. 
Longer chunks are split at the timestamp of the last silence.\n\n\n## Print colors\n\nThe `--print_colors True` option prints the transcribed text using an experimental color coding strategy based on [whisper.cpp](https://github.com/ggerganov/whisper.cpp) to highlight words with high or low confidence:\n\n    whisper-ctranslate2 myfile.mp3 --print_colors True\n\n\u003cimg alt=\"image\" src=\"https://user-images.githubusercontent.com/309265/228054378-48ac6af4-ce4b-44da-b4ec-70ce9f2f2a6c.png\"\u003e\n\n## Live transcribe from your microphone\n\nThe `--live_transcribe True` option activates live transcription mode from your microphone:\n\n    whisper-ctranslate2 --live_transcribe True --language en\n\nhttps://user-images.githubusercontent.com/309265/231533784-e58c4b92-e9fb-4256-b4cd-12f1864131d9.mov\n\n## Diarization (speaker identification)\n\nThere is experimental diarization support using [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) to identify speakers. At the moment, the support is at segment level.\n\nTo enable diarization you need to follow these steps:\n\n1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`\n2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions\n3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions\n4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).\n\nThen run the command, passing the Hugging Face API token as a parameter to enable diarization:\n\n    whisper-ctranslate2 --hf_token YOUR_HF_TOKEN\n\nThe name of the speaker is then added to the output files (e.g. 
JSON, VTT and SRT files):\n\n_[SPEAKER_00]: There is a lot of people in this room_\n\nThe option `--speaker_name SPEAKER_NAME` lets you use your own string to identify the speaker.\n\n\n# Need help?\n\nCheck our [frequently asked questions](FAQ.md) for common questions.\n\n# Contact\n\nJordi Mas \u003cjmas@softcatala.org\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSoftcatala%2Fwhisper-ctranslate2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSoftcatala%2Fwhisper-ctranslate2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSoftcatala%2Fwhisper-ctranslate2/lists"}