{"id":13613129,"url":"https://github.com/Carleslc/AudioToText","last_synced_at":"2025-04-13T15:32:51.547Z","repository":{"id":88500291,"uuid":"598233871","full_name":"Carleslc/AudioToText","owner":"Carleslc","description":"Transcribe and translate audio to text using Whisper and DeepL.","archived":false,"fork":false,"pushed_at":"2024-01-22T13:53:28.000Z","size":20301,"stargazers_count":323,"open_issues_count":3,"forks_count":44,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-06T18:14:28.429Z","etag":null,"topics":["audio","audio-processing","captions","colab-notebook","deepl","ffmpeg","google-colab","jupyter-notebook","language","openai-whisper","python","speech-to-text","subtitles","text","transcribe","transcription","translate","translation","whisper","whisper-api"],"latest_commit_sha":null,"homepage":"https://carleslc.me/AudioToText","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Carleslc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"ko_fi":"carleslc"}},"created_at":"2023-02-06T17:25:05.000Z","updated_at":"2025-04-06T14:37:16.000Z","dependencies_parsed_at":"2024-10-27T00:58:09.991Z","dependency_job_id":null,"html_url":"https://github.com/Carleslc/AudioToText","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Carleslc%2FAudioToText","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Carleslc%2FAudioToText/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Carleslc%2FAudioToText/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Carleslc%2FAudioToText/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Carleslc","download_url":"https://codeload.github.com/Carleslc/AudioToText/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248736031,"owners_count":21153523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-processing","captions","colab-notebook","deepl","ffmpeg","google-colab","jupyter-notebook","language","openai-whisper","python","speech-to-text","subtitles","text","transcribe","transcription","translate","translation","whisper","whisper-api"],"created_at":"2024-08-01T20:00:40.350Z","updated_at":"2025-04-13T15:32:51.539Z","avatar_url":"https://github.com/Carleslc.png","language":"Jupyter Notebook","funding_links":["https://ko-fi.com/carleslc"],"categories":["HarmonyOS","Jupyter Notebook"],"sub_categories":["Windows Manager"],"readme":"# AudioToText\n\n[![Google Colab Badge](https://img.shields.io/badge/Google%20Colab-F9AB00?logo=googlecolab\u0026logoColor=fff\u0026style=for-the-badge)](https://colab.research.google.com/github/Carleslc/AudioToText/blob/master/AudioToText.ipynb)\n\n[![ko-fi](https://www.ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/carleslc)\n\n**Transcribe** audio using [**Whisper**](https://github.com/openai/whisper) from [OpenAI](https://openai.com/).\n\n**Translate** audio using 
[**Whisper**](https://github.com/openai/whisper) and [**DeepL**](https://www.deepl.com/) translator.\n\nGenerate _captions_ using VTT or SRT file formats.\n\n[**Introducing Whisper** _(OpenAI Blog)_](https://openai.com/blog/whisper/)\n\n[🇪🇸 Vídeo sobre Whisper _(Dot CSV)_](https://www.youtube.com/watch?v=JuMEmF-2FsA)\n\n## How to use\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Carleslc/AudioToText/blob/master/AudioToText.ipynb)\n\n**Open [AudioToText in Google Colab](https://colab.research.google.com/github/Carleslc/AudioToText/blob/master/AudioToText.ipynb) and follow the step-by-step instructions.**\n\nA Cloud GPU will be assigned to you to run the notebook code to transcribe and translate your audio files.\n\nIf you want to run the code in your own computer check [_**local installation**_](#local-installation).\n\n## Features\n\n  - [**English transcription**](#audio-transcription-from-english-using-whisper)\n  - [**Non-English transcription**](#audio-transcription-from-almost-any-language-using-whisper)\n  - [**Any-to-English translation**](#audio-translation-to-english-using-whisper)\n\n- [**Any-to-Any\\* translation**](#audio-translation-using-deepl-translator)\n\n   Translate the transcriptions using [**DeepL**](https://www.deepl.com/) translator.\n\n   [_\\* See supported languages by DeepL_](https://support.deepl.com/hc/en-us/articles/360019925219-Languages-included-in-DeepL-Pro)\n\n- [Save transcriptions and captions in different formats](#save-transcripts-to-different-formats): TXT, VTT, SRT, TSV and JSON.\n\n- Choose between [open-source](https://github.com/openai/whisper) models or [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis#whisper-api).\n\n- [AudioToText CLI](#using-audiototext-cli) for [local usage](#local-installation).\n\nThere are several examples in the [**examples**](examples) folder.\n\n![Whisper 
Features](https://cdn.openai.com/whisper/draft-20220920a/asr-training-data-desktop.svg)\n\n### Audio **transcription** from English using Whisper\n\n`task`: `Transcribe`\n\n`language`: `English`\n\n### Audio **transcription** from [almost any language](https://github.com/openai/whisper#available-models-and-languages) using Whisper\n\n`task`: `Transcribe`\n\n`language`: `Auto-Detect` or select the source language of your audio file\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003ca\u003e\u003ci\u003eSupported source languages by Whisper\u003c/i\u003e\u003c/a\u003e\u003c/summary\u003e\n  \n  ```\n  Afrikaans\n  Albanian\n  Amharic\n  Arabic\n  Armenian\n  Assamese\n  Azerbaijani\n  Bashkir\n  Basque\n  Belarusian\n  Bengali\n  Bosnian\n  Breton\n  Bulgarian\n  Burmese\n  Castilian\n  Catalan\n  Chinese\n  Croatian\n  Czech\n  Danish\n  Dutch\n  English\n  Estonian\n  Faroese\n  Finnish\n  Flemish\n  French\n  Galician\n  Georgian\n  German\n  Greek\n  Gujarati\n  Haitian\n  Haitian Creole\n  Hausa\n  Hawaiian\n  Hebrew\n  Hindi\n  Hungarian\n  Icelandic\n  Indonesian\n  Italian\n  Japanese\n  Javanese\n  Kannada\n  Kazakh\n  Khmer\n  Korean\n  Lao\n  Latin\n  Latvian\n  Letzeburgesch\n  Lingala\n  Lithuanian\n  Luxembourgish\n  Macedonian\n  Malagasy\n  Malay\n  Malayalam\n  Maltese\n  Maori\n  Marathi\n  Moldavian\n  Moldovan\n  Mongolian\n  Myanmar\n  Nepali\n  Norwegian\n  Nynorsk\n  Occitan\n  Panjabi\n  Pashto\n  Persian\n  Polish\n  Portuguese\n  Punjabi\n  Pushto\n  Romanian\n  Russian\n  Sanskrit\n  Serbian\n  Shona\n  Sindhi\n  Sinhala\n  Sinhalese\n  Slovak\n  Slovenian\n  Somali\n  Spanish\n  Sundanese\n  Swahili\n  Swedish\n  Tagalog\n  Tajik\n  Tamil\n  Tatar\n  Telugu\n  Thai\n  Tibetan\n  Turkish\n  Turkmen\n  Ukrainian\n  Urdu\n  Uzbek\n  Valencian\n  Vietnamese\n  Welsh\n  Yiddish\n  Yoruba\n  ```\n  \n\u003c/details\u003e\n\n### Audio **translation to English** using Whisper\n\n`task`: `Translate to English`\n\n`language`: `Auto-Detect` or select 
the source language of your audio file\n\n### Audio **translation** using [**DeepL**](https://www.deepl.com/) translator\n\nTranslation to other languages than English is not supported by _Whisper_.\n\nHowever, as an alternative you can use [DeepL API](https://www.deepl.com/pro-api?cta=header-pro-api) to translate the transcription to [another language](https://support.deepl.com/hc/en-us/articles/360019925219-Languages-included-in-DeepL-Pro).\n\n`task`: `Transcribe`\n\n`language`: `Auto-Detect` or select the source language of your audio file \\*\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003ca\u003e\u003ci\u003eSupported source languages by DeepL\u003c/i\u003e\u003c/a\u003e\u003c/summary\u003e\n\n\u003ca href=\"https://www.deepl.com/docs-api/translate-text\"\u003e\u003ccode\u003esource_lang\u003c/code\u003e\u003c/a\u003e\n\n```\nBulgarian\nChinese\nCzech\nDanish\nDutch\nEnglish\nEstonian\nFinnish\nFrench\nGerman\nGreek\nHungarian\nIndonesian\nItalian\nJapanese\nKorean\nLatvian\nLithuanian\nNorwegian\nPolish\nPortuguese\nRomanian\nRussian\nSlovak\nSlovenian\nSpanish\nSwedish\nTurkish\nUkrainian\n```\n\n\u003c/details\u003e\n\n\\* If the source language of your audio file is supported by Whisper but not supported by DeepL you can use the `Translate to English` task to generate an English transcription first and translate that to your desired target language using DeepL.\n\n`deepl_api_key`: Your [DeepL API key](https://www.deepl.com/es/account/summary) generated after registering for a [DeepL Developer Account](https://www.deepl.com/pro-api).\n\n`deepl_target_language`: Select your desired language\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003ca\u003e\u003ci\u003eAvailable target languages by DeepL\u003c/i\u003e\u003c/a\u003e\u003c/summary\u003e\n  \n  \u003ca href=\"https://www.deepl.com/docs-api/translate-text\"\u003e\u003ccode\u003etarget_lang\u003c/code\u003e\u003c/a\u003e\n  \n  ```\n  Bulgarian\n  Chinese (simplified)\n  Czech\n  Danish\n  Dutch\n  
English (American)\n  English (British)\n  Estonian\n  Finnish\n  French\n  German\n  Greek\n  Hungarian\n  Indonesian\n  Italian\n  Japanese\n  Korean\n  Latvian\n  Lithuanian\n  Norwegian\n  Polish\n  Portuguese (Brazilian)\n  Portuguese (European)\n  Romanian\n  Russian\n  Slovak\n  Slovenian\n  Spanish\n  Swedish\n  Turkish\n  Ukrainian\n  ```\n  \n\u003c/details\u003e\n\nThe [DeepL API](https://www.deepl.com/pro-api?cta=header-pro-api) has a free quota of **500,000 characters per month**.\n\nIf you exceed your free quota you can upgrade to _DeepL API Pro_ or try the [Free Translator Files](https://www.deepl.com/translator/files) web feature by uploading the generated transcripts.\n\nSee [**this example**](examples/multiple-files) with audio transcriptions in different languages using Whisper and translation to Spanish using DeepL.\n\n### **Save transcripts** to different formats\n\n`output_formats`: Select the desired transcript formats (comma-separated)\n\nAvailable formats: **txt, vtt, srt, tsv, json**\n\n[`txt`](https://en.wikipedia.org/wiki/Text_file) is recommended for reading a transcription.\n\n[`vtt`](https://en.wikipedia.org/wiki/WebVTT) or [`srt`](https://en.wikipedia.org/wiki/SubRip) are recommended for adding **captions** to an audio or video file.\n\nTranscript files will be located in the _**`audio_transcription`**_ folder.\n\n#### Add captions to VLC media player\n\nIf you use [VLC](https://www.videolan.org/) to play video or audio files, you can add your `vtt` or `srt` transcripts as captions by dragging and dropping the transcript file onto the media player or by going to _Subtitles -\u003e Add Subtitle File_.\n\nWith audio-only files you will need to enable a visualization in _Audio -\u003e Visualizations_.\n\n## Local installation\n\nIf you have a powerful computer with GPU hardware acceleration, you can run the [_notebook_](AudioToText.ipynb) or [_CLI_](#using-audiototext-cli) on your local machine.\n\nYou can also use them locally without a powerful GPU using 
the [API](https://platform.openai.com/account/api-keys), as it always runs in the cloud.\n\nCPU execution is also available, but it is much slower, so the [Colab](https://colab.research.google.com/github/Carleslc/AudioToText/blob/master/AudioToText.ipynb) version or the API is recommended if you do not have a decent GPU.\nYou might, however, try to use the smaller models (`tiny`, `base`, `small`) on your CPU.\n\n### Using AudioToText CLI\n\nA plain [_Python script_](audiototext.py) is available for use on your system without Jupyter.\n\n#### Install AudioToText CLI\n\n1. Clone this repository or download the [`audiototext.py`](https://raw.githubusercontent.com/Carleslc/AudioToText/master/audiototext.py) script (_right-click -\u003e Save as..._).\n2. Install [Python](https://www.python.org/downloads/) (3.8 - 3.10)\n3. Install [`ffmpeg`](https://ffmpeg.org/download.html)\n\n```sh\n# on macOS using Homebrew (https://brew.sh/)\nbrew install ffmpeg\n\n# on Windows using Chocolatey (https://chocolatey.org/)\nchoco install ffmpeg\n\n# on Ubuntu or Debian\nsudo apt update \u0026\u0026 sudo apt install ffmpeg\n\n# on Arch Linux\nsudo pacman -S ffmpeg\n```\n\n#### AudioToText CLI usage\n\n```sh\n# Transcribe english.wav using large-v2 model to TXT, VTT, SRT, TSV and JSON formats\npython audiototext.py examples/english/english.wav --model large-v2 --output_dir audio_transcription\n\n# Translate french.wav from French to English using small model to TXT format\npython audiototext.py examples/french-to-english/french.wav --task translate --language French --output_format txt\n\n# Transcribe english_japanese.mp3 using API to TXT, VTT and SRT formats\npython audiototext.py examples/multi-language/english_japanese.mp3 --output_formats txt,vtt,srt --api_key sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n\n# Transcribe multiple files using Whisper large-v2 model and then translate the generated transcripts to Spanish using DeepL API to TXT, VTT and SRT formats\npython 
audiototext.py chinese.wav bruce.mp3 english_japanese.mp3 french.wav --model large-v2 --output_formats txt,vtt,srt --deepl_target_language Spanish --deepl_api_key xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:xx\n\n# See all available options\npython audiototext.py -h\n```\n\n```\npositional arguments:\n  audio_file            source file to transcribe\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --task {transcribe,translate}\n                        transcribe (default) or translate (to English)\n  --model {tiny,base,small,medium,large-v1,large-v2}\n                        model to use (default: small)\n  --language {Auto-Detect,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}\n                        source file language (default: Auto-Detect)\n  --prompt PROMPT       provide context about the audio or encourage a specific writing style, see https://platform.openai.com/docs/guides/speech-to-text/prompting\n  --coherence_preference {True,False}\n                        True (default): More coherence, but may repeat text. 
False: Less repetitions, but may have less coherence\n  --api_key API_KEY     if set with your OpenAI API Key (https://platform.openai.com/account/api-keys), the OpenAI API is used, which can improve the inference speed substantially, but it has an associated cost, see API pricing: https://openai.com/pricing#audio-models.\n                        API model is large-v2 (ignores --model)\n  --output_formats OUTPUT_FORMATS, --output_format OUTPUT_FORMATS\n                        desired result formats (default: txt,vtt,srt,tsv,json)\n  --output_dir OUTPUT_DIR\n                        folder to save results (default: audio_transcription)\n  --deepl_api_key DEEPL_API_KEY\n                        DeepL API key, if you want to translate results using DeepL. Get a DeepL Developer Account API Key: https://www.deepl.com/pro-api\n  --deepl_target_language {Bulgarian,Chinese,Chinese (simplified),Czech,Danish,Dutch,English,English (American),English (British),Estonian,Finnish,French,German,Greek,Hungarian,Indonesian,Italian,Japanese,Korean,Latvian,Lithuanian,Norwegian,Polish,Portuguese,Portuguese (Brazilian),Portuguese (European),Romanian,Russian,Slovak,Slovenian,Spanish,Swedish,Turkish,Ukrainian}\n                        results target language if you want to translate results using DeepL (--deepl_api_key required)\n  --deepl_coherence_preference {True,False}\n                        True (default): Share context between lines while translating. 
False: Translate each line independently\n  --deepl_formality {default,formal,informal}\n                        whether the translated text should lean towards formal or informal language (languages with formality supported: German,French,Italian,Spanish,Dutch,Polish,Portuguese,Russian)\n  --skip-install        skip pip dependencies installation\n```\n\n### Using Google Colab with your local environment\n\n[Google Colab](https://colab.research.google.com/github/Carleslc/AudioToText/blob/master/AudioToText.ipynb) lets you connect to a local runtime using [Jupyter](http://jupyter.org/install).\nThis lets you run the notebook on your local hardware with access to your local file system.\n\n[_How to set up and connect to a local runtime in Google Colab_](https://research.google.com/colaboratory/local-runtimes.html)\n\n### Using [Jupyter Notebook](https://github.com/jupyter/notebook)\n\nIf you do not want to rely on [Google Colab](#using-google-colab-with-your-local-environment) or use the [AudioToText CLI](#using-audiototext-cli), you can use the [Jupyter Notebook](https://docs.jupyter.org/) interface.\n\n[_How to install Jupyter Notebook_](https://docs.jupyter.org/en/latest/install/notebook-classic.html)\n\nClone or download this repository and run the following inside the repository folder:\n\n```sh\njupyter notebook AudioToText.ipynb\n```\n\nOr just run `jupyter notebook` without cloning this repository and _Upload_ the [`AudioToText.ipynb`](https://raw.githubusercontent.com/Carleslc/AudioToText/master/AudioToText.ipynb) file (_right-click -\u003e Save as..._).\n\n### Using [Jupyter Lab](https://github.com/jupyterlab/jupyterlab)\n\nAn alternative to the Jupyter Notebook interface is the [Jupyter Lab](https://jupyterlab.readthedocs.io/) interface.\n\n[_How to install Jupyter Lab_](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)\n\n```sh\njupyter lab\n```\n\nOpen the notebook using a URL:\n\n_File -\u003e Open from 
URL..._\n\n```\nhttps://raw.githubusercontent.com/Carleslc/AudioToText/master/AudioToText.ipynb\n```\n\n### Using [Whisper CLI](https://github.com/openai/whisper#command-line-usage)\n\nIf you do not need Cloud GPU and you do not want to translate using DeepL then you can just use the Whisper CLI in your console as follows:\n\n#### Install [Whisper CLI](https://github.com/openai/whisper#setup) locally\n\n1. Install [Python](https://www.python.org/downloads/) (3.8 - 3.10)\n2. Install [`ffmpeg`](https://ffmpeg.org/download.html)\n3. Install [Whisper CLI](https://github.com/openai/whisper#setup)\n\n```sh\npip install -U openai-whisper\n```\n\n#### [Whisper CLI usage](https://github.com/openai/whisper#command-line-usage)\n\n```sh\n# Transcribe english.wav using large-v2 model to TXT, VTT, SRT, TSV and JSON formats\nwhisper english.wav --model large-v2 --output_dir audio_transcription --output_format all\n\n# Translate french.wav from French to English using small model to TXT format\nwhisper french.wav --task translate --language French --output_dir audio_transcription --output_format txt\n\n# Transcribe multiple files using large-v2 model to TXT, VTT, SRT, TSV and JSON formats\nwhisper chinese.wav bruce.mp3 english_japanese.mp3 french.wav --model large-v2 --output_dir audio_transcription\n\n# See all available options\nwhisper --help\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCarleslc%2FAudioToText","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCarleslc%2FAudioToText","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCarleslc%2FAudioToText/lists"}