{"id":13585938,"url":"https://github.com/BingLingGroup/autosub","last_synced_at":"2025-04-07T10:32:11.863Z","repository":{"id":37885843,"uuid":"169956516","full_name":"BingLingGroup/autosub","owner":"BingLingGroup","description":"Command-line utility to transcribe/translate from video/audio/subtitles to subtitles ","archived":false,"fork":true,"pushed_at":"2023-12-21T09:24:18.000Z","size":1353,"stargazers_count":1986,"open_issues_count":39,"forks_count":246,"subscribers_count":34,"default_branch":"dev","last_synced_at":"2025-01-25T20:34:53.450Z","etag":null,"topics":["audio-segmentation","baidu-api","cloud-speech-api","substation-alpha","subtitles","voice-activity-detection","xfyun","xunfei-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"iWangJiaxiang/autosub","license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BingLingGroup.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-10T08:13:05.000Z","updated_at":"2025-01-24T18:33:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/BingLingGroup/autosub","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BingLingGroup%2Fautosub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BingLingGroup%2Fautosub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BingLingGroup%2Fautosub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BingLingGroup%2Fautosub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/BingLingGroup","download_url":"https://codeload.github.com/BingLingGroup/autosub/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247637025,"owners_count":20971046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-segmentation","baidu-api","cloud-speech-api","substation-alpha","subtitles","voice-activity-detection","xfyun","xunfei-api"],"created_at":"2024-08-01T15:05:13.848Z","updated_at":"2025-04-07T10:32:11.577Z","avatar_url":"https://github.com/BingLingGroup.png","language":"Python","funding_links":[],"categories":["HarmonyOS","Python","Video Editing \u0026 Processing Tools"],"sub_categories":["Windows Manager","Subtitle \u0026 Caption Tools"],"readme":"# Autosub\n\n\u003cescape\u003e\u003ca href=\"https://travis-ci.org/BingLingGroup/autosub\"\u003e\u003cimg src=\"https://travis-ci.org/BingLingGroup/autosub.svg?branch=alpha\"\u003e\u003c/img\u003e\u003c/a\u003e \u003ca href=\"https://www.gnu.org/licenses/gpl-2.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-GPL--2.0-orange.svg\"\u003e\u003c/img\u003e\u003c/a\u003e\u003c/escape\u003e\n\n[简体中文](docs/README.zh-Hans.md)\n\nThis repo is not the same as [the original autosub repo](https://github.com/agermanidis/autosub).\n\nThis repo has been modified by several people. 
See the [Changelog](CHANGELOG.md).\n\n\u003cescape\u003e\u003cimg src=\"docs/icon/autosub.png\" width=\"128px\"\u003e\u003c/escape\u003e\n\n[autosub icon](docs/icon/autosub.svg) designed by BingLingGroup.\n\nSoftware: [inkscape](https://inkscape.org/zh/)\n\nFont: [source-han-sans](https://github.com/adobe-fonts/source-han-sans) ([SIL](https://github.com/adobe-fonts/source-han-sans/blob/master/LICENSE.txt))\n\nColor: [Solarized](https://en.wikipedia.org/wiki/Solarized_(color_scheme)#Colors)\n\n### TOC\n\n1. [Description](#description)\n2. [License](#license)\n3. [Dependencies](#dependencies)\n   - 3.1 [Optional Dependencies](#optional-dependencies)\n   - 3.2 [Required Dependencies](#required-dependencies)\n4. [Download and Installation](#download-and-installation)\n   - 4.1 [Branches](#branches)\n   - 4.2 [Install on Ubuntu](#install-on-ubuntu)\n   - 4.3 [Install on Windows](#install-on-windows)\n5. [Workflow](#workflow)\n   - 5.1 [Input](#input)\n   - 5.2 [Split](#split)\n   - 5.3 [Speech-to-Text/Translation API request](#speech-to-texttranslation-api-request)\n   - 5.4 [Speech-to-Text/Translation language support](#speech-to-texttranslation-language-support)\n   - 5.5 [Output](#Output)\n6. 
[Usage](#usage)\n   - 6.1 [Typical usage](#typical-usage)\n     - 6.1.1 [Pre-process Audio](#pre-process-audio)\n     - 6.1.2 [Detect Regions](#detect-regions)\n     - 6.1.3 [Split Audio](#split-audio)\n     - 6.1.4 [Transcribe Audio To Subtitles](#transcribe-audio-to-subtitles)\n       - 6.1.4.1 [Google Speech V2](#google-speech-v2)\n       - 6.1.4.2 [Google Cloud Speech-to-Text](#google-cloud-speech-to-text)\n       - 6.1.4.3 [Google speech config](#google-speech-config)\n       - 6.1.4.4 [Output API full response](#output-api-full-response)\n       - 6.1.4.5 [Xfyun speech config](#xfyun-speech-config)\n       - 6.1.4.6 [Baidu speech config](#baidu-speech-config)\n     - 6.1.5 [Translate Subtitles](#translate-subtitles)\n   - 6.2 [Options](#Options)\n   - 6.3 [Internationalization](#internationalization)\n7. [FAQ](#FAQ)\n   - 7.1 [Other APIs supports](#other-apis-supports)\n   - 7.2 [Batch processing](#batch-processing)\n   - 7.3 [proxy support](#proxy-support)\n   - 7.4 [macOS locale issue](#macos-locale-issue)\n   - 7.5 [Accuracy](#accuracy)\n8. [Bugs report](#bugs-report)\n9. [Build](#build)\n\nClick up arrow to go back to TOC.\n\n### Description\n\nAutosub is an automatic subtitles generating utility. It can detect speech regions automatically by using Auditok, split the audio files according to regions by using ffmpeg, transcribe speech based on several APIs and translate the subtitles' text by using py-googletrans.\n\nThe new features mentioned above are only available in the latest alpha branch. Not available on PyPI or the original repo.\n\n### License\n\nThis repo has a different license from [the original repo](https://github.com/agermanidis/autosub).\n\n[GPLv2](LICENSE)\n\n### Dependencies\n\nAutosub depends on these third party softwares or Python site-packages. 
Much appreciation to all of these projects.\n\n#### Optional dependencies\n\n- [ffmpeg](https://ffmpeg.org/)\n- [ffprobe](https://ffmpeg.org/ffprobe.html)\n- [langcodes](https://github.com/LuminosoInsight/langcodes)\n- [ffmpeg-normalize](https://github.com/slhck/ffmpeg-normalize)\n- [python-Levenshtein](https://github.com/ztane/python-Levenshtein)(Used by [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy))\n\nFor windows user:\n\n- [Build Tools for Visual Studio 2019](https://visualstudio.microsoft.com/downloads/)\n  - Used by [marisa-trie](https://github.com/pytries/marisa-trie) when installing.\n  - [marisa-trie](https://github.com/pytries/marisa-trie) is the dependency of the [langcodes](https://github.com/LuminosoInsight/langcodes).\n  - Probable components installation: MSVC VS C++ build tools, windows 10 SDK.\n\n#### Required dependencies\n\n- [auditok](https://github.com/amsehili/auditok)\n- [pysubs2](https://github.com/tkarabela/pysubs2)\n- [wcwidth](https://github.com/jquast/wcwidth)\n- [requests](https://github.com/psf/requests)\n- [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy)\n- [progressbar2](https://github.com/WoLpH/python-progressbar)\n- [websocket-client](https://github.com/websocket-client/websocket-client)\n- [py-googletrans](https://github.com/ssut/py-googletrans)\n\n[requirements.txt](requirements.txt).\n\nAbout how to install these dependencies, see [Download and Installation](#download-and-installation).\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n### Download and Installation\n\nExcept the PyPI version, others include non-original codes not from the original repository.\n\n0.4.0 \u003e autosub\n\n- These versions are only compatible with Python 2.7.\n\n0.5.6a \u003e= autosub \u003e= 0.4.0\n\n- These versions are compatible with both Python 2.7 and Python 3. 
It doesn't matter if you change the Python version in the installation commands below.\n\nautosub \u003e= 0.5.7a\n\n- These versions are only compatible with Python 3.\n\nffmpeg, ffprobe and ffmpeg-normalize need to be placed in one of the following locations so that autosub can detect and use them. The detection code is in [constants.py](autosub/constants.py). The locations are checked in the order listed.\n\n1. Set the following environment variables before running the program: `FFMPEG_PATH`, `FFPROBE_PATH` and `FFMPEG_NORMALIZE_PATH`. They override the executables found through the environment variable `PATH`. This is helpful if you don't want to use the ones in the `PATH`.\n2. Add them to the environment variable `PATH`. You don't need to do this yourself if you install them with a package manager, such as using pip to install ffmpeg-normalize and chocolatey to install ffmpeg.\n3. Add them to the same directory as the autosub executable.\n4. Add them to the current command line working directory.\n\nAbout the git installation: if you don't want to install git in order to use pip [VCS](https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support) support to install a Python package, or you are confused by the git environment variables, you can click the clone and download button to download the source code manually and then use pip to install it [locally](https://pip.pypa.io/en/stable/reference/pip_install/#description) with these commands.\n\n```batch\ncd the_directory_contains_the_source_code\npip install .\n```\n\nBecause the autosub PyPI project is maintained by the original autosub repo's owner, I can't modify it or upload a project with the same name. Perhaps later, when this version of autosub becomes more stable, I will rename and duplicate this repo and then upload it to PyPI.\n\n#### Branches\n\n[alpha branch](https://github.com/BingLingGroup/autosub/tree/alpha)\n\n- Includes many changes from [the original repo](https://github.com/agermanidis/autosub). Details in [Changelog](CHANGELOG.md). 
The code is updated whenever an alpha version is released. It is more stable than the dev branch.\n\n[origin branch](https://github.com/BingLingGroup/autosub/tree/origin)\n\n- Includes the fewest changes from [the original repo](https://github.com/agermanidis/autosub), without any of the new features in the [alpha branch](https://github.com/BingLingGroup/autosub/tree/alpha). The changes in the [origin branch](https://github.com/BingLingGroup/autosub/tree/origin) only make sure there are no critical bugs when the program runs on Windows. It is currently not maintained.\n\n[dev branch](https://github.com/BingLingGroup/autosub/tree/dev)\n\n- The latest code is pushed to this branch. If it works fine, it will be merged into the alpha branch when a new version is released.\n- Only used for testing or pull requests. Don't install from it unless you know what you are doing.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Install on Ubuntu\n\nThe commands below include the dependency installation.\n\nInstall from `alpha` branch.(latest autosub alpha release)\n\n```bash\napt install ffmpeg python3 python3-dev curl git -y\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython3 get-pip.py\npip install git+https://github.com/BingLingGroup/autosub.git@alpha ffmpeg-normalize langcodes\n```\n\nInstall from `dev` branch.(latest autosub dev version)\n\n```bash\napt install ffmpeg python3 python3-dev curl git -y\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython3 get-pip.py\npip install git+https://github.com/BingLingGroup/autosub.git@dev ffmpeg-normalize langcodes\n```\n\nInstall from `origin` branch.(autosub-0.4.0a)\n\n```bash\napt install ffmpeg python python-pip git -y\npip install git+https://github.com/BingLingGroup/autosub.git@origin\n```\n\nInstall from PyPI.(autosub-0.3.12)\n\n```bash\napt install ffmpeg python python-pip -y\npip install autosub\n```\n\nRecommend using `python3` and `python3-pip` instead of `python` and 
`python-pip` after autosub-0.4.0.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Install on Windows\n\nYou can go to the [release page](https://github.com/BingLingGroup/autosub/releases) and download the latest release (standalone version) for Windows. The release version runs without a Python environment. Click-and-run batch files are also included in the package; you can edit them manually with an editor such as Notepad++. Alternatively, add the executable's directory to the system environment variables so that you can use it as a command anywhere in the system, if you have permission to do so.\n\nTips: `Shift - Right Click` is the shortcut for opening PowerShell in the current directory. To run an exe in the current directory, use the form `.\\autosub`.\n\nOr you can open the executable directly and type the arguments manually, though I don't recommend this because it is less efficient.\n\n- The one without the pyinstaller suffix is compiled by Nuitka. It is faster than the pyinstaller build because Nuitka compiles the code, whereas pyinstaller just bundles the application.\n- ffmpeg and ffmpeg-normalize are also in the package. The original ffmpeg-normalize doesn't have a standalone version. The standalone version of ffmpeg-normalize is built separately. 
Codes are [here](https://github.com/BingLingGroup/ffmpeg-normalize).\n- If there's anything wrong on the both releases, or the package size and any other things are annoying you, you can just use the traditional pip installation method below.\n\nOr install Python environment(if you still don't have one) from choco and then install the package.\n\nRecommend using [chocolatey](https://chocolatey.org) on windows to install the environment and dependencies.\n\nChoco installation command is for cmd.(not Powershell)\n\n```batch\n@\"%SystemRoot%\\System32\\WindowsPowerShell\\v1.0\\powershell.exe\" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command \"iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))\" \u0026\u0026 SET \"PATH=%PATH%;%ALLUSERSPROFILE%\\chocolatey\\bin\"\n```\n\nIf you don't have [Build Tools for Visual Studio 2019](https://visualstudio.microsoft.com/downloads/), please install autosub without langcodes.\n\nInstall from `alpha` branch.(latest autosub alpha release)\n\n```batch\nchoco install git python curl ffmpeg -y\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython get-pip.py\npip install git+https://github.com/BingLingGroup/autosub.git@alpha ffmpeg-normalize langcodes\n```\n\nInstall from `dev` branch.(latest autosub dev version)\n\n```batch\nchoco install git python curl ffmpeg -y\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython get-pip.py\npip install git+https://github.com/BingLingGroup/autosub.git@dev ffmpeg-normalize langcodes\n```\n\nInstall from `origin` branch.(autosub-0.4.0a)\n\n```batch\nchoco install git python2 curl ffmpeg -y\ncurl https://bootstrap.pypa.io/get-pip.py -o get-pip.py\npython get-pip.py\npip install git+https://github.com/BingLingGroup/autosub.git@origin\n```\n\nPyPI version(autosub-0.3.12) is not recommended using on windows because it just can't run successfully. 
See the [changelog on the origin branch](CHANGELOG.md#040-alpha---2019-02-17) and you will know the details.\n\nRecommend using `python` instead of `python2` for autosub-0.4.0.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n### Workflow\n\n#### Input\n\nA video/audio/subtitles file.\n\nIf it is a video or audio file, use ffmpeg to convert the format into [the proper one](https://github.com/gillesdemey/google-speech-v2#data) for API. Any format supported by ffmpeg is OK to input, but the output or processed format for API is limited by API and autosub codes.\n\nSupported formats below:\n\n[Google-Speech-v2](https://github.com/gillesdemey/google-speech-v2)\n\n- 24bit/44100Hz/mono FLAC(default)\n- Other format like OGG_OPUS isn't supported by API. (I've tried modifying requests headers or json requests and it just don't work) Or format like PCM has less bits per sample but more storage usage than FLAC. Although the API support it but I think it's unnecessary to modify codes to support it.\n\n[Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text/docs/encoding) [v1p1beta1](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding)\n\n- Supported\n  - 24bit/44100Hz/mono FLAC(default)\n- Supported but not default args (more info on [Transcribe Audio To Subtitles](#transcribe-audio-to-subtitles))\n  - 8000Hz|12000Hz|16000Hz|24000Hz|48000Hz/mono OGG_OPUS\n  - MP3\n  - 16bit/mono PCM\n\n[Xfyun Speech-to-Text WebSocket API](https://www.xfyun.cn/doc/asr/voicedictation/API.html#%E6%8E%A5%E5%8F%A3%E8%A6%81%E6%B1%82)/[Baidu ASR API/Baidu ASR Pro API](https://ai.baidu.com/ai-doc/SPEECH/Vk38lxily)\n\n- Supported\n  - 16bit/16000Hz/mono PCM\n\nAlso, you can use the built-in audio pre-processing function though Google [doesn't recommend](https://cloud.google.com/speech-to-text/docs/best-practices) doing this. 
Honestly speaking, if your audio volume has not been standardized, for example it is too loud or too quiet, it's recommended to use some tools or just the built-in function to standardize it. The default [pre-processing commands](https://github.com/agermanidis/autosub/issues/40#issuecomment-509928060) depend on ffmpeg-normalize and ffmpeg. There are three commands. The [first](https://trac.ffmpeg.org/wiki/AudioChannelManipulation) converts stereo to mono. The [second](https://superuser.com/questions/733061/reduce-background-noise-and-optimize-the-speech-from-an-audio-clip-using-ffmpeg) filters out sound outside the frequency range of speech. The third normalizes the audio to make sure it is not too loud or too quiet. If you are not satisfied with the default commands, you can also modify them yourself with the `-apc` option. Still, pre-processing currently only supports the 24bit/44100Hz/mono FLAC format.\n\nIf the input is a subtitles file and you give the proper arguments, autosub only translates it with py-googletrans.\n\n#### Split\n\nAudio length limits:\n\n[Google-Speech-v2](https://github.com/gillesdemey/google-speech-v2)\n\n- No longer than [10 to 15 seconds](https://github.com/gillesdemey/google-speech-v2#caveats).\n- In autosub it is set as the [60-seconds-limit](https://github.com/BingLingGroup/autosub/blob/dev/autosub/constants.py#L74).\n\n[Google Cloud Speech-to-Text API](https://cloud.google.com/speech-to-text/docs/encoding)\n\n- No longer than [1 minute](https://cloud.google.com/speech-to-text/docs/sync-recognize).\n- In autosub it is currently set the same as the [60-seconds-limit](https://github.com/BingLingGroup/autosub/blob/dev/autosub/constants.py#L74).\n- Currently only sync-recognize is supported, which means only short audio is supported.\n\n[Xfyun Speech-to-Text WebSocket API](https://www.xfyun.cn/doc/asr/voicedictation/API.html#%E6%8E%A5%E5%8F%A3%E8%A6%81%E6%B1%82)/[Baidu ASR API/Baidu ASR Pro API](https://ai.baidu.com/ai-doc/SPEECH/Vk38lxily)\n\n- Same limit 
above.\n\nAutosub uses Auditok to detect speech regions, and then uses them to split and convert the video/audio into many audio fragments: one fragment per region, and one region per API request. All these audio fragments are converted directly from the input to avoid any extra quality loss.\n\nAlternatively, it uses external regions from a file in a format that pysubs2 supports, like `.ass` or `.srt`. This allows you to manually adjust the regions to get better recognition results.\n\n#### Speech-to-Text/Translation API request\n\nAutosub makes parallel requests to generate transcriptions for those regions, one audio fragment per request. Recognition speed mostly depends on your network upload speed.\n\n- Manual post-processing of the subtitle lines may be needed, since some of them are too long to fit in a single line at the bottom of the video frame.\n\nAfter Speech-to-Text, autosub translates the lines to a different language. Multiple lines of text are combined into one chunk of text per translation request. Details at [issue #49](https://github.com/BingLingGroup/autosub/issues/49). Finally, it saves the resulting subtitles to local storage.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Speech-to-Text/Translation language support\n\nThe description below only covers the Google API language codes. For the other APIs, see [Xfyun speech config](#xfyun-speech-config) and [Baidu speech config](#baidu-speech-config).\n\nThe Speech-to-Text lang codes are different from the Translation lang codes because the two APIs differ. And of course, they are in *Google* formats that don't follow the ISO standards, which makes them more confusing to use.\n\nTo solve this problem, autosub uses [langcodes](https://github.com/LuminosoInsight/langcodes) to detect the input lang code and convert it to the best match according to the lang code lists. It is not enabled by default. 
To enable it in different phases, use `-bm all` option.\n\nTo match manually or see the full list of the lang codes, run the utility with the arguments `-lsc`/`--list-speech-codes` and `-ltc`/`--list-translation-codes`. Or open [constants.py](autosub/constants.py) and check.\n\nTo detect the language of the first line of a subtitles file, you can use `-dsl`.\n\n- Currently, autosub allows sending lang codes that are not in the `--list-speech-codes` list, which means the program won't stop in this case.\n\n- Though you can input any speech lang code you want, note that if you use a code that is not on the list but the API somehow accepts it, [Google-Speech-v2](https://github.com/gillesdemey/google-speech-v2) recognizes your audio in a way that depends on your IP address, which you cannot control. This is a known issue and I have submitted a [pull request](https://github.com/agermanidis/autosub/pull/136) to the original repo.\n\n- On the other hand, [py-googletrans](https://github.com/ssut/py-googletrans) is stricter. When it receives a lang code that is not on its list, it throws an exception and stops the translation.\n\n- Apart from the user input, another notable change is that I split the `-S` option into two parts, `-S` and `-SRC`. The `-S` option is for the speech recognition lang code. `-SRC` is for the translation source language. When `-SRC` is not given, autosub automatically matches the `-S` arg by using [langcodes](https://github.com/LuminosoInsight/langcodes) to get a best-match lang code for the translation source language, even though [py-googletrans](https://github.com/ssut/py-googletrans) can auto-detect the source language. Of course you can specify one manually with the `-SRC` option. 
`-D` is for translation destination language, still the same as before.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Output\n\nCurrently support the following formats to output.\n\n```Python\nOUTPUT_FORMAT = {\n    'srt': 'SubRip',\n    'ass': 'Advanced SubStation Alpha',\n    'ssa': 'SubStation Alpha',\n    'sub': 'MicroDVD Subtitle',\n    'mpl2.txt': 'Similar to MicroDVD',\n    'tmp': 'TMP Player Subtitle Format',\n    'vtt': 'WebVTT',\n    'json': 'json(Only times and text)',\n    'ass.json': 'json(Complex ass content json)',\n    'txt': 'Plain Text(Text or times)'\n}\n```\n\nOr other subtitles types/output modes, depend on what you need. More info in help message.\n\n```Python\nDEFAULT_MODE_SET = {\n    'regions',\n    'src',\n    'full-src',\n    'dst',\n    'bilingual',\n    'dst-lf-src',\n    'src-lf-dst'\n}\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n### Usage\n\nFor the original autosub usage, see [简体中文使用指南](https://binglinggroup.github.io/archives/autosub安装使用指南(windows及ubuntu).html).\n\nFor the modified alpha branch version, see the typical usage below.\n\n#### Typical usage\n\n\u003cescape\u003e\u003cdiv title=\"Typical usage\" align=\"middle\"\u003e\u003cimg src=\"docs/workflow.png\"\u003e\u003c/div\u003e\u003c/escape\u003e\n\n##### Pre-process Audio\n\nUse default [Audio pre-processing](https://github.com/agermanidis/autosub/issues/40).\n\nPre-processing only.\n\n```\nautosub -i input_file -ap o\n```\n\nPre-processing as a part.\n\n```\nautosub -i input_file -ap y ...(other options)\n```\n\n##### Detect Regions\n\nDetect regions by using Auditok.\n\nGetting regions only.\n\n```\nautosub -i input_file\n```\n\nGetting regions as a part.\n\n```\nautosub -i input_file -of regions ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n##### 
Split Audio\n\nGet audio fragments according to the regions.\n\nOnly get audio fragments according to auto regions detection.\n\n```\nautosub -i input_file -ap s\n```\n\nOnly get audio fragments according to external regions.\n\n```\nautosub -i input_file -ap s -er external_regions_subtitles\n```\n\nGetting audio fragments as a part.\n\n```\nautosub -i input_file -k ...(other options)\n```\n\n##### Transcribe Audio To Subtitles\n\nSpeech audio fragments to speech language subtitles.\n\n###### Google Speech V2\n\nUse default [Google-Speech-v2](https://github.com/gillesdemey/google-speech-v2) to transcribe speech language subtitles only.\n\n```\nautosub -i input_file -S lang_code\n```\n\nUse default [Google-Speech-v2](https://github.com/gillesdemey/google-speech-v2) to transcribe speech language subtitles as a part.\n\n```\nautosub -i input_file -S lang_code -of src ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n###### Google Cloud Speech-to-Text\n\nUse Google Cloud Speech-to-Text API service account(GOOGLE_APPLICATION_CREDENTIALS has already been set in system environment variable) to transcribe.\n\n```\nautosub -i input_file -sapi gcsv1 -S lang_code ...(other options)\n```\n\nUse Google Cloud Speech-to-Text API service account(GOOGLE_APPLICATION_CREDENTIALS is set by `-sa`) to transcribe.(Currently not available in Nuitka build.)\n\n```\nautosub -i input_file -sapi gcsv1 -S lang_code -sa path_to_key_file ...(other options)\n```\n\nUse Google Cloud Speech-to-Text API key to transcribe.\n\n```\nautosub -i input_file -sapi gcsv1 -S lang_code -skey API_key ...(other options)\n```\n\nUse 48000Hz OGG_OPUS in Google Cloud Speech-to-Text API. 
The conversion commands will be automatically modified by these [codes](https://github.com/BingLingGroup/autosub/blob/alpha/autosub/__init__.py#L135-L140).\n\n```\nautosub -i input_file -sapi gcsv1 -asf .ogg -asr 48000 ...(other options)\n```\n\nUse MP3 in Google Cloud Speech-to-Text API.(Not recommended because OGG_OPUS is better than MP3)\n```\nautosub -i input_file -sapi gcsv1 -asf .mp3 ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n###### Google Speech config\n\nUse customized [speech config file](https://googleapis.dev/python/speech/latest/gapic/v1/types.html#google.cloud.speech_v1.types.RecognitionConfig) to send request to Google Cloud Speech API. If using the config file, override these options: `-S`, `-asr`, `-asf`.\n\n`language_code` will be replaced by the best matching one if using option `-bm src` or `-bm all`. `encoding` string will be replaced by the enum in `google.cloud.speech_v1p1beta1.enums.RecognitionConfig.AudioEncoding` if using service account credentials. Default `encoding` is `FLAC`. 
Default `sample_rate_hertz` is `44100`.\n\nExample speech config file:\n\n```json\n{\n    \"language_code\": \"zh\",\n    \"enable_word_time_offsets\": true\n}\n```\n\nIf not provide option `-asr` and `-asf`, equal to:\n\n```json\n{\n    \"language_code\": \"zh\",\n    \"sample_rate_hertz\": 44100,\n    \"encoding\": \"FLAC\",\n    \"enable_word_time_offsets\": true\n}\n```\n\notherwise:\n\n```json\n{\n    \"language_code\": \"zh\",\n    \"sample_rate_hertz\": \"from --api-sample-rate\",\n    \"encoding\": \"from --api-suffix\",\n    \"enable_word_time_offsets\": true\n}\n```\n\ncommand:\n\n```\nautosub -i input_file -sconf config_json_file -bm all -sapi gcsv1 -skey API_key ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n###### Output API full response\n\nCurrently autosub can't handle many [advanced fields](https://cloud.google.com/speech-to-text/docs/reference/rpc/google.cloud.speech.v1p1beta1#google.cloud.speech.v1p1beta1.SpeechRecognitionResult) contained in the speech recognition result received from API, especially from Google Cloud Speech-to-Text API. With complex [speech config](#speech-config) input and option `-of full-src`, recognition results will be output into json file so you can customize them and handle them outside autosub.\n\nExample json output:\n\n```json\n[\n    {\n        \"start\": 0.52,\n        \"end\": 1.31,\n        \"content\": {\n            \"results\": [\n                {\n                    \"alternatives\": [\n                        {\n                            \"confidence\": 0.98267895,\n                            \"transcript\": \"how old is the Brooklyn Bridge\"\n                        }\n                    ]\n                }\n            ]\n        }\n    }\n]\n```\n\n\"start\" and \"end\" mean the start seconds and the end seconds in the whole audio file. 
\"content\" is the result received from API.\n\ncommand:\n\n```\nautosub -i input_file -sconf config_json_file -bm all -sapi gcsv1 -skey API_key -of full-src ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n##### Xfyun speech config\n\nFor Xfyun Speech-to-Text WebSocket API usage, user must input its speech config.\n\nExample speech config file:\n\n```json\n{\n    \"app_id\": \"\",\n    \"api_secret\": \"\",\n    \"api_key\": \"\",\n    \"api_address\": \"\",\n    \"business\": {\n        \"language\": \"zh_cn\",\n        \"domain\": \"iat\",\n        \"accent\": \"mandarin\"\n    }\n}\n```\n\n`\"business\"`/`\"api_address\"` field is the same as the [xfyun document](https://www.xfyun.cn/doc/asr/voicedictation/API.html#%E4%B8%9A%E5%8A%A1%E5%8F%82%E6%95%B0) mentioned.\n\nWhen the file doesn't include the `\"business\"` field, autosub will use the above default content instead.\n\nIf you add `\"delete_chars\": \"，。\"` in the configuration file (In this example, full-width comma and period are the punctuations to be deleted), autosub will automatically replace the specific punctuation with a space when receiving the transcript, and strip the space at the end of each sentence.\n\ncommand:\n\n```\nautosub -sapi xfyun -i input_file -sconf xfyun_speech_config ...(other options)\n```\n\n##### Baidu speech config\n\nFor Baidu ASR API usage, user must input its speech config.\n\nExample speech config file:\n\n```json\n{\n    \"AppID\": \"\",\n    \"API key\": \"\",\n    \"Secret Key\": \"\",\n    \"config\": {\n        \"format\": \"pcm\",\n        \"rate\": 16000,\n        \"channel\": 1,\n        \"cuid\": \"python\",\n        \"dev_pid\": 1537\n    }\n}\n```\n\n`\"config\"` field is the same as the [Baidu ASR document](https://ai.baidu.com/ai-doc/SPEECH/ek38lxj1u) mentioned.\n\nIf you want to use the Pro ASR API, change the value of `\"cuid\"` into `80001`.\n\nWhen the file doesn't 
include the `\"config\"` field, autosub will use the above default content instead.\n\nThe `\"delete_chars\"` field works the same way as described above.\n\nPractically speaking, since the Baidu ASR/ASR Pro API doesn't allow concurrency by default, concurrency is limited to 1. If you need to lift the limit, add `\"disable_qps_limit\": true,` to the config file. The concurrency will then be set by the option `-sc`.\n\nCommand:\n\n```\nautosub -sapi baidu -i input_file -sconf baidu_speech_config ...(other options)\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n##### Translate Subtitles\n\nTranslate subtitles to another language.\n\nIf the option `-SRC` is not given, the translation source language will be auto-detected by py-googletrans.\n\nTranslate subtitles from an audio/video file.\n\n```\nautosub -i input_file -S lang_code (-SRC lang_code) -D lang_code\n```\n\nTranslate subtitles from a subtitles file.\n\n```\nautosub -i input_file (-SRC lang_code) -D lang_code\n```\n\nTranslate subtitles through \"translate.google.cn\", which can be accessed directly from some locations.\n\n```\nautosub -i input_file -surl \"translate.google.cn\" ...(other options)\n```\n\nOr use other available URLs such as those listed in [ssut/py-googletrans#165](https://github.com/ssut/py-googletrans/issues/165).\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Options\n\nThe full help message:\n\n```\n$ autosub -h\nusage:\n  autosub [-i path] [options]\n\nAuto-generate subtitles for video/audio/subtitles file.\n\nInput Options:\n  Options to control input.\n\n  -i path, --input path\n                        The path to the video/audio/subtitles file that needs\n                        to generate subtitles. When it is a subtitles file,\n                        the program will only translate it. 
(arg_num = 1)\n  -er path, --ext-regions path\n                        Path to the subtitles file which provides external\n                        speech regions, which is one of the formats that\n                        pysubs2 supports and overrides the default method to\n                        find speech regions. (arg_num = 1)\n  -sty [path], --styles [path]\n                        Valid when your output format is \"ass\"/\"ssa\". Path to\n                        the subtitles file which provides \"ass\"/\"ssa\" styles\n                        for your output. If the arg_num is 0, it will use the\n                        styles from the : \"-esr\"/\"--external-speech-regions\".\n                        More info on \"-sn\"/\"--styles-name\". (arg_num = 0 or 1)\n  -sn [style_name [style_name ...]], --styles-name [style_name [style_name ...]]\n                        Valid when your output format is \"ass\"/\"ssa\" and\n                        \"-sty\"/\"--styles\" is given. Adds \"ass\"/\"ssa\" styles to\n                        your events. If not provided, events will use the\n                        first one from the file. If the arg_num is 1, events\n                        will use the specific style from the arg of \"-sty\"/\"--\n                        styles\". If the arg_num is 2, src language events use\n                        the first. Dst language events use the second.\n                        (arg_num = 1 or 2)\n\nLanguage Options:\n  Options to control language.\n\n  -S lang_code, --speech-language lang_code\n                        Lang code/Lang tag for speech-to-text. Recommend using\n                        the Google Cloud Speech reference lang codes. WRONG\n                        INPUT WON'T STOP RUNNING. 
But use it at your own risk.\n                        Ref: https://cloud.google.com/speech-to-\n                        text/docs/languages(arg_num = 1) (default: None)\n  -SRC lang_code, --src-language lang_code\n                        Lang code/Lang tag for translation source language. If\n                        not given, use py-googletrans to auto-detect the src\n                        language. (arg_num = 1) (default: auto)\n  -D lang_code, --dst-language lang_code\n                        Lang code/Lang tag for translation destination\n                        language. (arg_num = 1) (default: None)\n  -bm [mode [mode ...]], --best-match [mode [mode ...]]\n                        Use langcodes to get a best matching lang code when\n                        your input is wrong. Only functional for py-\n                        googletrans and Google Speech API. If langcodes not\n                        installed, use fuzzywuzzy instead. Available modes: s,\n                        src, d, all. \"s\" for \"-S\"/\"--speech-language\". \"src\"\n                        for \"-SRC\"/\"--src-language\". \"d\" for \"-D\"/\"--dst-\n                        language\". (3 \u003e= arg_num \u003e= 1)\n  -mns integer, --min-score integer\n                        An integer between 0 and 100 to control the good match\n                        group of \"-lsc\"/\"--list-speech-codes\" or \"-ltc\"/\"--\n                        list-translation-codes\" or the match result in \"-bm\"/\"\n                        --best-match\". Result will be a group of \"good match\"\n                        whose score is above this arg. (arg_num = 1)\n\nOutput Options:\n  Options to control output.\n\n  -o path, --output path\n                        The output path for subtitles file. 
(default: the\n                        \"input\" path combined with the proper name tails)\n                        (arg_num = 1)\n  -F format, --format format\n                        Destination subtitles format. If not provided, use the\n                        extension in the \"-o\"/\"--output\" arg. If \"-o\"/\"--\n                        output\" arg doesn't provide the extension name, use\n                        \"srt\" instead. In this case, if \"-i\"/\"--input\" arg is\n                        a subtitles file, use the same extension from the\n                        subtitles file. (arg_num = 1) (default: srt)\n  -y, --yes             Prevent pauses and allow files to be overwritten. Stop\n                        the program when your args are wrong. (arg_num = 0)\n  -of [type [type ...]], --output-files [type [type ...]]\n                        Output more files. Available types: regions, src,\n                        full-src, dst, bilingual, dst-lf-src, src-lf-dst, all.\n                        \"regions\", \"src\", \"full-src\" are available only if\n                        input is not a subtitles file. full-src: Full result\n                        received from Speech-to-Text API in json format with\n                        start and end time. dst-lf-src: dst language and src\n                        language in the same event. And dst is ahead of src.\n                        src-lf-dst: src language and dst language in the same\n                        event. And src is ahead of dst. (6 \u003e= arg_num \u003e= 1)\n                        (default: ['dst'])\n  -fps float, --sub-fps float\n                        Valid when your output format is \"sub\". If input, it\n                        will override the fps check on the input file. 
Ref:\n                        https://pysubs2.readthedocs.io/en/latest/api-\n                        reference.html#supported-input-output-formats (arg_num\n                        = 1)\n\nSpeech Options:\n  Options to control speech-to-text. If Speech Options not given, it will only generate the times.\n\n  -sapi API_code, --speech-api API_code\n                        Choose which Speech-to-Text API to use. Currently\n                        support: gsv2: Google Speech V2\n                        (https://github.com/gillesdemey/google-speech-v2).\n                        gcsv1: Google Cloud Speech-to-Text V1P1Beta1\n                        (https://cloud.google.com/speech-to-text/docs). xfyun:\n                        Xun Fei Yun Speech-to-Text WebSocket API (https://www.\n                        xfyun.cn/doc/asr/voicedictation/API.html). baidu:\n                        Baidu Automatic Speech Recognition API\n                        (https://ai.baidu.com/ai-doc/SPEECH/Vk38lxily)\n                        (arg_num = 1) (default: gsv2)\n  -skey key, --speech-key key\n                        The API key for Google Speech-to-Text API. (arg_num =\n                        1) Currently support: gsv2: The API key for gsv2.\n                        (default: Free API key) gcsv1: The API key for gcsv1.\n                        (If used, override the credentials given by\"-sa\"/\"--\n                        service-account\")\n  -sconf [path], --speech-config [path]\n                        Use Speech-to-Text recognition config file to send\n                        request. Override these options below: \"-S\", \"-asr\",\n                        \"-asf\". 
Currently support: gcsv1: Google Cloud Speech-\n                        to-Text V1P1Beta1 API key config reference:\n                        https://cloud.google.com/speech-to-\n                        text/docs/reference/rest/v1p1beta1/RecognitionConfig\n                        Service account config reference: https://googleapis.d\n                        ev/python/speech/latest/gapic/v1/types.html#google.clo\n                        ud.speech_v1.types.RecognitionConfig xfyun: Xun Fei\n                        Yun Speech-to-Text WebSocket API\n                        (https://console.xfyun.cn/services/iat). baidu: Baidu\n                        Automatic Speech Recognition API\n                        (https://ai.baidu.com/ai-doc/SPEECH/ek38lxj1u). If\n                        arg_num is 0, use const path. (arg_num = 0 or 1)\n                        (const: config.json)\n  -mnc float, --min-confidence float\n                        Google Speech-to-Text API response for text\n                        confidence. A float value between 0 and 1. Confidence\n                        bigger means the result is better. Input this argument\n                        will drop any result below it. Ref:\n                        https://github.com/BingLingGroup/google-\n                        speech-v2#response (arg_num = 1) (default: 0.0)\n  -der, --drop-empty-regions\n                        Drop any regions without speech recognition result.\n                        (arg_num = 0)\n  -sc integer, --speech-concurrency integer\n                        Number of concurrent Speech-to-Text requests to make.\n                        (arg_num = 1) (default: 4)\n\npy-googletrans Options:\n  Options to control translation. Default method to translate. Could be blocked at any time.\n\n  -slp second, --sleep-seconds second\n                        (Experimental)Seconds to sleep between two translation\n                        requests. 
(arg_num = 1) (default: 1)\n  -surl [URL [URL ...]], --service-urls [URL [URL ...]]\n                        (Experimental)Customize request urls. Ref: https://py-\n                        googletrans.readthedocs.io/en/latest/ (arg_num \u003e= 1)\n  -ua User-Agent headers, --user-agent User-Agent headers\n                        (Experimental)Customize User-Agent headers. Same docs\n                        above. (arg_num = 1)\n  -doc, --drop-override-codes\n                        Drop any .ass override codes in the text before\n                        translation. Only affect the translation result.\n                        (arg_num = 0)\n  -gt-dc [chars], --gt-delete-chars [chars]\n                        Replace the specific chars with a space after\n                        translation, and strip the space at the end of each\n                        sentence. Only affect the translation result. (arg_num\n                        = 0 or 1) (const: ，。！)\n\nSubtitles Conversion Options:\n  Options to control subtitles conversions.(Experimental)\n\n  -mjs integer, --max-join-size integer\n                        (Experimental)Max length to join two events. (arg_num\n                        = 1) (default: 100)\n  -mdt second, --max-delta-time second\n                        (Experimental)Max delta time to join two events.\n                        (arg_num = 1) (default: 0.2)\n  -dms string, --delimiters string\n                        (Experimental)Delimiters not to join two events.\n                        (arg_num = 1) (default: !()*,.:;?[]^_`~)\n  -sw1 words_delimited_by_space, --stop-words-1 words_delimited_by_space\n                        (Experimental)First set of Stop words to split two\n                        events. (arg_num = 1)\n  -sw2 words_delimited_by_space, --stop-words-2 words_delimited_by_space\n                        (Experimental)Second set of Stop words to split two\n                        events. 
(arg_num = 1)\n  -ds, --dont-split     (Experimental)Don't Split just merge. (arg_num = 0)\n\nNetwork Options:\n  Options to control network.\n\n  -hsa, --http-speech-api\n                        Change the Google Speech V2 API URL into the http one.\n                        (arg_num = 0)\n  -hsp [URL], --https-proxy [URL]\n                        Add https proxy by setting environment variables. If\n                        arg_num is 0, use const proxy url. (arg_num = 0 or 1)\n                        (const: https://127.0.0.1:1080)\n  -hp [URL], --http-proxy [URL]\n                        Add http proxy by setting environment variables. If\n                        arg_num is 0, use const proxy url. (arg_num = 0 or 1)\n                        (const: http://127.0.0.1:1080)\n  -pu username, --proxy-username username\n                        Set proxy username. (arg_num = 1)\n  -pp password, --proxy-password password\n                        Set proxy password. (arg_num = 1)\n\nOther Options:\n  Other options to control.\n\n  -h, --help            Show autosub help message and exit. (arg_num = 0)\n  -V, --version         Show autosub version and exit. (arg_num = 0)\n  -sa path, --service-account path\n                        Set service account key environment variable. It\n                        should be the file path of the JSON file that contains\n                        your service account credentials. Can be overridden by\n                        the API key. Ref:\n                        https://cloud.google.com/docs/authentication/getting-\n                        started Currently support: gcsv1\n                        (GOOGLE_APPLICATION_CREDENTIALS) (arg_num = 1)\n\nAudio Processing Options:\n  Options to control audio processing.\n\n  -ap [mode [mode ...]], --audio-process [mode [mode ...]]\n                        Option to control audio process. If not given the\n                        option, do normal conversion work. 
\"y\": pre-process\n                        the input first then start normal workflow. If\n                        succeed, no more conversion before the speech-to-text\n                        procedure. \"o\": only pre-process the input audio.\n                        (\"-k\"/\"--keep\" is true) \"s\": only split the input\n                        audio. (\"-k\"/\"--keep\" is true) Default command to pre-\n                        process the audio: C:\\Program\n                        Files\\ImageMagick-7.0.10-Q16\\ffmpeg.exe -hide_banner\n                        -i \"{in_}\" -vn -af \"asplit[a],aphasemeter=video=0,amet\n                        adata=select:key=lavfi.aphasemeter.phase:value=-0.005:\n                        function=less,pan=1c|c0=c0,aresample=async=1:first_pts\n                        =0,[a]amix\" -ac 1 -f flac -loglevel error \"{out_}\" |\n                        C:\\Program Files\\ImageMagick-7.0.10-Q16\\ffmpeg.exe\n                        -hide_banner -i \"{in_}\" -af\n                        \"lowpass=3000,highpass=200\" -loglevel error \"{out_}\" |\n                        C:\\Python37\\Scripts\\ffmpeg-normalize.exe -v \"{in_}\"\n                        -ar 44100 -ofmt flac -c:a flac -pr -p -o \"{out_}\"\n                        (Ref: https://github.com/stevenj/autosub/blob/master/s\n                        cripts/subgen.sh https://ffmpeg.org/ffmpeg-\n                        filters.html) (2 \u003e= arg_num \u003e= 1)\n  -k, --keep            Keep audio processing files to the output path.\n                        (arg_num = 0)\n  -apc [command [command ...]], --audio-process-cmd [command [command ...]]\n                        This arg will override the default audio pre-process\n                        command. Every line of the commands need to be in\n                        quotes. Input file name is {in_}. Output file name is\n                        {out_}. 
(arg_num \u003e= 1)\n  -ac integer, --audio-concurrency integer\n                        Number of concurrent ffmpeg audio split process to\n                        make. (arg_num = 1) (default: 4)\n  -acc command, --audio-conversion-cmd command\n                        (Experimental)This arg will override the default audio\n                        conversion command. \"[\", \"]\" are optional arguments\n                        meaning you can remove them. \"{\", \"}\" are required\n                        arguments meaning you can't remove them. (arg_num = 1)\n                        (default: C:\\Program\n                        Files\\ImageMagick-7.0.10-Q16\\ffmpeg.exe -hide_banner\n                        -y -i \"{in_}\" -vn -ac {channel} -ar {sample_rate}\n                        -loglevel error \"{out_}\")\n  -asc command, --audio-split-cmd command\n                        (Experimental)This arg will override the default audio\n                        split command. Same attention above. (arg_num = 1)\n                        (default: C:\\Program\n                        Files\\ImageMagick-7.0.10-Q16\\ffmpeg.exe -y -ss {start}\n                        -i \"{in_}\" -t {dura} -vn -ac [channel] -ar\n                        [sample_rate] -loglevel error \"{out_}\")\n  -asf file_suffix, --api-suffix file_suffix\n                        (Experimental)This arg will override the default API\n                        audio suffix. (arg_num = 1) (default: .flac)\n  -asr sample_rate, --api-sample-rate sample_rate\n                        (Experimental)This arg will override the default API\n                        audio sample rate(Hz). (arg_num = 1) (default: 44100)\n  -aac channel_num, --api-audio-channel channel_num\n                        (Experimental)This arg will override the default API\n                        audio channel. 
(arg_num = 1) (default: 1)\n\nAuditok Options:\n  Options to control Auditok when not using external speech regions control.\n\n  -et energy, --energy-threshold energy\n                        The energy level which determines the region to be\n                        detected. Ref: https://auditok.readthedocs.io/en/lates\n                        t/apitutorial.html#examples-using-real-audio-data\n                        (arg_num = 1) (default: 45)\n  -mnrs second, --min-region-size second\n                        Minimum region size. Same docs above. (arg_num = 1)\n                        (default: 0.5)\n  -mxrs second, --max-region-size second\n                        Maximum region size. Same docs above. (arg_num = 1)\n                        (default: 10.0)\n  -mxcs second, --max-continuous-silence second\n                        Maximum length of a tolerated silence within a valid\n                        audio activity. Same docs above. (arg_num = 1)\n                        (default: 0.2)\n  -nsml, --not-strict-min-length\n                        If not input this option, it will keep all regions\n                        strictly follow the minimum region limit. Ref: https:/\n                        /auditok.readthedocs.io/en/latest/core.html#class-\n                        summary (arg_num = 0)\n  -dts, --drop-trailing-silence\n                        Ref: https://auditok.readthedocs.io/en/latest/core.htm\n                        l#class-summary (arg_num = 0)\n\nList Options:\n  List all available arguments.\n\n  -lf, --list-formats   List all available subtitles formats. If your format\n                        is not supported, you can use ffmpeg or SubtitleEdit\n                        to convert the formats. You need to offer fps option\n                        when input is an audio file and output is \"sub\"\n                        format. 
(arg_num = 0)\n  -lsc [lang_code], --list-speech-codes [lang_code]\n                        List all recommended \"-S\"/\"--speech-language\" Google\n                        Speech-to-Text language codes. If no arg is given,\n                        list all. Or else will list a group of \"good match\" of\n                        the arg. Default \"good match\" standard is whose match\n                        score above 90 (score between 0 and 100). Ref:\n                        https://tools.ietf.org/html/bcp47 https://github.com/L\n                        uminosoInsight/langcodes/blob/master/langcodes/__init_\n                        _.py lang code example: language-script-region-\n                        variant-extension-privateuse (arg_num = 0 or 1)\n  -ltc [lang_code], --list-translation-codes [lang_code]\n                        List all available \"-SRC\"/\"--src-language\" py-\n                        googletrans translation language codes. Or else will\n                        list a group of \"good match\" of the arg. Same docs\n                        above. (arg_num = 0 or 1)\n  -dsl path, --detect-sub-language path\n                        Use py-googletrans to detect a sub file's first line\n                        language. And list a group of matched language in\n                        recommended \"-S\"/\"--speech-language\" Google Speech-to-\n                        Text language codes. 
Ref:\n                        https://cloud.google.com/speech-to-text/docs/languages\n                        (arg_num = 1) (default: None)\n\nMake sure the argument with space is in quotes.\nThe default value is used\nwhen the option is not given at the command line.\n\"(arg_num)\" means if the option is given,\nthe number of the arguments is required.\nArguments *ARE* the things given behind the options.\nAuthor: Bing Ling\nEmail: binglinggroup@outlook.com\nBug report: https://github.com/BingLingGroup/autosub\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Internationalization\n\nAutosub provides a multi-language command-line user interface through [GNU gettext](https://www.gnu.org/software/gettext/). It currently supports `zh_CN` and the default `en_US`. More info about the [lang codes format](https://www.gnu.org/software/gettext/manual/gettext.html#Locale-Names). The program automatically detects the OS locale and uses it if it is supported. On Windows 10, adjusting `Region`-`Regional format` seems to be enough.\n\nAutosub also offers a way to override the OS locale. Create a text file named `locale`, without an extension, in the current working directory of the command line, with the lang code at the beginning of the file. When autosub starts, it detects this file, reads the lang code inside it, and applies it if supported.\n\nIf you want to translate this program into other languages, first install the gettext utilities. Then run `python scripts/update_po_files.py lang_code` to create the locale files for the language you want to translate into, and use [POEditor](https://poeditor.com/) to edit the po files. [update_po_files.py](scripts/update_po_files.py) can also automatically merge the position info into the old po files and compile the po files into the mo files that the program reads. 
This is useful when the code changes: you can merge the positional changes into the translations automatically.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n### FAQ\n\n#### Other APIs supports\n\n[issue #11](https://github.com/BingLingGroup/autosub/issues/11)\n\nI won't add any new features unless I'm less busy in the future. However, pull requests are welcome.\n\n#### Batch processing\n\n[issue #13](https://github.com/BingLingGroup/autosub/issues/13)\n\nSame as above: it won't be added for now. You can use batch/powershell/bash to implement it.\n\nBatch example (working in the current directory):\n\n```batch\n@echo off\nset \"in_format=*.mp4 *.m4a\"\n\n@echo on\nfor /f \"delims=^\" %%i in ('dir /b %in_format%') do (\n    autosub -i \"%%i\" ...(other options)\n)\n@echo off\n```\n\nIf you want to walk through the directories recursively, replace `'dir /b %in_format%'` with `'dir /b/s/a:-d %in_format%'`. [Reference](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/dir).\n\n#### proxy support\n\n[issue #17](https://github.com/BingLingGroup/autosub/issues/17)\n\nCurrently the proxy settings are only implemented in the same way as setting environment variables on the command line. 
So you need to run an http/https proxy server locally, such as [shadowsocks-windows](https://github.com/shadowsocks/shadowsocks-windows/releases) or [shadowsocks](https://github.com/shadowsocks/shadowsocks/tree/master).\n\nIf you often encounter empty results or connection errors during speech-to-text or subtitles translation, you may need a better proxy for a more stable connection to Google, or you can rent a Linux server that can reach Google's network.\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### macOS locale issue\n\n[issue #83 (comment)](https://github.com/BingLingGroup/autosub/issues/83#issuecomment-586624157)\n\n```Python\nTraceback (most recent call last):\n  File \"/usr/local/bin/autosub\", line 5, in \u003cmodule\u003e\n    from autosub import main\n  File \"/usr/local/lib/python3.7/site-packages/autosub/__init__.py\", line 15, in \u003cmodule\u003e\n    from autosub import ffmpeg_utils\n  File \"/usr/local/lib/python3.7/site-packages/autosub/ffmpeg_utils.py\", line 25, in \u003cmodule\u003e\n    fallback=True)\n  File \"/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gettext.py\", line 518, in translation\n    mofiles = find(domain, localedir, languages, all=True)\n  File \"/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gettext.py\", line 490, in find\n    for nelang in _expand_lang(lang):\n  File \"/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gettext.py\", line 212, in _expand_lang\n    loc = locale.normalize(loc)\n  File \"/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/locale.py\", line 401, in normalize\n    code = localename.lower()\nAttributeError: 'NoneType' object has no attribute 'lower'\n```\n\nIt seems the environment variables `LANG` and `LC_ALL` are not set on some macOS versions. 
Please set them manually before running the program. [ewdurbin/evacuate_2stp#1 (comment)](https://github.com/ewdurbin/evacuate_2stp/issues/1#issuecomment-413736644)\n\n[How to set environment variables on macOS](https://medium.com/@himanshuagarwal1395/setting-up-environment-variables-in-macos-sierra-f5978369b255).\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n\n#### Accuracy\n\nApart from the volume issue mentioned in the [input](#input) section above, you need to make sure the audio doesn't contain certain kinds of vocals; otherwise you need to adjust the Auditok options or the speech recognition config.\n\n### Bugs report\n\nBugs and suggestions can be reported at [issues](https://github.com/BingLingGroup/autosub/issues).\n\n### Build\n\nI have only written scripts for building standalone executable files on Windows: the [Nuitka script](scripts/nuitka_build.bat) and the [pyinstaller script](scripts/pyinstaller_build.bat).\n\nVersion 0.5.4a doesn't support the Nuitka build, since 0.5.4a imports the google.cloud package, which contains `pkg_resources.get_distribution`, and that is not supported by Nuitka due to [Nuitka issue #146](https://github.com/Nuitka/Nuitka/issues/146).\n\nVersion 0.5.5a catches the exception `pkg_resources.DistributionNotFound` and removes the Google Cloud service account support when it is built by Nuitka.\n\nThe Nuitka build is pretty tricky. I tried the following environments, and they worked.\n\n1. Anaconda, as recommended by the [Nuitka readme](https://github.com/Nuitka/Nuitka#id6).\n   - Python version 3.5\n   - mingw-w64 package [m2w64-gcc](https://anaconda.org/msys2/m2w64-gcc) (No need to set the environment variables separately if you run it in Anaconda Prompt)\n2. Use a C compiler other than [m2w64-gcc](https://anaconda.org/msys2/m2w64-gcc) by setting the environment variable `CC` to the path of the C compiler, including the executable name. 
For example, you may have installed [MingW-W64-builds](http://mingw-w64.org/doku.php/download/mingw-builds) somewhere on your storage and want Nuitka to use it. In this case, Python 3.5 is still recommended.\n3. Nuitka 0.6.6 is the latest stable version I tried. Later versions such as 0.6.7 don't support Windows icon input.\n\nIf your OS language is not `en_US`, please set it to `en_US` before you start the build. Otherwise you may encounter this [known issue](https://github.com/Nuitka/Nuitka/issues/193).\n\nFor the PyInstaller build, you need to manually hook the gcloud module. [Source](https://stackoverflow.com/questions/40076795/pyinstaller-file-fails-to-execute-script-distributionnotfound).\n\n\u003e . So you need to create a hook file for that names\n\u003e\n\u003e Python_Path\\Lib\\site-packages\\PyInstaller\\hooks\\hook-gcloud.py\n\u003e\n\u003e File contents:\n\n```Python\nfrom PyInstaller.utils.hooks import copy_metadata\ndatas = copy_metadata('gcloud')\n```\n\n[create_release.py](scripts/create_release.py) is used to make the two release packages. You need to create a `binaries` folder containing the ffmpeg and ffmpeg-normalize executable files if you want to create a \"fully\" standalone release like the ones I publish. `ffmpeg.exe` and `ffprobe.exe` are from the [Zeranoe ffmpeg windows build](https://ffmpeg.zeranoe.com/builds/). 
`ffmpeg-normalize.exe` is built in the same way as mentioned [above](#install-on-windows).\n\nThe directory structure should look like this:\n\n```\nbinaries\\ffmpeg.exe\nbinaries\\ffprobe.exe\nbinaries\\ffmpeg-normalize-Nuitka\\ffmpeg-normalize.exe\nbinaries\\ffmpeg-normalize-pyinstaller\\ffmpeg-normalize.exe\n```\n\n\u003cescape\u003e\u003ca href = \"#TOC\"\u003e\u0026nbsp;↑\u0026nbsp;\u003c/a\u003e\u003c/escape\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBingLingGroup%2Fautosub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBingLingGroup%2Fautosub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBingLingGroup%2Fautosub/lists"}