{"id":23265067,"url":"https://github.com/absadiki/subsai","last_synced_at":"2025-05-14T14:07:53.644Z","repository":{"id":121433024,"uuid":"607457910","full_name":"absadiki/subsai","owner":"absadiki","description":"🎞️ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants 🎞️","archived":false,"fork":false,"pushed_at":"2025-04-21T21:39:55.000Z","size":13684,"stargazers_count":1448,"open_issues_count":63,"forks_count":121,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-04-21T22:32:46.606Z","etag":null,"topics":["cli","subtitles","subtitles-generator","webui","whisper","whisper-ai"],"latest_commit_sha":null,"homepage":"https://absadiki.github.io/subsai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/absadiki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-02-28T02:16:39.000Z","updated_at":"2025-04-21T21:39:33.000Z","dependencies_parsed_at":"2023-09-22T04:40:22.614Z","dependency_job_id":"cfdf0ac5-25c4-4b21-8298-bc14d5b013e0","html_url":"https://github.com/absadiki/subsai","commit_stats":{"total_commits":145,"total_committers":14,"mean_commits":"10.357142857142858","dds":0.1586206896551724,"last_synced_commit":"a62c23e79011292c42ea3c13be89fb5d3375a5e2"},"previous_names":["absadiki/subsai","abdeladim-s/subsai"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/absadiki%2Fsubsai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/absadiki%2Fsubsai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/absadiki%2Fsubsai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/absadiki%2Fsubsai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/absadiki","download_url":"https://codeload.github.com/absadiki/subsai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254159194,"owners_count":22024558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","subtitles","subtitles-generator","webui","whisper","whisper-ai"],"created_at":"2024-12-19T15:03:44.019Z","updated_at":"2025-05-14T14:07:53.625Z","avatar_url":"https://github.com/absadiki.png","language":"Python","funding_links":[],"categories":["CLIs","Python"],"sub_categories":[],"readme":"# ️🎞️ Subs AI 🎞️\n Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants \n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/demo/demo.gif\"\u003e\n\u003c/p\u003e\n\n\u003c!-- TOC --\u003e\n* [Subs AI](#subs-ai)\n* [Features](#features)\n* [Installation](#installation)\n* [Usage](#usage)\n    * [Web-UI](#web-ui)\n    * [CLI](#cli)\n    * [From Python](#from-python)\n    * [Examples](#examples)\n* [Docker](#docker)\n* [Notes](#notes)\n* [Contributing](#contributing)\n* [License](#license)\n\u003c!-- TOC --\u003e\n\n# Features\n* Supported Models\n  * [x] [openai/whisper](https://github.com/openai/whisper)\n    * \u003e Whisper is a general-purpose 
speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.\n  * [x] [linto-ai/whisper-timestamped](https://github.com/linto-ai/whisper-timestamped)\n    * \u003e Multilingual Automatic Speech Recognition with word-level timestamps and confidence\n  * [x] [ggerganov/whisper.cpp](https://github.com/ggerganov/whisper.cpp) (using [ absadiki/pywhispercpp](https://github.com/absadiki/pywhispercpp))\n    * \u003e High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model\n      \u003e * Plain C/C++ implementation without dependencies\n      \u003e * Runs on the CPU\n  * [x] [guillaumekln/faster-whisper](https://github.com/guillaumekln/faster-whisper)\n    * \u003e faster-whisper is a reimplementation of OpenAI's Whisper model using [CTranslate2](https://github.com/OpenNMT/CTranslate2/), which is a fast inference engine for Transformer models.\n      \u003e\n      \u003e This implementation is up to 4 times faster than [openai/whisper](https://github.com/openai/whisper) for the same accuracy while using less memory. 
The efficiency can be further improved with 8-bit quantization on both CPU and GPU.\n  * [x] [m-bain/whisperX](https://github.com/m-bain/whisperX)\n    * \u003efast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.\n      \u003e  - ⚡️ Batched inference for 70x realtime transcription using whisper large-v2\n      \u003e  - 🪶 [faster-whisper](https://github.com/guillaumekln/faster-whisper) backend, requires \u003c8GB gpu memory for large-v2 with beam_size=5\n      \u003e  - 🎯 Accurate word-level timestamps using wav2vec2 alignment\n      \u003e  - 👯‍♂️ Multispeaker ASR using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio) (speaker ID labels) \n      \u003e  - 🗣️ VAD preprocessing, reduces hallucination \u0026 batching with no WER degradation.\n  * [x] [jianfch/stable-ts](https://github.com/jianfch/stable-ts)\n    * \u003e**Stabilizing Timestamps for Whisper**: This library modifies [Whisper](https://github.com/openai/whisper) to produce more reliable timestamps and extends its functionality.\n  * [x] [Hugging Face Transformers](https://huggingface.co/tasks/automatic-speech-recognition)\n    * \u003e Hugging Face implementation of Whisper.  
Any speech recognition pretrained model from the Hugging Face hub can be used as well.\n  * [x] [API/openai/whisper](https://platform.openai.com/docs/guides/speech-to-text)\n    * \u003e OpenAI Whisper via their API\n\n* Web UI\n  * Fully offline, no third party services \n  * Works on Linux, Mac and Windows\n  * Lightweight and easy to use\n  * Supports subtitle modification\n  * Integrated tools:\n    * Translation using [xhluca/dl-translate](https://github.com/xhluca/dl-translate):\n      * Supported models:\n        * [x] [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) \n        * [x] [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M)\n        * [x] [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B)\n        * [x] [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt)\n    * Auto-sync using [smacke/ffsubsync](https://github.com/smacke/ffsubsync)\n    * Merge subtitles into the video\n* Command Line Interface\n  * For simple or batch processing\n* Python package\n  * In case you want to develop your own scripts\n* Supports different subtitle formats thanks to [tkarabela/pysubs2](https://github.com/tkarabela/pysubs2/)\n  * [x] SubRip\n  * [x] WebVTT\n  * [x] substation alpha\n  * [x] MicroDVD\n  * [x] MPL2\n  * [x] TMP\n* Supports audio and video files\n\n# Installation \n* Install [ffmpeg](https://ffmpeg.org/)\n\n_Quoted from the official openai/whisper installation_\n\u003e It requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:\n\u003e ```bash\n\u003e # on Ubuntu or Debian\n\u003e sudo apt update \u0026\u0026 sudo apt install ffmpeg\n\u003e\n\u003e # on Arch Linux\n\u003esudo pacman -S ffmpeg\n\u003e\n\u003e # on MacOS using Homebrew (https://brew.sh/)\n\u003e brew install ffmpeg\n\u003e\n\u003e # on Windows using Chocolatey 
(https://chocolatey.org/)\n\u003e choco install ffmpeg\n\u003e\n\u003e # on Windows using Scoop (https://scoop.sh/)\n\u003e scoop install ffmpeg\n\u003e```\n\u003eYou may need [`rust`](http://rust-lang.org) installed as well, in case [tokenizers](https://pypi.org/project/tokenizers/) does not provide a pre-built wheel for your platform. If you see installation errors during the `pip install` command above, please follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install Rust development environment. Additionally, you may need to configure the `PATH` environment variable, e.g. `export PATH=\"$HOME/.cargo/bin:$PATH\"`. If the installation fails with `No module named 'setuptools_rust'`, you need to install `setuptools_rust`, e.g. by running:\n\u003e```bash\n\u003epip install setuptools-rust\n\u003e``` \n\n* Once ffmpeg is installed, install `subsai`\n\n```shell\npip install git+https://github.com/absadiki/subsai\n```\n\u003e [!NOTE]\n\u003e * It is recommended to use Python 3.10 or 3.11. Versions 3.12 or later may have compatibility issues.\n\u003e * If torch is unable to detect your GPU devices during your usage of subsai, assuming you have a supported GPU device, there is a chance that `pip` installed the CPU version of torch. 
You can install a torch version with CUDA support by following the [get started locally guide](https://pytorch.org/get-started/locally/) on pytorch.\n\u003e For more information, see https://github.com/absadiki/subsai/issues/162.\n\n# Usage\n### Web-UI\n\nTo use the web-UI, run the following command on the terminal\n```shell\nsubsai-webui\n```\nAnd a web page will open on your default browser, otherwise navigate to the links provided by the command\n\nYou can also run the Web-UI using [Docker](#docker).\n\n### CLI\n\n```shell\nusage: subsai [-h] [--version] [-m MODEL] [-mc MODEL_CONFIGS] [-f FORMAT] [-df DESTINATION_FOLDER] [-tm TRANSLATION_MODEL]\n              [-tc TRANSLATION_CONFIGS] [-tsl TRANSLATION_SOURCE_LANG] [-ttl TRANSLATION_TARGET_LANG]\n              media_file [media_file ...]\n\npositional arguments:\n  media_file            The path of the media file, a list of files, or a text file containing paths for batch processing.\n\noptions:\n  -h, --help            show this help message and exit\n  --version             show program's version number and exit\n  -m MODEL, --model MODEL\n                        The transcription AI models. 
Available models: ['openai/whisper', 'linto-ai/whisper-timestamped']\n  -mc MODEL_CONFIGS, --model-configs MODEL_CONFIGS\n                        JSON configuration (path to a json file or a direct string)\n  -f FORMAT, --format FORMAT, --subtitles-format FORMAT\n                        Output subtitles format, available formats ['.srt', '.ass', '.ssa', '.sub', '.json', '.txt', '.vtt']\n  -df DESTINATION_FOLDER, --destination-folder DESTINATION_FOLDER\n                        The directory where the subtitles will be stored, default to the same folder where the media file(s) is stored.\n  -tm TRANSLATION_MODEL, --translation-model TRANSLATION_MODEL\n                        Translate subtitles using AI models, available models: ['facebook/m2m100_418M', 'facebook/m2m100_1.2B',\n                        'facebook/mbart-large-50-many-to-many-mmt']\n  -tc TRANSLATION_CONFIGS, --translation-configs TRANSLATION_CONFIGS\n                        JSON configuration (path to a json file or a direct string)\n  -tsl TRANSLATION_SOURCE_LANG, --translation-source-lang TRANSLATION_SOURCE_LANG\n                        Source language of the subtitles\n  -ttl TRANSLATION_TARGET_LANG, --translation-target-lang TRANSLATION_TARGET_LANG\n                        Target language of the subtitles\n\n\n```\n\nExample of a simple usage\n```shell\nsubsai ./assets/test1.mp4 --model openai/whisper --model-configs '{\"model_type\": \"small\"}' --format srt\n```\n\u003e Note: **For Windows CMD**, You will need to use the following :\n\u003e `subsai ./assets/test1.mp4 --model openai/whisper --model-configs \"{\\\"model_type\\\": \\\"small\\\"}\" --format srt`\n\nYou can also provide a simple text file for batch processing \n_(Every line should contain the absolute path to a single media file)_\n\n```shell\nsubsai media.txt --model openai/whisper --format srt\n```\n\n### From Python\n\n```python\nfrom subsai import SubsAI\n\nfile = './assets/test1.mp4'\nsubs_ai = SubsAI()\nmodel = 
subs_ai.create_model('openai/whisper', {'model_type': 'base'})\nsubs = subs_ai.transcribe(file, model)\nsubs.save('test1.srt')\n```\nFor more advanced usage, read [the documentation](https://absadiki.github.io/subsai/).\n\n### Examples \nSimple examples can be found in the [examples](https://github.com/absadiki/subsai/tree/main/examples) folder\n\n* [VAD example](https://github.com/absadiki/subsai/blob/main/examples/subsai_vad.ipynb): process long audio files using [silero-vad](https://github.com/snakers4/silero-vad). \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/absadiki/subsai/blob/main/examples/subsai_vad.ipynb\"\u003e\n  \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\n\u003c/a\u003e\n\n* [Translation example](https://github.com/absadiki/subsai/blob/main/examples/subsai_translation.ipynb): translate an already existing subtitles file. \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/absadiki/subsai/blob/main/examples/subsai_translation.ipynb\"\u003e\n  \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\n\u003c/a\u003e\n\n# Docker\n* Make sure that you have `docker` installed.\n* Prebuilt image\n  1. ```docker pull absadiki/subsai:main```\n  2. ```docker run --gpus=all -p 8501:8501 -v /path/to/your/media_files/folder:/media_files absadiki/subsai:main```\n* Build the image locally \n  1. Clone and `cd` to the repository\n  2. ```docker compose build```\n  3. ```docker compose run -p 8501:8501 -v /path/to/your/media_files/folder:/media_files subsai-webui # subsai-webui-cpu for cpu only```\n\n* You can access your media files through the mounted `media_files` folder.\n\n# Notes\n* If you have an NVIDIA graphics card, you may need to install [cuda](https://docs.nvidia.com/cuda/#installation-guides) to use the GPU capabilities.\n* AMD GPUs compatible with Pytorch should be working as well. 
[#67](https://github.com/absadiki/subsai/issues/67) \n* Transcription time is shown in the terminal; keep an eye on it while running the web UI.\n* If you don't like the dark-mode web UI, you can switch to Light mode from `settings \u003e Theme \u003e Light`.\n\n# Contributing\nIf you find a bug or have a suggestion or feedback, please open an issue for discussion.\n\n# License\n\nThis project is licensed under the GNU General Public License version 3 or later. You can modify or redistribute it under the conditions\nof this license (see [LICENSE](./LICENSE) for more information).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabsadiki%2Fsubsai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabsadiki%2Fsubsai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabsadiki%2Fsubsai/lists"}