{"id":20942346,"url":"https://github.com/unfoldingword-dev/submachine","last_synced_at":"2025-06-20T10:11:28.812Z","repository":{"id":246885708,"uuid":"823193981","full_name":"unfoldingWord-dev/submachine","owner":"unfoldingWord-dev","description":"Transcribes, translates and integrates subtitles into any video.","archived":false,"fork":false,"pushed_at":"2025-02-17T18:57:00.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-13T03:43:48.678Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unfoldingWord-dev.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-02T15:00:59.000Z","updated_at":"2025-02-17T18:57:03.000Z","dependencies_parsed_at":"2025-01-19T21:15:56.907Z","dependency_job_id":null,"html_url":"https://github.com/unfoldingWord-dev/submachine","commit_stats":null,"previous_names":["unfoldingword-dev/submachine"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/unfoldingWord-dev/submachine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord-dev%2Fsubmachine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord-dev%2Fsubmachine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord-dev%2Fsubmachine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord-dev%2Fsubmachine/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unfoldingWord-dev","download_url":"https://codeload.github.com/unfoldingWord-dev/submachine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord-dev%2Fsubmachine/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260924535,"owners_count":23083524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-18T23:26:13.655Z","updated_at":"2025-06-20T10:11:23.799Z","avatar_url":"https://github.com/unfoldingWord-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Introduction\nThis code is an attempt to create a fully automatic subtitling system for videos.\n\nThe basic code is taken from [Digital Ocean](https://www.digitalocean.com/community/tutorials/how-to-generate-and-add-subtitles-to-videos-using-python-openai-whisper-and-ffmpeg)\nand molded into a class with my own nips and tucks. The translation was not part of the original tutorial. \n\nAudio transcription is done **locally** through the [OpenAI Whisper](https://openai.com/research/whisper) model.\nText translation is done **locally** through [ArgosTranslate](https://pypi.org/project/argostranslate/).\n\n## Usage\nI have only tested this with mp4 videos, so YMMV. Here are the steps. 
\n\n1) Clone this repo\n```bash\ngit clone https://github.com/unfoldingWord-dev/submachine.git\ncd submachine\n```\n2) Create a `.env` file based on `example.env`\n3) Create an output directory as defined in `.env`\n4) Set up a virtual environment (or make a mess of your local Python setup)\n```bash\npython3 -m venv venv\nsource venv/bin/activate\n```\n5) Install all the requirements\n```bash\npip install -r requirements.txt\n```\nThis will take a while, as the Whisper model is quite big.\n\n6) Run the code \n```bash\npython3 main.py\n```\nFor every language that you want a subtitle for, an additional package will be downloaded during runtime. This can take some time.\n\n## Known issues\nYou might encounter an error like `Could not load library libcudnn_ops_infer.so.8`. In that case, you need to install the cuDNN library. On Ubuntu, you need to run `sudo apt install libcudnn8` or `sudo apt install nvidia-cudnn`. (I have no idea about any other platform).\n*(Time to start wrapping the whole thing in a Docker container)*\n\n## On languages detected by Whisper\nThe following languages can be detected by Whisper \n(_as deduced from Whisper output_):\n\nAfrikaans (af), Albanian (sq), Amharic (am), Arabic (ar), Armenian (hy), \nAssamese (as), Azerbaijani (az), Bashkir (ba), Basque (eu), \nBelarusian (be), Bengali (bn), Bosnian (bs), Breton (br), Bulgarian (bg), \nBurmese (my), Catalan (ca), Chinese (zh), Croatian (hr), Czech (cs), \nDanish (da), Dutch (nl), English (en), Estonian (et), Faroese (fo), \nFilipino (tl), Finnish (fi), French (fr), Galician (gl), Georgian (ka), \nGerman (de), Greek (el), Gujarati (gu), Haitian Creole (ht), Hausa (ha), \nHawaiian (haw), Hebrew (he), Hindi (hi), Hungarian (hu), Icelandic (is), \nIndonesian (id), Italian (it), Japanese (ja), Javanese (jw), Kannada (kn), \nKazakh (kk), Khmer (km), Korean (ko), Lao (lo), Latin (la), Latvian (lv), \nLingala (ln), Lithuanian (lt), Luxembourgish (lb), Macedonian (mk), \nMalagasy (mg), Malayalam (ml), 
Malay (ms), Maltese (mt), Maori (mi), \nMarathi (mr), Mongolian (mn), Nepali (ne), Norwegian (no), \nNorwegian Nynorsk (nn), Occitan (oc), Pashto (ps), Persian (fa), \nPolish (pl), Portuguese (pt), Punjabi (pa), Romanian (ro), Russian (ru), \nSanskrit (sa), Serbian (sr), Shona (sn), Sindhi (sd), Sinhala (si), \nSlovak (sk), Slovenian (sl), Somali (so), Spanish (es), Sundanese (su), \nSwahili (sw), Swedish (sv), Tajik (tg), Tamil (ta), Tatar (tt), \nTelugu (te), Thai (th), Tibetan (bo), Turkish (tr), Turkmen (tk), \nUkrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi), Welsh (cy), \nYiddish (yi), Yoruba (yo)\n\n## On language packs in Argos\n\u003e Argos Translate [...] manages automatically pivoting through \n\u003e intermediate languages to translate between languages that don't have \n\u003e a direct translation between them installed. For example, if you have \n\u003e a **es → en** and **en → fr** translation installed you are able to translate \n\u003e from **es → fr** as if you had that translation installed. 
This allows \n\u003e for translating between a wide variety of languages at the cost of some \n\u003e loss of translation quality.\n\n*https://pypi.org/project/argostranslate/*\n\nThe following language packs (and thus direct translations) are currently available:\n\n- Albanian -\u003e English\n- Arabic -\u003e English\n- Azerbaijani -\u003e English\n- Bengali -\u003e English\n- Bulgarian -\u003e English\n- Catalan -\u003e English\n- Chinese (traditional) -\u003e English\n- Chinese -\u003e English\n- Czech -\u003e English\n- Danish -\u003e English\n- Dutch -\u003e English\n- English -\u003e Albanian\n- English -\u003e Arabic\n- English -\u003e Azerbaijani\n- English -\u003e Bengali\n- English -\u003e Bulgarian\n- English -\u003e Catalan\n- English -\u003e Chinese\n- English -\u003e Chinese (traditional)\n- English -\u003e Czech\n- English -\u003e Danish\n- English -\u003e Dutch\n- English -\u003e Esperanto\n- English -\u003e Estonian\n- English -\u003e Finnish\n- English -\u003e French\n- English -\u003e German\n- English -\u003e Greek\n- English -\u003e Hebrew\n- English -\u003e Hindi\n- English -\u003e Hungarian\n- English -\u003e Indonesian\n- English -\u003e Irish\n- English -\u003e Italian\n- English -\u003e Japanese\n- English -\u003e Korean\n- English -\u003e Latvian\n- English -\u003e Lithuanian\n- English -\u003e Malay\n- English -\u003e Norwegian\n- English -\u003e Persian\n- English -\u003e Polish\n- English -\u003e Portuguese\n- English -\u003e Romanian\n- English -\u003e Russian\n- English -\u003e Slovak\n- English -\u003e Slovenian\n- English -\u003e Spanish\n- English -\u003e Swedish\n- English -\u003e Tagalog\n- English -\u003e Thai\n- English -\u003e Turkish\n- English -\u003e Ukrainian\n- English -\u003e Urdu\n- Esperanto -\u003e English\n- Estonian -\u003e English\n- Finnish -\u003e English\n- French -\u003e English\n- German -\u003e English\n- Greek -\u003e English\n- Hebrew -\u003e English\n- Hindi -\u003e English\n- Hungarian -\u003e English\n- 
Indonesian -\u003e English\n- Irish -\u003e English\n- Italian -\u003e English\n- Japanese -\u003e English\n- Korean -\u003e English\n- Latvian -\u003e English\n- Lithuanian -\u003e English\n- Malay -\u003e English\n- Norwegian -\u003e English\n- Persian -\u003e English\n- Polish -\u003e English\n- Portuguese -\u003e English\n- Portuguese -\u003e Spanish\n- Romanian -\u003e English\n- Russian -\u003e English\n- Slovak -\u003e English\n- Slovenian -\u003e English\n- Spanish -\u003e English\n- Spanish -\u003e Portuguese\n- Swedish -\u003e English\n- Tagalog -\u003e English\n- Thai -\u003e English\n- Turkish -\u003e English\n- Ukrainian -\u003e English\n- Urdu -\u003e English\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funfoldingword-dev%2Fsubmachine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funfoldingword-dev%2Fsubmachine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funfoldingword-dev%2Fsubmachine/lists"}