{"id":35093818,"url":"https://github.com/mhered/transcribe","last_synced_at":"2026-05-18T21:08:34.083Z","repository":{"id":61668322,"uuid":"553846175","full_name":"mhered/transcribe","owner":"mhered","description":null,"archived":false,"fork":false,"pushed_at":"2022-10-28T15:51:32.000Z","size":2208,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-03-10T15:25:19.138Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhered.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-18T21:38:01.000Z","updated_at":"2022-10-18T21:40:08.000Z","dependencies_parsed_at":"2023-01-20T14:46:54.787Z","dependency_job_id":null,"html_url":"https://github.com/mhered/transcribe","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/mhered/transcribe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhered%2Ftranscribe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhered%2Ftranscribe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhered%2Ftranscribe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhered%2Ftranscribe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mhered","download_url":"https://codeload.github.com/mhered/transcribe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhered%2Ftranscribe/sbom","scorecard":null,"host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28080199,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-27T02:00:05.897Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-27T15:03:10.346Z","updated_at":"2025-12-27T15:04:30.026Z","avatar_url":"https://github.com/mhered.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `transcribe`: create subtitles for the videos of a YouTube channel\n\n## Credits\n\nInspired by this [gist](https://gist.github.com/midudev/2bc13e6ef38ccc4716fba8b7258f1403) by Miguel Angel Durán and by this [video tutorial](https://www.youtube.com/watch?v=F30yC2jl5nA) on Miguel Piedrafita's `yt-whisper` tool.\n\nDeveloped to add subtitles to the videos on the Python España YouTube channel, at the suggestion of @astrojuanlu during #Hacktoberfest22.\n\n## Requirements\n\n* Python 3.9\n\n`pyenv` is used to keep several Python versions installed side by side ([How to install and run multiple pythons with virtualenv and vscode](https://k0nze.dev/posts/install-pyenv-venv-vscode/)):\n\n```bash\n$ # prerequisites\n$ sudo apt install -y make build-essential libssl-dev zlib1g-dev \\\nlibbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \\\nlibncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl \\\ngit\n$ # 
pyenv\n$ git clone https://github.com/pyenv/pyenv.git ~/.pyenv\n$ # environment\n$ echo 'export PYENV_ROOT=\"$HOME/.pyenv\"' \u003e\u003e ~/.bashrc\n$ echo 'export PATH=\"$PYENV_ROOT/bin:$PATH\"' \u003e\u003e ~/.bashrc\n$ echo 'eval \"$(pyenv init --path)\"' \u003e\u003e ~/.bashrc\n$ # activate\n$ source ~/.bashrc\n```\n\nInstall Python 3.9 with pyenv:\n\n```bash\n$ pyenv install 3.9.15\n```\n\nClone the repo:\n\n```bash\n$ git clone https://github.com/mhered/transcribe.git\n```\n\nSet Python 3.9 as the local version for the repo with pyenv, which will also manage the packages you later install with pip:\n\n```bash\n$ cd transcribe\n$ pyenv local 3.9.15\n$ pyenv version\n3.9.15 (set by ~/transcribe/.python-version)\n$ # note: pyenv \"channels\" pip, so you'll need to install packages for each Python version\n$ which pip\n/home/mhered/.pyenv/shims/pip\n```\n\nCreate and activate the virtual environment in the directory:\n\n```bash\n$ python -m venv .venv\n$ source .venv/bin/activate\n(.venv)$\n```\n\n* Install [PyTube](https://pytube.io/en/latest/):\n\n```bash\n$ pip install pytube\n```\n\n* Install [Whisper](https://openai.com/blog/whisper/):\n\n```bash\n$ pip install git+https://github.com/openai/whisper.git\n```\n\n* Install `ffmpeg`:\n\n```bash\n$ sudo apt update \u0026\u0026 sudo apt install ffmpeg\n```\n\n* Install `yt-whisper`:\n\n```bash\n$ pip install git+https://github.com/m1guelpf/yt-whisper.git\n```\n\n## `scan_channel.py`: scan a YouTube channel\n\nCommand-line tool that takes the URL of a YouTube channel, scans it, and generates a JSON file with details of every video it contains.\n\nUsage:\n\n```sh\n$ python3 scan_channel.py -h\nusage: scan_channel.py [-h] --channel CHANNEL [--file FILE]\n\nScan a YouTube channel and create a JSON file with contents\n\noptional arguments:\n  -h, --help         show this help message and exit\n  --file FILE        Name of the input file (Optional, default is\n                     'channel.json')\n\nrequired 
named arguments:\n  --channel CHANNEL  (Required) URL of the YouTube channel to scan\n\n$ python3 scan_channel.py --channel https://www.youtube.com/channel/UCPnRCRhb-6gaPZuQWS7RVag --file my_channel.json\n\n2022-10-25 21:18:41,325 [INFO] Scanning YouTube channel...\n2022-10-25 21:18:42,574 [INFO] 6 videos found in the channel.\n2022-10-25 21:18:42,575 [INFO] Building data structure...\n2022-10-25 21:18:44,062 [INFO] 16.67% ...\n2022-10-25 21:18:45,523 [INFO] 33.33% ...\n2022-10-25 21:18:46,813 [INFO] 50.00% ...\n2022-10-25 21:18:47,964 [INFO] 66.67% ...\n2022-10-25 21:18:49,363 [INFO] 83.33% ...\n2022-10-25 21:18:50,707 [INFO] 100.00% ...\n2022-10-25 21:18:50,709 [INFO] Writing JSON...\n```\n\nThe tool scans the channel and writes a JSON file with details of each video. Each entry is keyed by the unique video ID taken from the video's URL.\n\n```json\n{\n    \"z2Vivp0AbRg\": {\n        \"url\": \"https://www.youtube.com/watch?v=z2Vivp0AbRg\",\n        \"title\": \"Mini Pupper adventures - Part 6 - Legs Assembly\",\n        \"published\": \"01/05/22\",\n        \"views\": 239,\n        \"transcript\": \"\"\n    },\n    \"PA9QOXW9rWs\": {\n        \"url\": \"https://www.youtube.com/watch?v=PA9QOXW9rWs\",\n        \"title\": \"Mini Pupper adventures - Part 5 - Software\",\n        \"published\": \"14/04/22\",\n        \"views\": 242,\n        \"transcript\": \"\"\n    },\n    \"bmLP8sHBs2o\": {\n        \"url\": \"https://www.youtube.com/watch?v=bmLP8sHBs2o\",\n        \"title\": \"Mini Pupper adventures - Part 4 - Electronics\",\n        \"published\": \"24/03/22\",\n        \"views\": 241,\n        \"transcript\": \"\"\n    },\n    \"shPb4SnDpC4\": {\n        \"url\": \"https://www.youtube.com/watch?v=shPb4SnDpC4\",\n        \"title\": \"Mini Pupper adventures - Part 1 -  Unboxing\",\n        \"published\": \"03/03/22\",\n        \"views\": 315,\n        \"transcript\": \"\"\n    },\n    \"kHCIWT2SSXw\": {\n        \"url\": \"https://www.youtube.com/watch?v=kHCIWT2SSXw\",\n        \"title\": \"Mini Pupper adventures - Part 2 - Hip Assembly\",\n        \"published\": \"03/03/22\",\n        \"views\": 266,\n        \"transcript\": \"\"\n    },\n    \"e0-bLMICy54\": {\n        \"url\": \"https://www.youtube.com/watch?v=e0-bLMICy54\",\n        \"title\": \"Mini Pupper adventures - Part 3 - Body Assembly\",\n        \"published\": \"03/03/22\",\n        \"views\": 325,\n        \"transcript\": \"\"\n    }\n}\n```\n\n## `process_channel.py`: generate subtitles\n\nCommand-line tool that reads a JSON file in the format produced by `scan_channel.py` and generates subtitles for each video.\n\nGenerating subtitles is quite slow (roughly 3x the duration of the video on a modest laptop), so the tool is designed to resume the job if it is interrupted: partial progress on the video being processed is lost, but subtitles already generated are kept, and the tool picks up where it left off when relaunched.\n\nThe tool reads the JSON and looks for the highest-priority video whose `transcript` field is empty (that is, one for which subtitles have not yet been generated). When it finishes, it writes the subtitles in SRT format to the `./captions/[video_id]/` directory, updates the `transcript` field in the JSON and saves it, then moves on to the next video in priority order. Priority is set by the `--priority` parameter, which can be `popular` (most views first, the default) or `recent` (by publication date).
\n\nCommand-line usage:\n\n```bash\n$ python3 process_channel.py -h\nusage: process_channel.py [-h] [--file FILE] [--priority {popular,recent}]\n\nResumes processing videos from a JSON file\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --file FILE           Name of the input file (Optional, default is\n                        'channel.json')\n  --priority {popular,recent}\n                        Criteria to prioritize queue of videos pending\n                        processing (Optional, default is 'popular')\n\n$ python3 process_channel.py --file my_channel.json --priority popular\n```\n\n## Known issues\n\n* To interrupt execution you have to press CTRL+C repeatedly (#2)\n\n## To do\n\n* Explore the YouTube API to upload subtitles automatically: https://github.com/youtube/api-samples/blob/master/python/captions.py and https://developers.google.com/youtube/v3/docs\n* Push to the repo automatically each time a set of subtitles is generated, i.e.:\n\n```bash\n$ git add .\n$ git commit -m \"add 1 caption\"\n$ git push\n```\n\nThis helps reduce conflicts when several machines work concurrently.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhered%2Ftranscribe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhered%2Ftranscribe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhered%2Ftranscribe/lists"}