{"id":30604099,"url":"https://github.com/chrisdoc/podkeet","last_synced_at":"2025-10-04T10:48:29.176Z","repository":{"id":311714895,"uuid":"1044384708","full_name":"chrisdoc/podkeet","owner":"chrisdoc","description":"Download a YouTube video's audio as MP3 with yt-dlp and transcribe it using Parakeet-MLX on Apple Silicon.","archived":false,"fork":false,"pushed_at":"2025-08-26T05:37:17.000Z","size":105,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-26T06:26:48.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chrisdoc.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-25T15:48:03.000Z","updated_at":"2025-08-26T05:37:20.000Z","dependencies_parsed_at":"2025-08-26T06:26:57.069Z","dependency_job_id":"267c9ed7-53ee-4961-af3c-5ecd9ba3bf69","html_url":"https://github.com/chrisdoc/podkeet","commit_stats":null,"previous_names":["chrisdoc/podkeet"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/chrisdoc/podkeet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdoc%2Fpodkeet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdoc%2Fpodkeet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdoc%2Fpodkeet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdoc%2Fpodkeet/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chrisdoc","download_url":"https://codeload.github.com/chrisdoc/podkeet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdoc%2Fpodkeet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278302558,"owners_count":25964520,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-30T01:12:01.833Z","updated_at":"2025-10-04T10:48:29.161Z","avatar_url":"https://github.com/chrisdoc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# podkeet\n\nDownload a YouTube video's audio as MP3 with `yt-dlp` and transcribe it using Parakeet-MLX (MLX on Apple Silicon).\n\n## Requirements\n- macOS on Apple Silicon (M1/M2/M3/M4)\n- Python `\u003e= 3.13`\n- `ffmpeg` (for `yt-dlp` post-processing)\n  - Install on macOS: `brew install ffmpeg`\n\nParakeet-MLX is installed as a dependency and will use MLX (Metal) on Apple Silicon when `device=auto`.\n\n## Quick start\nRun directly with uvx (no virtualenv needed):\n\n```fish\nuvx podkeet transcribe \"https://www.youtube.com/watch?v=dQw4w9WgXcQ\" --out-dir ./outputs\n```\n\nNote: The first run will install the `podkeet` CLI automatically. 
If you prefer a persistent install:\n\n```fish\nuv tool install podkeet\n```\n\nOr, if you prefer working in a virtual environment:\n\n```fish\nuv venv --python 3.13\nuv sync --extra dev\nuv run podkeet transcribe \"https://www.youtube.com/watch?v=dQw4w9WgXcQ\" --out-dir ./outputs\n```\n\nThis will:\n- Check for `ffmpeg` and instruct you to install it if missing.\n- Download the best audio stream and convert it to MP3.\n- Transcribe the MP3 with Parakeet-MLX, saving a transcript next to the audio (and in `--out-dir`).\n\n## Installation (PyPI)\nOnce released on PyPI, you can install directly:\n\n```fish\nuv tool install podkeet\n```\n\n## CLI reference\n- `podkeet download URL --out-dir PATH [--no-timing]`\n- `podkeet transcribe URL_OR_FILE --out-dir PATH [--keep-audio] [--language auto|en|…] [--model NAME] [--format txt|srt|vtt|json] [--device auto|mps|cpu] [--no-timing] [--version]`\n\nNotes:\n- If `ffmpeg` is missing, a clear message explains how to install it.\n- The first transcription may download Parakeet-MLX models; subsequent runs use the local cache.\n- On Apple Silicon, `device=auto` prefers MLX (`mps`) and falls back to CPU if needed.\n- Timing: The CLI shows elapsed time for download and transcription; hide with `--no-timing`.\n- JSON: When `--format json` is used, the CLI prints a compact JSON summary to stdout (suitable for automation).\n\n## Robustness\n- Filenames with special characters: We detect the actual file written by `yt-dlp` instead of guessing by title, avoiding path mismatches.\n- Large files / memory: If a full-file transcription hits a Metal/MLX memory error, the tool automatically falls back to chunked transcription (~10-minute segments) and merges results with correct timestamps.\n- Network hiccups: The downloader uses retries, socket timeouts, and exponential backoff to handle transient network failures.\n\n## Examples\n```fish\n# Download only\npodkeet download \"https://www.youtube.com/watch?v=8P7v1lgl-1s\" --out-dir 
./podcasts\n\n# Transcribe from URL with a specific start (yt-dlp handles t=)\npodkeet transcribe \"https://www.youtube.com/watch?v=8P7v1lgl-1s\u0026t=121s\" --out-dir ./podcasts\n\n# Transcribe a local file to SRT\npodkeet transcribe ./podcasts/example.mp3 --out-dir ./podcasts --format srt\n\n# JSON summary output (includes timings):\npodkeet transcribe \"https://www.youtube.com/watch?v=dQw4w9WgXcQ\" --format json | jq\n```\n\n## Development\nInstall dev extras and set up the environment:\n\n```fish\nuv venv --python 3.13\nuv sync --extra dev\n```\n\nFormat with Ruff:\n```fish\nuvx ruff format\n```\n\nLint with Ruff:\n```fish\nuvx ruff check\nuvx ruff check --fix\n```\n\nType-check with Ty (pre-release):\n```fish\nuvx ty check\n```\n\nRun tests:\n```fish\nuv run pytest -q\n```\n\nBuild package (sdist + wheel):\n```fish\nuvx --from build pyproject-build\nls dist/\n```\n\n### CI/CD\n- CI (lint, type, tests, build) runs on pushes and PRs.\n- Releases are automated:\n  - Conventional Commits drive version bumps and `CHANGELOG.md` via Python Semantic Release.\n  - A tag `vX.Y.Z` is created on `main`.\n  - The Release workflow builds and publishes to PyPI using OIDC (Trusted Publishing).\n\nCommit message hints (Conventional Commits):\n- `feat: …` → minor version bump\n- `fix: …` → patch version bump\n- `feat!: …` or footer `BREAKING CHANGE:` → major version bump\n\n## Troubleshooting\n- `ffmpeg` not found: `brew install ffmpeg` (then re-run).\n- MLX out-of-memory: The tool will switch to chunked transcription automatically; if still failing, try a smaller model.\n- Network or YouTube rate limiting: The downloader retries with backoff; re-run later if persistent.\n\n## License\nMIT\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisdoc%2Fpodkeet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrisdoc%2Fpodkeet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisdoc%2Fpodkeet/lists"}