{"id":33295191,"url":"https://github.com/biyachuev/yt-transcriber","last_synced_at":"2026-05-16T09:06:13.646Z","repository":{"id":319365055,"uuid":"1068363458","full_name":"biyachuev/yt-transcriber","owner":"biyachuev","description":"AI-powered audio/video processing: transcription, speaker diarization, LLM refinement, translation | AI-обработка аудио/видео: транскрибация, распознавание спикеров, перевод","archived":false,"fork":false,"pushed_at":"2025-11-17T18:30:27.000Z","size":643,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-17T20:27:15.898Z","etag":null,"topics":["ai-pipeline","audio-intelligence","audio-processing","batch-processing","document-generation","llm","media-processing","nllb","nlp","ollama","openai-api","speaker-diarization","speech-to-text","translation","video-transcription","voice-activity-detection","whisper","youtube-downloader"],"latest_commit_sha":null,"homepage":"https://github.com/biyachuev/yt-transcriber#readme","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/biyachuev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-02T09:15:37.000Z","updated_at":"2025-11-17T18:30:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/biyachuev/yt-transcriber","commit_stats":null,"previous_names":["biyachuev/yt-transcriber"],"tags_count":0,"template":false
,"template_full_name":null,"purl":"pkg:github/biyachuev/yt-transcriber","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biyachuev%2Fyt-transcriber","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biyachuev%2Fyt-transcriber/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biyachuev%2Fyt-transcriber/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biyachuev%2Fyt-transcriber/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/biyachuev","download_url":"https://codeload.github.com/biyachuev/yt-transcriber/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/biyachuev%2Fyt-transcriber/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284988465,"owners_count":27095952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-18T02:00:05.759Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-pipeline","audio-intelligence","audio-processing","batch-processing","document-generation","llm","media-processing","nllb","nlp","ollama","openai-api","speaker-diarization","speech-to-text","translation","video-transcription","voice-activity-detection","whisper","youtube-downloader"],"created_at":"2025-11-18T02:01:33.371Z","updated_at":"2026-05-16T09:06:13.632
Z","avatar_url":"https://github.com/biyachuev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YouTube Transcriber \u0026 Translator\n\nA flexible toolkit for transcribing and translating YouTube videos, audio files, and existing documents.\n\n## 🎯 Highlights\n\n### Version 1.7 (current)\n- ✅ **GigaAM v3 backend (RU)**\n  - Optional models `gigaam-e2e-rnnt` / `gigaam-e2e-ctc` via Hugging Face (`transformers`, torch ≥ 2.6)\n  - Supports caching, VAD-driven chunking, and long-form processing\n  - New unit tests for loading and chunking (`tests/test_gigaam_backend.py`)\n\n### Version 1.6\n- ✅ **Early API Key Validation**\n  - Validates OpenAI API keys at startup with test API call\n  - Fails fast before expensive operations (downloads, processing)\n  - Clear error messages for invalid or missing keys\n  - Saves time and bandwidth by catching errors early\n- ✅ **VAD Performance Optimizations**\n  - Lightweight Silero VAD for speech boundary detection (1.8MB model)\n  - O(n²) → O(n) complexity reduction in boundary search\n  - Gap quality filtering (300ms minimum) to prevent mid-syllable cuts\n  - Accurate bitrate detection for optimal chunk sizing\n  - No HuggingFace token required for basic chunking\n  - 10-100x faster VAD processing for large files\n- ✅ **Automatic yt-dlp Updates**\n  - Automatic version checking before YouTube downloads\n  - Auto-updates yt-dlp to prevent HTTP 403 errors\n  - Keeps up with YouTube API changes\n\n### Version 1.5\n- ✅ **Speaker Diarization**\n  - Automatic speaker identification using pyannote.audio\n  - Speaker labels in transcripts ([SPEAKER_00], [SPEAKER_01], etc.)\n  - Works with both local Whisper and OpenAI API\n  - Optimal speaker detection using VAD integration\n  - Enable with `--speakers` flag\n  - ⚠️ Note: May over-segment speakers (one person → multiple labels); manual review recommended for critical use\n- ✅ **Enhanced logging with colored output**\n  - Color-coded log levels for 
better visibility\n  - WARNING messages in orange for important notices\n  - INFO messages in green for successful operations\n  - ERROR/CRITICAL messages in red for failures\n  - Smart warnings (e.g., missing Whisper prompt suggestions)\n\n### Version 1.4\n- ✅ **Video file support**\n  - Process local video files (MP4, MKV, AVI, MOV, etc.)\n  - Automatic audio extraction using FFmpeg\n  - Full pipeline support (transcribe, translate, refine)\n\n### Version 1.3\n- ✅ **Document processing** (.docx, .md, .txt, .pdf)\n  - Read existing transcripts\n  - **PDF support**\n  - Post-process text with an LLM\n  - Translate uploaded documents\n  - Automatic language detection\n- ✅ **Quality \u0026 testing**\n  - 139 automated tests with 49% coverage\n  - CI/CD powered by GitHub Actions\n  - Pre-commit hooks (black, flake8, mypy)\n  - Full type hints across the codebase\n\n### Version 1.1\n- ✅ **Optimised prompts for LLM polishing**\n  - Removes filler words (\"um\", \"uh\", etc.)\n  - Normalises numbers (\"twenty eight\" → \"28\")\n  - Preserves **all** facts and examples\n  - Works for both Russian and English content\n\n### Version 1.0\n- ✅ Downloading and processing YouTube videos\n- ✅ Processing local audio files (mp3, wav, ...)\n- ✅ Processing local video files (mp4, mkv, avi, ...)\n- ✅ Whisper-based transcription (base, small, medium)\n- ✅ LLM-based refinement through Ollama (qwen2.5, llama3, ...)\n- ✅ Automatic language detection (ru/en)\n- ✅ Translation with Meta NLLB\n- ✅ Export to .docx and .md\n- ✅ Custom Whisper prompts (from file)\n- ✅ Prompt generation from YouTube metadata\n- ✅ Rich logging and progress bars\n- ✅ Apple M1/M2 optimisations\n\n### In progress\n- 🔄 Optimized chunk processing for OpenAI API\n- 🔄 Batch processing support\n- 🔄 Docker support\n\n## 📋 Requirements\n\n### System\n- Python 3.9+\n- FFmpeg (audio preprocessing)\n- Ollama (optional, for LLM refinement)\n- 8 GB RAM minimum, 16 GB recommended\n- ~5 GB disk space for Whisper and NLLB 
models\n- Additional 3–7 GB if you use Ollama models\n\n### Supported platforms\n- macOS (including Apple Silicon)\n- Linux\n- Windows\n\n## 🚀 Installation\n\n### 1. Clone the repository\n\n```bash\ngit clone \u003crepository-url\u003e\ncd yt-transcriber\n```\n\n### 2. Create a virtual environment\n\n```bash\npython -m venv venv\n\n# macOS/Linux\nsource venv/bin/activate\n\n# Windows\nvenv\\Scripts\\activate\n```\n\n### 3. Install FFmpeg\n\n**macOS**\n```bash\nbrew install ffmpeg\n```\n\n**Linux (Ubuntu/Debian)**\n```bash\nsudo apt update\nsudo apt install ffmpeg\n```\n\n**Windows**\nDownload a build from [ffmpeg.org](https://ffmpeg.org/download.html) and add it to your `PATH`.\n\n### 4. Install Python dependencies\n\n```bash\npip install --upgrade pip\npip install -r requirements.txt\n```\n\n### 5. Install Ollama (optional, for refinement)\n\n**macOS/Linux**\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.com/install.sh | sh\n\n# Recommended models\nollama pull qwen2.5:3b    # Fast, good quality (~3 GB)\nollama pull qwen2.5:7b    # Slower, higher quality (~7 GB)\n\n# Start the server (if not already running)\nollama serve\n```\n\n**Windows**\nDownload the installer from [ollama.com](https://ollama.com/download).\n\n### 6. Environment variables (optional)\n\nCreate a `.env` file in the project root:\n\n```bash\n# Enable OpenAI integration (experimental)\nOPENAI_API_KEY=your_api_key_here\n\n# Logging level\nLOG_LEVEL=INFO\n```\n\n## 📖 Usage\n\n### Quick examples\n\n#### 1. Transcribe a YouTube video\n\n```bash\npython -m src.main youtube --url \"https://youtube.com/watch?v=dQw4w9WgXcQ\" --transcribe whisper-base\n```\n\n#### 2. Transcribe and translate\n\n```bash\npython -m src.main youtube \\\n    --url \"https://youtube.com/watch?v=dQw4w9WgXcQ\" \\\n    --transcribe whisper-base \\\n    --translate nllb\n```\n\n#### 3. 
Process a local audio file\n\n```bash\npython -m src.main audio \\\n    --input audio.mp3 \\\n    --transcribe whisper-medium \\\n    --translate nllb\n```\n\n#### 4. Process a local video file\n\n```bash\npython -m src.main video \\\n    --input video.mp4 \\\n    --transcribe whisper-medium \\\n    --translate nllb\n```\n\nSupported video formats: MP4, MKV, AVI, MOV, and any format supported by FFmpeg.\n\n#### 5. Refine a transcript with an LLM\n\n```bash\npython -m src.main audio \\\n    --input audio.mp3 \\\n    --transcribe whisper-medium \\\n    --refine-model qwen2.5:7b \\\n    --translate nllb\n```\n\nProduces two documents:\n- `audio_original.docx/md` — raw transcript without translation\n- `audio_refined.docx/md` — polished transcript with translation\n\nAdd LLM polish for the translation as well (Ollama backend):\n```bash\npython -m src.main audio \\\n    --input audio.mp3 \\\n    --transcribe whisper-medium \\\n    --refine-model qwen2.5:7b \\\n    --translate nllb \\\n    --refine-translation qwen2.5:3b\n```\n\nUse OpenAI GPT-4o Mini for refinement (requires `OPENAI_API_KEY`):\n```bash\npython -m src.main audio \\\n    --input audio.mp3 \\\n    --transcribe whisper-medium \\\n    --refine-backend openai-api \\\n    --refine-model gpt-4o-mini-2024-07-18\n```\n`gpt-4o-mini` is an alias; the full dated ID keeps you on a fixed model version.\n\n#### 6. Use a custom Whisper prompt\n\n```bash\n# Create prompt.txt with project-specific terms\n# FIDE, Hikaru Nakamura, Magnus Carlsen, chess tournament\n\npython -m src.main youtube \\\n    --url \"https://youtube.com/watch?v=YOUR_VIDEO_ID\" \\\n    --transcribe whisper-base \\\n    --prompt-file prompt.txt\n```\n\n#### 7. Enable speaker diarization (v1.5)\n\n```bash\n# Transcribe with automatic speaker identification\npython -m src.main youtube \\\n    --url \"https://youtube.com/watch?v=YOUR_VIDEO_ID\" \\\n    --transcribe whisper-medium \\\n    --speakers\n```\n\n**Requirements for speaker diarization:**\n1. 
Get HuggingFace token: https://huggingface.co/settings/tokens (create a \"Read\" token)\n2. Accept model terms for all required models:\n   - https://huggingface.co/pyannote/speaker-diarization-3.1\n   - https://huggingface.co/pyannote/segmentation-3.0\n   - https://huggingface.co/pyannote/speaker-diarization-community-1\n   - https://huggingface.co/pyannote/voice-activity-detection (optional, for better chunking)\n3. Set token in environment: `export HF_TOKEN=your_token_here` (add to `~/.zshrc` or `~/.bashrc`)\n\nOutput will include speaker labels:\n```\n[00:00] [SPEAKER_00] Hello everyone, welcome to the show\n[00:05] [SPEAKER_01] Thanks for having me\n[00:08] [SPEAKER_00] Let's get started with today's topic\n```\n\n## ⚖️ Legal notice\n- Make sure you respect YouTube Terms of Service and copyright law before downloading or processing any content. Only use the tool for media you own or have explicit permission to process.\n- Output documents and logs may contain fragments of the original content. Store them locally and review licences before sharing.\n- The default translation model `facebook/nllb-200-distilled-1.3B` is released under CC BY-NC 4.0 (non-commercial). Use a different model or obtain a licence for commercial scenarios.\n\n#### 8. Process existing documents (v1.2)\n\n```bash\n# Improve an existing transcript\npython -m src.main text --input output/document.md --refine-model qwen2.5:7b\n\n# Translate a document\npython -m src.main text --input transcription.docx --translate nllb\n\n# Refine and translate\npython -m src.main text --input document.txt --refine-model qwen2.5:7b --translate nllb\n```\n\nSupported formats: `.md`, `.docx`, `.txt`\n\n#### 9. 
Help screen\n\n```bash\npython -m src.main --help\npython -m src.main youtube --help\npython -m src.main audio --help\n```\n\n### CLI structure\n\n```bash\npython -m src.main \u003ccommand\u003e [options]\n```\n\n**Commands:**\n- `youtube` — Process a YouTube video\n- `audio` — Process a local audio file\n- `video` — Process a local video file\n- `text` — Process a text document\n\n### Common options\n\n| Option | Description | Example |\n|--------|-------------|---------|\n| `--transcribe` | Transcription method | `--transcribe whisper-base` |\n| `--translate` | Translation method | `--translate nllb` |\n| `--refine-model` | Model for refinement | `--refine-model qwen2.5:7b` |\n| `--refine-backend` | Backend for **transcript** refinement (not translation) | `--refine-backend ollama` |\n| `--prompt-file` | Custom Whisper prompt file | `--prompt-file prompt.txt` |\n| `--nllb-model` | NLLB model override | `--nllb-model facebook/nllb-200-distilled-600M` |\n| `--refine-translation` | LLM polish for the translated text (Ollama) | `--refine-translation qwen2.5:3b` |\n| `--speakers` | Enable speaker diarization | `--speakers` |\n| `--summarize-model` | Model for summarization | `--summarize-model qwen2.5:7b` |\n| `--help` | Show help | `--help` |\n\n**Note:** `--refine-backend` only switches the backend for transcript refinement (`--refine-model`). 
Translation polishing uses `--refine-translation` and the Ollama backend.\n\n### Available methods\n\n**Transcription**\n- `whisper-base` — fast, good quality\n- `whisper-small` — slower, higher quality\n- `whisper-medium` — slowest, best quality\n- `whisper-openai-api` — OpenAI Whisper API (requires OPENAI_API_KEY)\n- `gigaam-e2e-rnnt` — GigaAM v3 (RU), highest quality, plus punctuation and normalization\n- `gigaam-e2e-ctc` — GigaAM v3 (RU), faster, slightly simpler model\n\n**Refinement (requires Ollama or OpenAI API)**\n- `qwen2.5:3b` — fast, 3 GB (recommended)\n- `qwen2.5:7b` — slower, better quality\n- `llama3.2:3b` — fast, solid quality\n- `llama3:8b` — slower, higher quality\n- `mistral:7b` — balanced\n- `gpt-4o-mini-2024-07-18` — OpenAI GPT-4o Mini (API; alias `gpt-4o-mini` also works)\n- Any other model available in the [Ollama library](https://ollama.com/library)\n\n**Translation**\n- `nllb` — Meta NLLB (local, free)\n- `openai-api` — OpenAI GPT API (requires OPENAI_API_KEY)\n\n## 📁 Project structure\n\n```\nyt-transcriber/\n├── src/                      # Source code\n│   ├── main.py              # Entry point\n│   ├── config.py            # Configuration\n│   ├── downloader.py        # YouTube downloads\n│   ├── transcriber.py       # Transcription\n│   ├── text_reader.py       # Text ingestion\n│   ├── translator.py        # Translation\n│   ├── text_refiner.py      # LLM-based refinement\n│   ├── document_writer.py   # Document generation\n│   ├── utils.py             # Utilities\n│   └── logger.py            # Logging setup\n├── tests/                   # Automated tests\n├── output/                  # Generated docs\n├── temp/                    # Temporary files\n├── logs/                    # Logs\n├── requirements.txt         # Runtime dependencies\n├── .env.example             # Sample configuration\n└── README.md                # Documentation\n```\n\n**Note:** Whisper and NLLB models are cached in `~/.cache/` on first run.\n\n## 🔧 
Configuration\n\nMain settings live in `src/config.py`:\n\n```python\n# Paths\nOUTPUT_DIR = \"output\"        # Output folder\nTEMP_DIR = \"temp\"            # Temporary files\nLOGS_DIR = \"logs\"            # Logs\n\n# Models\nWHISPER_DEVICE = \"mps\"       # cpu/cuda/mps (auto-switch for M1)\nNLLB_MODEL_NAME = \"facebook/nllb-200-distilled-600M\"\n\n# Logging\nLOG_LEVEL = \"INFO\"           # DEBUG/INFO/WARNING/ERROR\n```\n\n## 📊 Performance\n\nApproximate processing time on a MacBook Air M1 (16 GB, CPU):\n\n| Video length | whisper-base | whisper-small | NLLB translation | Total (base+translate) | Total (small+translate) |\n|--------------|--------------|---------------|------------------|------------------------|-------------------------|\n| 3 minutes    | ~11 s        | ~34 s         | ~1.5 min         | ~2 min                 | ~3 min                  |\n| 10 minutes   | ~36 s        | ~2 min        | ~5 min           | ~5.5 min               | ~7 min                  |\n| 30 minutes   | ~1.8 min     | ~5.7 min      | ~14 min          | ~16 min                | ~20 min                 |\n| 1 hour       | ~3.6 min     | ~11 min       | ~28 min          | ~32 min                | ~39 min                 |\n| 2 hours      | ~7 min       | ~23 min       | ~56 min          | ~63 min                | ~79 min                 |\n\n**Processing factors:**\n- Whisper Base: 0.06× (≈16× faster than realtime) 🚀\n- Whisper Small: 0.19× (≈5× faster than realtime)\n- NLLB: 0.47× (≈2× faster than realtime)\n\n## 🐛 Troubleshooting\n\n### Installation issues\n\n**Problem:** `torch` fails to install on Apple Silicon\n```bash\n# Use the dedicated Apple Silicon build\npip install --upgrade torch torchvision torchaudio\n```\n\n**Problem:** FFmpeg not found\n```bash\nffmpeg -version\n# If missing, install via Homebrew (macOS)\nbrew install ffmpeg\n```\n\n**Problem:** Out of memory\n```bash\n# Switch to a smaller Whisper model\npython -m src.main youtube --url \"...\" --transcribe whisper-base\n```\n\n### Runtime issues\n\n**Problem:** `Model not found`\n- Models download automatically on first run\n- Ensure you have an internet connection\n- Check that the `models/` directory is writable\n\n**Problem:** Processing is slow\n- Use `whisper-base` instead of `whisper-small`\n- Confirm that GPU/MPS acceleration is active (see logs)\n- Close other resource-heavy applications\n\n**Safe to ignore:** Speaker diarization warnings\n- `UserWarning: torchcodec is not installed correctly` — Audio loading uses soundfile/librosa fallback (works correctly)\n- `UserWarning: std(): degrees of freedom is \u003c= 0` — Internal pyannote calculation (does not affect results)\n- `UserWarning: Lightning automatically upgraded your loaded checkpoint` — PyTorch Lightning version compatibility (does not affect results)\n- See [FAQ.md](FAQ.md) for detailed explanations\n\n### VAD Performance Optimization\n\nThe tool now uses **Silero VAD** by default for speech boundary detection when splitting large audio files. Silero VAD offers:\n- **Faster processing**: ~1ms per 30ms chunk (CPU-based)\n- **Smaller footprint**: 1.8MB model vs pyannote's full diarization stack\n- **No HuggingFace token required** for basic VAD functionality\n- **Automatic fallback** to pyannote VAD if Silero is unavailable\n\n**For optimal performance:**\n1. Silero VAD is used automatically (no setup needed)\n2. PyAnnote VAD is still available for speaker diarization (`--speakers` flag)\n3. 
To eliminate PyTorch Lightning warnings from pyannote, upgrade checkpoints once:\n   ```bash\n   python -c \"from lightning.pytorch.cli import LightningCLI; LightningCLI(run=False)\"\n   ```\n\n**Performance improvements in v1.6:**\n- ✅ O(n²) → O(n) complexity in boundary search\n- ✅ Gap quality filtering (300ms minimum, prevents mid-syllable cuts)\n- ✅ Accurate bitrate detection for chunk size estimation\n- ✅ Lightweight Silero VAD by default\n\n## 🧪 Testing\n\n```bash\n# Install dev dependencies\npip install -r requirements-dev.txt\n\n# Run tests\npytest tests/\n\n# Coverage report\npytest --cov=src tests/\n```\n\n## 📝 Sample output\n\n### .docx format\n```\n# Video title\n\n## Translation\nMethod: NLLB\n\n[00:15] Hello everyone! Today we will talk about...\n\n[01:32] The first important topic is...\n\n## Transcript\nMethod: whisper_base\n\n[00:15] Hello everyone! Today we'll talk about...\n\n[01:32] The first important topic is...\n```\n\n### .md format\nUses the same layout with Markdown syntax.\n\n## 🛣️ Roadmap\n\n### v1.0 — ✅ Shipped\n- ✅ YouTube + local audio ingestion\n- ✅ Whisper (base, small, medium)\n- ✅ LLM-based refinement via Ollama\n- ✅ NLLB translation\n- ✅ Custom prompts\n- ✅ Automatic language detection\n\n### v2.0 — Planned\n- [ ] Extended document ingestion\n- [ ] OpenAI API integration\n- [ ] Speaker diarisation\n- [ ] Enhanced CI/CD + unit tests\n- [ ] Docker image\n- [ ] Web UI\n- [ ] Batch processing helpers\n\n## 🤝 Contributing\n\nPull requests are welcome! For major changes, open an issue first to discuss what you would like to improve.\n\n### Development flow\n\n1. Fork the repository\n2. Create a branch (`git checkout -b feature/amazing-feature`)\n3. Commit (`git commit -m 'Add amazing feature'`)\n4. Push (`git push origin feature/amazing-feature`)\n5. 
Open a Pull Request\n\n## 📄 License\n\n- Distributed under the MIT License — see `LICENSE` for details.\n- The codebase was developed with help from AI-assisted tools (e.g., GitHub Copilot, Codex). All code and docs were reviewed and validated manually before publishing.\n\n## 🙏 Acknowledgements\n\n- [OpenAI Whisper](https://github.com/openai/whisper) — transcription\n- [Meta NLLB](https://github.com/facebookresearch/fairseq/tree/nllb) — translation\n- [Ollama](https://ollama.com) — local LLMs\n- [yt-dlp](https://github.com/yt-dlp/yt-dlp) — YouTube downloads\n\n## 📞 Contact\n\nFor questions or suggestions, please open an issue in this repository.\n\n---\n\n## 💡 Usage tips\n\n### Improve transcription quality\n1. Use `whisper-medium` for critical content\n2. Provide prompt files with key terms and names\n3. For YouTube sources, metadata-derived prompts are added automatically\n\n### Improve text quality\n1. Install Ollama and pull `qwen2.5:7b` for best results\n2. Language detection switches between Russian and English automatically\n3. Use `--refine-model` to produce a clean transcript\n4. **New in v1.1:** the LLM prompt\n   - Removes filler words (\"um\", \"uh\", \"эм\", \"ну\", \"вот\")\n   - Skips meta commentary (\"let me scroll\", \"сейчас открою экран\")\n   - Normalises numbers (\"twenty eight sixteen\" → \"2816\", \"ноль восемь\" → \"0.8\")\n   - **Keeps every detail**: examples, facts, reasoning\n   - No summarisation — only clean-up and structuring\n   - Fixes punctuation and paragraphing\n\n### Optimise speed\n- `whisper-base` — high throughput\n- `whisper-medium` — best accuracy\n- `qwen2.5:3b` — fast refinement\n- `qwen2.5:7b` — highest quality\n\n### Model cache locations\n- Whisper: `~/.cache/whisper/` (~140 MB – 1.5 GB)\n- Ollama: manage via `ollama list` and `ollama rm \u003cmodel\u003e`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiyachuev%2Fyt-transcriber","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbiyachuev%2Fyt-transcriber","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbiyachuev%2Fyt-transcriber/lists"}