https://github.com/tameronline/tol_vido
https://github.com/tameronline/tol_vido
Last synced: 9 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/tameronline/tol_vido
- Owner: TamerOnLine
- License: mit
- Created: 2025-07-15T23:13:42.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-24T13:49:48.000Z (11 months ago)
- Last Synced: 2025-07-24T18:05:11.160Z (11 months ago)
- Language: Python
- Size: 66.3 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# 🎬 tol_vido – YouTube to Clean Text (LLM Powered)
[](https://github.com/TamerOnLine/pro_venv/actions)
[](LICENSE)
`tol_vido` is a powerful, local-first Python tool that transcribes YouTube videos into well-formatted German paragraphs using Large Language Models (LLMs) like **Mistral** via `.gguf`. It's fast, customizable, and runs entirely on your machine.
---
## 🚀 Features
- ✅ Download and convert YouTube audio with `yt-dlp`
- ✅ Segment and transcribe speech with `SpeechRecognition` (Google)
- ✅ Format the raw text using a local LLM (e.g. Mistral via `llama-cpp-python`)
- ✅ Full local processing — **no cloud required**
- ✅ Modular and extensible codebase
- ✅ Auto-setup: project config, `venv`, VS Code files, and more
---
## 🧱 Project Structure
```bash
tol_vido/
├── 01_pro_venv.py # Setup: venv, config, VS Code
├── 02_start_llm_server.py # Run local LLM server
├── 03_main.py # Entry point runner
├── app.py # Main processing logic
├── extract_text_from_video.py # Transcribe audio from video
├── format_text.py # Format using local LLM
├── output.txt # Raw transcription result
├── formatted_output.txt # Final cleaned-up version
├── setup-config.json # Project metadata
├── requirements.txt # Dependencies
├── env-info.txt # Python & installed packages
├── tools/
│ └── download_model.py # GGUF model downloader
└── .github/workflows/ # GitHub Actions
```
---
## ▶️ Getting Started
### 1. Clone the repository
```bash
git clone https://github.com/YourUsername/tol_vido.git
cd tol_vido
```
### 2. Run the setup script
```bash
python 01_pro_venv.py
```
This will:
- Create a virtual environment
- Install dependencies
- Generate `main.py`, `app.py`, `.vscode/` settings
### 3. Start the LLM server
### ⚠️ Important: Pre-Run Setup
Before running the main script, make sure the following steps are completed:
#### 📁 1. Download and place FFmpeg binaries
This project requires FFmpeg tools for audio conversion.
- Download from: [https://www.gyan.dev/ffmpeg/builds/](https://www.gyan.dev/ffmpeg/builds/)
- Extract and copy the following files into a folder named `bin/` in the project root:
```bash
bin/
├── ffmpeg.exe
├── ffprobe.exe
├── ffplay.exe # (optional)
```
> You **do not need** to add FFmpeg to your system PATH. The project uses them directly from `./bin/`.
---
#### 📄 2. Create `.env` file
In the project root, create a file named `.env` with the following content:
```env
# Required for downloading and running the model
MODEL_NAME=mistral-7b-instruct-v0.1.Q4_K_M.gguf
# Local path to the GGUF model file (adjust this path to match your setup)
MODEL=r:/tol_vido/models/mistral-7b-instruct-v0.1.Q4_K_M.gguf
# LLM Server runtime parameters
N_CTX=4096
N_THREADS=8
N_GPU_LAYERS=35
VERBOSE=true
```
These values are used by:
- `tools/download_model.py` – to fetch the model from Hugging Face
- `02_start_llm_server.py` – to launch the local LLM inference server
```bash
python 02_start_llm_server.py
```
> Make sure the `.gguf` model exists in the `models/` folder. You can use `tools/download_model.py` to fetch it from Hugging Face.
### 4. Run the full pipeline
```bash
python 03_main.py
```
You’ll be prompted for a YouTube URL. After processing, the result will be saved in:
- `output.txt`: raw transcription
- `formatted_output.txt`: structured, punctuated paragraphs
---
## 📦 Requirements
---
## 🌐 Model Setup (GGUF)
### 2. Download the model
```bash
python tools/download_model.py
```
Model will be downloaded from:
> [TheBloke/Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF)
---
## 💡 Notes
- All audio processing and formatting are local.
- Transcription uses `Google Speech Recognition` — no API key required.
- Formatting uses a **local** LLM server (default: `localhost:11434`).
- Works offline once the model is downloaded.
---
## 🛠 VS Code Support
The setup script automatically generates:
- `.vscode/settings.json` → sets interpreter path
- `.vscode/launch.json` → debugger config
- `project.code-workspace` → pre-configured workspace
---
## 📜 License
This project is licensed under the [MIT License](LICENSE).
You are free to use, modify, and distribute it with attribution.
Feel free to explore and build upon it!
---
## 🙌 Acknowledgements
This project would not have been possible without the following open-source tools and libraries. Sincere thanks to their developers and contributors:
- [Mistral 7B](https://mistral.ai) – for providing the foundational language model.
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) – for efficient local inference of LLMs.
- [yt-dlp](https://github.com/yt-dlp/yt-dlp) – for reliable YouTube media extraction.
- [tiktoken](https://github.com/openai/tiktoken) – for fast and accurate tokenization.
---
## 👨💻 About the Author
🎯 **Tamer OnLine – Developer & Architect**
A dedicated software engineer and educator with a focus on building multilingual, modular, and open-source applications using Python, Flask, and PostgreSQL.
🔹 Founder of **Flask University** – an initiative to create real-world, open-source Flask projects
🔹 Creator of [@TamerOnPi](https://www.youtube.com/@mystrotamer) – a YouTube channel sharing tech, tutorials, and Pi Network insights
🔹 Passionate about helping developers learn by building, one milestone at a time
Connect or contribute:
[](https://github.com/TamerOnLine)
[](https://www.linkedin.com/in/tameronline/)
[](https://www.youtube.com/@mystrotamer)