{"id":26896905,"url":"https://github.com/wassim249/YT-Navigator","last_synced_at":"2025-04-01T04:02:24.609Z","repository":{"id":282016096,"uuid":"944988714","full_name":"wassim249/YT-Navigator","owner":"wassim249","description":"YT Navigator: AI-powered YouTube content explorer that lets you search and chat with channel videos using AI agents. Extract insights from hours of content in seconds with semantic search and precise timestamps.","archived":false,"fork":false,"pushed_at":"2025-03-27T11:18:54.000Z","size":1335,"stargazers_count":362,"open_issues_count":2,"forks_count":48,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-27T12:28:17.621Z","etag":null,"topics":["agentic-ai","agentic-rag","ai","django","langchain","langgraph","llm","python","rag","reranking","youtube","youtube-bot"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wassim249.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-08T11:52:11.000Z","updated_at":"2025-03-27T11:22:40.000Z","dependencies_parsed_at":"2025-03-12T11:23:07.157Z","dependency_job_id":"64d2f21a-ea3c-42fd-8e98-31706bbb3651","html_url":"https://github.com/wassim249/YT-Navigator","commit_stats":null,"previous_names":["wassim249/yt-navigator"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wassim249%2FYT-Navigator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wassim249%2FYT-Navigator/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wassim249%2FYT-Navigator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wassim249%2FYT-Navigator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wassim249","download_url":"https://codeload.github.com/wassim249/YT-Navigator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246333931,"owners_count":20760638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","agentic-rag","ai","django","langchain","langgraph","llm","python","rag","reranking","youtube","youtube-bot"],"created_at":"2025-04-01T04:02:22.682Z","updated_at":"2025-04-01T04:02:24.600Z","avatar_url":"https://github.com/wassim249.png","language":"Python","funding_links":[],"categories":["🎥 Media \u0026 Podcasts","Python"],"sub_categories":["🟩 Development Tools 🛠️"],"readme":"# 🔴 YT Navigator\n\n![YT Navigator Home Page](./images/home.png)\n\n## 📋 Overview\n\nYT Navigator is an AI-powered application that helps you navigate and search through YouTube channel content efficiently. Instead of manually watching hours of videos to find specific information, YT Navigator allows you to:\n\n1. **🔍 Search through a channel's videos** using natural language queries\n2. **💬 Chat with a channel's content** to get answers based on video transcripts\n3. 
**⏱️ Discover relevant video segments** with precise timestamps\n\nPerfect for researchers, students, content creators, or anyone who needs to extract information from YouTube channels quickly.\n\n## ✨ Main Features\n\n- **🔐 Authentication**: Secure login and independent sessions\n- **📺 Channel Management**: Scan up to 100 videos per channel and get a summary of the channel\n- **🔍 Search**: Find relevant video segments using Semantic Search\n- **💬 Chat**: Have conversations with an AI that has knowledge of the channel's content\n\n### 1 - 📥 Channel data retrieval\n\n![Channel data retrieval](./images/scan.png)\nThe user enters a YouTube channel URL, which the system validates before extracting the channel username. The system then fetches the channel details (title, description, and profile picture) and stores them in the database.\n\nAfter connecting to a channel, the user selects how many videos to scan (up to 100). The system then processes these videos in parallel through two paths:\n1. 📊 Video metadata is extracted and saved to a relational database (PostgreSQL)\n2. 
📝 Video transcripts are extracted, split into segments, converted to vector embeddings, and stored in a vector database (PGVector)\n\nOnce both processes are complete, the channel content becomes available for search and chat functionality.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eClick to show/hide the Channel Data Retrieval Flow Diagram\u003c/strong\u003e\u003c/summary\u003e\n\n```mermaid\ngraph TD\n    A[User enters YouTube Channel URL] --\u003e B[Validate URL]\n    B --\u003e C[Fetch Channel Details]\n    C --\u003e G[User selects number of videos to scan]\n    G --\u003e H[Fetch Video Details]\n    H --\u003e I[Process Video Metadata]\n\n    H --\u003e J[Extract Video Transcripts]\n    I --\u003e K1[Save to Relational Database]\n    J --\u003e L[Split into Video Segments]\n    L --\u003e M[Generate Embeddings]\n    M --\u003e K2[Add to Vector Database]\n    K1 --\u003e N[Channel Ready for Search/Chat]\n    K2 --\u003e N\n```\n\u003c/details\u003e\n\n### 2 - 🔍 Querying the channel\n![Querying the channel](./images/query.png)\n\nThe querying process begins when a user enters a natural language query to search across the channel's content. The system processes this query through both semantic search (using vector embeddings) and keyword search (using BM25) for comprehensive results. These results are combined, enriched with video metadata from the relational database, and deduplicated. A cross-encoder model then reranks the results based on relevance to the query. The system standardizes relevance scores, groups results by video, and returns the most relevant videos along with specific transcript segments. 
The user interface displays these results with video thumbnails, titles, relevant transcript segments, and direct links to the exact timestamps in the videos where the information appears.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eClick to show/hide the Query Flow Diagram\u003c/strong\u003e\u003c/summary\u003e\n\n```mermaid\ngraph TD\n    A[User enters natural language query] --\u003e D1[Perform semantic search]\n    A --\u003e D2[Perform keyword search]\n    D1 --\u003e E[Combine search results]\n    D2 --\u003e E\n    E --\u003e F[Fetch video metadata]\n    F --\u003e H[Remove duplicates]\n    H --\u003e I[Rerank results]\n    I --\u003e J[Standardize scores]\n    J --\u003e L[Return top videos and segments]\n```\n\u003c/details\u003e\n\n### 3 - 💬 Chat with the channel\n![Chat with the channel](./images/chat.png)\n\nThe chat interface facilitates interactive conversations with an AI agent knowledgeable about the channel's content, utilizing the ReAct framework. When a user sends a message, the system processes it through a decision-making mechanism to identify the appropriate response type. The message can be addressed in three ways:\n1) 🔄 A direct response without tool calls for general inquiries,\n2) ⛔ A static response for irrelevant questions,\n3) 🛠️ A tool-assisted response that queries the vector database to extract specific information from video transcripts. 
In the case of tool-assisted responses, the agent engages in a cycle where it employs its tools (semantic search and SQL Select query execution) to gather information before crafting a comprehensive answer.\n\nThis process mitigates hallucinations and allows for the use of smaller models in handling complex tasks.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eClick to show/hide the Chat Flow Diagram\u003c/strong\u003e\u003c/summary\u003e\n\n```mermaid\ngraph TD\n    A[__start__] --\u003e B[route_message\n    llama-3.1-8b-instant]\n\n    B -.-\u003e C[non_tool_calls_reply\n    llama-3.1-8b-instant]\n    B -.-\u003e D[static_not_relevant_reply\n    llama-3.1-8b-instant]\n    B -.-\u003e E[tool_calls_reply\n    qwen-qwq-32b]\n\n    subgraph React Agent qwen-qwq-32b\n        E1[__start__] --\u003e E2[agent]\n        E2 -.continue.-\u003e E3[tools]\n        E2 -.end.-\u003e E4[__end__]\n        E3 --\u003e E2\n    end\n\n    C --\u003e F[__end__]\n    D --\u003e F\n    E --\u003e F\n```\n\u003c/details\u003e\n\n### 4. 
Agent Workflow Diagram\n\n![Agent Workflow Diagram](./images/agent_workflow.jpg)\n\n\n## 🧰 Technology Stack\n\n- **🖥️ Backend**:\n  - Django (Python)\n  - PostgreSQL\n  - Structlog for logging\n  - Pydantic for data validation\n- **🧠 AI \u0026 ML**:\n  - [LangGraph](https://www.langchain.com/langgraph) for conversational AI\n  - [Sentence Transformers](https://www.sentence-transformers.org/) for semantic search\n  - [PGVector](https://www.pgvector.org/) as a vector database\n  - [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) for keyword search\n  - [bge-small-en-v1.5](https://huggingface.co/sentence-transformers/BAAI/bge-small-en-v1.5) for embeddings\n  - qwen-qwq-32b and llama-3.1-8b-instant from [Groq](http://groq.com/)\n- **⚙️ Data Processing**:\n  - [Scrapetube](https://github.com/dermasmid/scrapetube) for scraping videos\n  - [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) for obtaining transcripts\n- **🎨 Frontend**:\n  - Django templates with modern CSS\n  - Responsive design\n\n## 🚀 Installation\n\n### 💻 Without Docker\n\n1. Clone the repository\n```bash\ngit clone https://github.com/wassim249/YT-Navigator\n```\n\n2. Create a virtual environment and install dependencies\n```bash\npython -m venv venv\nsource venv/bin/activate\npip install -e .\n```\n\n3. Make sure you have a PostgreSQL database running.\n\n4. Create a `.env` file in the root directory from the `.env.example` file.\n```bash\ncp .env.example .env\n```\n\n5. Create Django migrations and migrate the database\n```bash\npython manage.py migrate\n```\n\n6. Run the development or production server\n```bash\nmake dev # for development\nmake prod # for production\n```\n### 🐳 With Docker\n1. Create a `.env` file in the root directory from the `.env.example` file (Make sure you set *POSTGRES_HOST=db*).\n```bash\ncp .env.example .env\n```\n\n2. Build the Docker image\n```bash\nmake build-docker\n```\n\n3. 
Run the Docker container\n```bash\nmake run-docker\n```\n## 📖 Usage\n\n### 1. 📝 Register and Login\n\nCreate an account to get started.\n\n### 2. 🔗 Connect a YouTube Channel\n\nOn the home page, enter a YouTube channel URL to connect to it. The system will fetch the channel's information.\n\n### 3. 📥 Scan Videos\n\nAfter connecting a channel, you can scan its videos. Choose how many videos to scan (more videos = more comprehensive results but longer processing time).\n\n### 4. 🔍 Search for Information\n\nUse the search feature to find specific information across all scanned videos. The system will return:\n- 🎯 Relevant video segments with timestamps\n- 📝 Transcripts of the matching content\n- 🔗 Links to watch the videos at the exact timestamps\n\n### 5. 💬 Chat with the Channel\n\nUse the chatbot interface to have a conversation about the channel's content. The AI will respond based on the information in the scanned videos.\n\n## 👨‍💻 Development\n\n### 📁 Project Structure\n\n- `app/`: Main Django application\n  - `models/`: Database models (Channel, Video, VideoChunk)\n  - `views/`: View functions for web pages and API endpoints\n  - `services/`: Core functionality (scraping, vector database, AI agent)\n  - `templates/`: HTML templates\n  - `static/`: CSS, JavaScript, and other static files\n\n- `yt_navigator/`: Django project settings and configuration\n\n### 🛠️ Using the Makefile\n\nThe project includes a Makefile with useful commands:\nRun `make help` to see the available commands.\n```bash\nmake help\n```\n\n## 🗺️ Roadmap\n\n- [ ] 🐳 Add Docker support\n- [ ] ✅ Add tests\n- [ ] 📋 Add support for playlist/shorts scanning\n- [ ] 📱 Improve mobile experience\n- [ ] 🌐 Add support for multiple languages\n\n## 🤝 Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add some amazing feature'`)\n4. 
Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## 🤵 Author\n\n- [wassim249](https://github.com/wassim249)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwassim249%2FYT-Navigator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwassim249%2FYT-Navigator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwassim249%2FYT-Navigator/lists"}