https://github.com/dev-khant/tell-what-a-video-does

Know what a youtube video is about with the help of LLMs.
huggingface llm streamlit

Host: GitHub
URL: https://github.com/dev-khant/tell-what-a-video-does
Owner: Dev-Khant
Created: 2023-08-11T10:37:55.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-15T13:11:15.000Z (almost 2 years ago)
Last Synced: 2025-04-08T03:43:24.369Z (3 months ago)
Topics: huggingface, llm, streamlit
Language: Python
Homepage:
Size: 29.3 KB
Stars: 4
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Video Understanding and Q&A Tool

This project takes a YouTube video link and builds an understanding of the video's content from audio transcription and image captioning. An **LLM** combines the audio and visual context. You can also ask questions, and the tool answers them based on the video's content 🚀
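As a rough illustration of how a transcript and frame captions could be merged into a single LLM prompt, here is a minimal sketch. It is not the repository's actual code: `transcribe_audio`, `caption_frames`, `build_prompt`, and `describe_video` are hypothetical names, and the two stand-in functions return placeholder strings instead of calling real models. Since the two steps do not depend on each other, the sketch runs them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_audio(video_path: str) -> str:
    """Hypothetical stand-in for the speech-to-text step."""
    return f"(transcript of {video_path})"


def caption_frames(video_path: str) -> list[str]:
    """Hypothetical stand-in for the frame-captioning step."""
    return [f"(caption of a frame from {video_path})"]


def build_prompt(transcript: str, captions: list[str]) -> str:
    """Merge both sources of context into one prompt for an LLM."""
    caption_lines = "\n".join(f"- {c}" for c in captions)
    return (
        "Explain what this video is about.\n\n"
        f"Audio transcript:\n{transcript}\n\n"
        f"Frame descriptions:\n{caption_lines}"
    )


def describe_video(video_path: str) -> str:
    # The two branches are independent, so submit both before waiting.
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcript_future = pool.submit(transcribe_audio, video_path)
        captions_future = pool.submit(caption_frames, video_path)
        transcript = transcript_future.result()
        captions = captions_future.result()
    return build_prompt(transcript, captions)


if __name__ == "__main__":
    print(describe_video("video.mp4"))
```

In a real pipeline the returned prompt would be sent to the LLM; the sketch stops before that call so it stays independent of any particular provider.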

## Features ✨

👉 **Video Understanding**: The tool uses a **Transformer** model for audio transcription, converting speech into text. It also applies image captioning to extract information from frames in the video. **Image embeddings** are used to compare frames so that only unique ones are used for extracting information. Video and audio are processed **in parallel**.
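One way to keep only unique frames is to compare each new embedding with the frames already kept and drop it if it is too similar. The sketch below assumes cosine similarity and a cutoff of 0.95; the README specifies neither, and `unique_frame_indices` is a hypothetical helper, not a function from this project. The three-dimensional vectors stand in for real image embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def unique_frame_indices(
    embeddings: list[list[float]], threshold: float = 0.95
) -> list[int]:
    """Return indices of frames that are not near-duplicates of a kept frame."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        # Keep the frame only if it differs enough from every kept frame.
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings (assumed values, for illustration only).
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],  # almost identical to frame 0
    [0.0, 1.0, 0.0],
]
print(unique_frame_indices(frames))  # [0, 2]
```

A lower threshold discards more frames and reduces the number of captioning calls, at the risk of skipping frames that differ only slightly.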

👉 **Question & Answer**: Users can ask questions about the video's content. The tool uses **Chromadb** as a vector database to retrieve the relevant context and provide accurate answers.
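The retrieval idea behind a vector database can be sketched without any dependencies. The example below does not use the Chromadb API: `TinyVectorStore` is a hypothetical in-memory stand-in, and `embed` is a toy bag-of-words vector, whereas a real setup would use a learned embedding model. It only illustrates the pattern of storing text chunks and returning the one closest to a question.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would use a learned model."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def query(self, question: str, k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(
            self._items, key=lambda item: similarity(q, item[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]


store = TinyVectorStore()
store.add("The speaker explains how to bake sourdough bread.")
store.add("A red bicycle leans against a wall.")
print(store.query("where is the red bicycle"))
# ['A red bicycle leans against a wall.']
```

The retrieved chunks would then be passed to the LLM together with the question, so the answer stays grounded in the video's content.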

## How to Use ⚙️

• Clone this repository: `git clone https://github.com/Dev-Khant/tell-what-a-video-does.git`

• Install the required dependencies: `pip install -r requirements.txt`

• Run the streamlit app: `streamlit run app.py`

• Provide a **YouTube video** link along with your **OpenAI token**, **Hugging Face token**, and **SerpApi token**

## Technical 🖥️

• [Hugging Face](https://huggingface.co/): Utilized to access the OpenAI Whisper model for audio transcription.

• [SerpApi](https://serpapi.com/): Used to access the Google Lens API for getting image information.

• [Streamlit](https://streamlit.io/): Used to create the interactive web interface for the project.

• [Chromadb](https://www.trychroma.com/): The vector database used for storing and retrieving Q&A information.

## Work in Progress 🚧

1. Add Weaviate and let the user select their VectorDB.
2. Give the chatbot internet access.
3. Option to upload video.
4. Store video explanations so they can be used later.
