Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dev-khant/tell-what-a-video-does
Know what a YouTube video is about with the help of LLMs.
- Host: GitHub
- URL: https://github.com/dev-khant/tell-what-a-video-does
- Owner: Dev-Khant
- Created: 2023-08-11T10:37:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-15T13:11:15.000Z (over 1 year ago)
- Last Synced: 2024-10-14T07:34:22.505Z (2 months ago)
- Topics: huggingface, llm, streamlit
- Language: Python
- Homepage:
- Size: 29.3 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Video Understanding and Q&A Tool
This project takes a YouTube video link and builds a comprehensive understanding of the video's content through audio transcription and image captioning. An **LLM** is used to combine the audio and visual context. Additionally, you can ask questions, and it will answer them based on the video's content 🚀
## Features ✨
👉 **Video Understanding**: The tool uses a **Transformer** model for audio transcription, converting spoken words into text. It also employs image captioning to extract textual descriptions of frames within the video. **Image embeddings** are used to compare frames so that only unique frames are processed for information extraction (see the sketch after this list). Video and audio are processed **in parallel**.
👉 **Question & Answer**: Users can ask questions about the video's content. The tool uses **Chromadb** as a vector database to provide accurate, contextually relevant answers.
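A minimal sketch of what this embedding-based frame deduplication could look like, assuming CLIP image embeddings from the `transformers` library; the model checkpoint, similarity threshold, and function name are illustrative, not taken from this repository:

```python
# Sketch: drop near-duplicate video frames using CLIP image embeddings.
# Requires `torch`, `transformers`, and PIL images as input. The checkpoint
# and the 0.95 cosine-similarity threshold are assumptions, not the repo's.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def unique_frames(frames, threshold=0.95):
    """Keep only frames whose embedding is sufficiently far from all kept ones."""
    kept, kept_embs = [], []
    for frame in frames:  # frames: iterable of PIL.Image objects
        inputs = processor(images=frame, return_tensors="pt")
        with torch.no_grad():
            emb = model.get_image_features(**inputs)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize for cosine sim
        if all(float(emb @ prev.T) < threshold for prev in kept_embs):
            kept.append(frame)
            kept_embs.append(emb)
    return kept
```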
## How to Use ⚙️
• Clone this repository: `git clone https://github.com/Dev-Khant/tell-what-a-video-does.git`
• Install the required dependencies: `pip install -r requirements.txt`
• Run the Streamlit app: `streamlit run app.py`
• Provide a **YouTube video** link along with your **OpenAI token**, **Hugging Face token**, and **SerpAPI token** (a sketch of how the app might collect these follows)
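The README does not show how the link and tokens are collected; a common Streamlit pattern for this (an assumption here, not confirmed from `app.py`) is password-type text inputs in the sidebar:

```python
# Sketch: one common way a Streamlit app collects a video URL and API tokens.
# Field labels, layout, and the button are assumptions, not taken from app.py.
import streamlit as st

video_url = st.text_input("YouTube video link")
openai_token = st.sidebar.text_input("OpenAI token", type="password")
hf_token = st.sidebar.text_input("Hugging Face token", type="password")
serpapi_token = st.sidebar.text_input("SerpAPI token", type="password")

if st.button("Process video") and video_url:
    st.write(f"Processing {video_url} ...")  # placeholder for the actual pipeline
```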
## Technical 🖥️
• [Hugging Face](https://huggingface.co/): Used to access the OpenAI Whisper model for audio transcription (see the transcription sketch after this list).
• [SerpApi](https://serpapi.com/): Used to access the Google Lens API for retrieving image information.
• [Streamlit](https://streamlit.io/): Used to create the interactive web interface for the project.
• [Chromadb](https://www.trychroma.com/): The vector database used to store video context and retrieve it for Q&A (see the retrieval sketch below).
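As a rough illustration of the transcription step, a Whisper checkpoint can be run through the Hugging Face `transformers` pipeline; the checkpoint and audio path below are assumptions, not this project's actual configuration:

```python
# Sketch: audio transcription with an OpenAI Whisper checkpoint via the
# Hugging Face `transformers` pipeline. Checkpoint and file path are illustrative.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
result = asr("audio_from_video.wav")  # hypothetical audio extracted from the video
print(result["text"])
```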
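Similarly, a minimal sketch of storing the combined video context in Chromadb and querying it for Q&A; the collection name, chunk contents, and IDs are illustrative:

```python
# Sketch: store video-context chunks in Chromadb and retrieve the most
# relevant ones for a user question. All names here are illustrative.
import chromadb

client = chromadb.Client()  # in-memory client; persistent clients also exist
collection = client.create_collection("video_context")

chunks = ["Transcript segment ...", "Caption of a unique frame ..."]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

hits = collection.query(query_texts=["What is the video about?"], n_results=2)
print(hits["documents"][0])  # top matches, to be passed to the LLM as context
```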
## Work in Progress 🚧
1. Add Weaviate and let users select their vector DB.
2. Internet access for the chatbot.
3. Option to upload a video.
4. Store video explanations so they can be reused later.