https://github.com/praths71018/video_text_summarisation_and_prompting

An application where you upload a video in the webapp and it provides a summary of the video. You can also interact with the video by prompting

audio-transcription chatbot flask image-capture langchain llm nlp-machine-learning prompt-engineering prompting question-answering rag reactjs summarization transcript translation video-gpt video-processing


Host: GitHub
URL: https://github.com/praths71018/video_text_summarisation_and_prompting
Owner: praths71018
Created: 2024-10-21T03:01:29.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-30T10:23:11.000Z (over 1 year ago)
Last Synced: 2025-03-24T15:31:21.797Z (over 1 year ago)
Topics: audio-transcription, chatbot, flask, image-capture, langchain, llm, nlp-machine-learning, prompt-engineering, prompting, question-answering, rag, reactjs, summarization, transcript, translation, video-gpt, video-processing
Language: Jupyter Notebook
Homepage:
Size: 31 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# **GlobalLearn: A VideoGPT**

## **Table of Contents**
1. [Introduction](#introduction)
2. [Features](#features)
3. [Methodology](#methodology)
4. [Results](#results)
5. [Installation and Usage](#installation-and-usage)
6. [Contributors](#contributors)
7. [Demo Video](#demo-video)

---

## **Introduction**
GlobalLearn is an educational application that **summarizes educational videos** into text format in **multiple languages** and allows users to **prompt questions** based on the video content.

The app uses **audio and image processing techniques** to extract text and generate a concise summary. Additionally, a **Retrieval-Augmented Generation (RAG) model** is implemented to answer user queries based on the extracted text.

---

## **Features**
- **Automatic Video Summarization**: Converts video content into structured text.
- **Multimodal Data Extraction**:
  - **Audio-based transcript extraction**
  - **Image-based text extraction (when audio is unavailable)**
- **Multi-Language Support**: Summarized text can be translated into various languages.
- **Question-Answering System**: Users can prompt questions and receive answers based on the video content.
- **Efficient Processing**: Generates a summary in about 4 minutes for a 10-minute video, even on a CPU-only system.

---

## **Methodology**
![Methodology](assets/Methodology.png)
### **1. Data Collection & Preprocessing**
- Collect videos covering a single subject.
- Convert video into **audio and image frames**.
- Extract **transcripts from audio**.
- Extract **text from images** (when audio is muted).
- Clean and structure the extracted text.
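The cleaning step is not described in detail, so the following is only a minimal sketch of one plausible approach: normalize whitespace and drop OCR results that nearly repeat the previous frame, since consecutive frames of a slide usually contain the same text. The function names `normalize` and `dedupe_frame_text` and the `0.9` similarity threshold are assumptions for illustration, not part of the project.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Collapse runs of whitespace and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def dedupe_frame_text(frame_texts, threshold=0.9):
    """Keep one copy of text that repeats across consecutive frames.

    A frame is skipped when it is empty or when its text is at least
    `threshold` similar to the last kept frame (hypothetical cutoff).
    """
    kept = []
    for raw in frame_texts:
        text = normalize(raw)
        if not text:
            continue  # frame had no readable text
        if kept:
            similarity = SequenceMatcher(None, kept[-1].lower(), text.lower()).ratio()
            if similarity >= threshold:
                continue  # near-duplicate of the previous kept frame
        kept.append(text)
    return kept


frames = [
    "Newton's  first law\n",
    "Newton's first law",
    "   ",
    "Newton's first 1aw",  # typical OCR confusion of "l" and "1"
    "Force equals mass times acceleration",
]
cleaned = dedupe_frame_text(frames)
print(cleaned)
```

Comparing only against the last kept frame keeps the pass linear in the number of frames, at the cost of missing repeats that reappear later in the video.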

### **2. Knowledge Base Construction**
- Combine text from **audio & images**.
- Create a **Retrieval-Augmented Generation (RAG) corpus** to facilitate question answering.
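As a rough illustration of what a RAG corpus involves, the sketch below splits the combined text into overlapping chunks and retrieves the chunks most similar to a question. It uses plain term-count cosine similarity from the standard library as a stand-in; the project's actual retriever, chunk sizes, and embedding model are not stated here, and `chunk_words`, `retrieve`, and their parameters are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text, size=40, overlap=10):
    """Split text into windows of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end
    return chunks


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return up to k chunks with non-zero similarity to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:k] if score > 0]


audio_text = "Photosynthesis converts light energy into chemical energy stored in glucose."
slide_text = "Chlorophyll absorbs mostly red and blue light."
corpus = chunk_words(audio_text + " " + slide_text, size=8, overlap=2)
top = retrieve("Tell me about chlorophyll", corpus, k=1)
print(top)
```

In a full RAG pipeline the retrieved chunks would be placed in the prompt sent to the language model, so the answer is grounded in the video's own text.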

### **3. Summarization & Translation**
- Generate a **concise summary** of the extracted text.
- Provide translations into multiple languages.
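The summarization model is not named in this section, and the project most likely relies on a language model for it. Purely to make the idea concrete, here is a small frequency-based extractive summarizer, which is a different and much simpler technique: it scores each sentence by the average frequency of its non-stopword terms and keeps the top sentences in their original order. The stopword list and the name `summarize` are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the project's).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "on", "for", "as", "are", "this", "by"}


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, preserving original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


transcript = (
    "Gravity pulls objects toward the Earth. "
    "The weather was nice during the recording. "
    "Gravity also keeps the Moon in orbit around the Earth."
)
summary = summarize(transcript, max_sentences=2)
print(summary)
```

An extractive method like this can only reuse sentences from the transcript, whereas an abstractive model rewrites them, which is what the similarity scores in the Results section are meant to measure.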

### **4. User Interface & Deployment**
- Build an interface to **upload videos & prompt questions**.
- Deploy a **server** to run all models and functionalities.

---

## **Results**
- **TF-IDF Similarity Score**: **0.46**
  - Indicates that summaries reuse key terms without copying the transcript's structure.
- **Semantic Similarity Score**: **0.82**
  - Indicates that summaries retain the meaning while using different sentence structures.
- **Processing Time**:
  - For a **10-minute video**, summary generation takes around **4 minutes** on a **CPU-based system**.
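How the TF-IDF score was computed is not specified, so the sketch below shows one common formulation: weight each term by its count times a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, then take the cosine similarity between the transcript and summary vectors. The example texts and function names are made up for illustration, and the resulting number is not expected to match the reported 0.46.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} mapping per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse weight mappings."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tfidf_similarity(text_a, text_b):
    """TF-IDF cosine similarity of two texts, in the range [0, 1]."""
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine(vec_a, vec_b)


transcript = ("The mitochondria is the powerhouse of the cell "
              "and produces energy in the form of ATP.")
summary = "Mitochondria generate ATP, the cell's main energy source."
score = tfidf_similarity(transcript, summary)
print(round(score, 2))
```

A score near 1 would mean the summary repeats the transcript's wording almost verbatim, while a score near 0 would mean it shares almost no vocabulary with it.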

### **Key Outcomes**
- ✅ Works with videos that **lack transcripts**.
- ✅ Combines **audio & image-based text extraction**.
- ✅ Supports **multilingual summaries**.
- ✅ Enables **question-answering based on video content**.

---

## **Installation and Usage**
### **Prerequisites**
- Python 3.x
- Node.js and npm (for the frontend)
- Required Python libraries (install via `requirements.txt`)

### **Installation**
```bash
git clone https://github.com/praths71018/Video_Text_Summarisation_And_Prompting.git
cd Video_Text_Summarisation_And_Prompting
```

### **Running the Application**
1. Go to the backend directory and create a virtual environment:
```bash
cd backend
python -m venv venv
```
2. Activate the virtual environment:
```bash
source venv/bin/activate
```
3. Install requirements:
```bash
pip install -r requirements.txt
```
4. **Start the backend server** (still inside the `backend` directory):
```bash
python app.py
```
5. In a second terminal, start the **frontend** from the repository root:
```bash
cd frontend
npm install
npm start
```
6. **Upload a video** through the web interface.
7. **Wait for processing** (transcription, summarization, translation).
8. **Ask questions** based on the generated summary.

---

## **Contributors**
- **Pratham R Shetty**
- **Prateek M**
- **R Ranjive**
- **Anirudh Krishna**

---

## **Demo Video**
[DemoVideo.mp4](assets/DemoVideo.mp4)
