https://github.com/htanh2003/llm_powered_video_search

The LLM-Powered Video Search System is an advanced multimodal video search solution that leverages Large Language Models (LLMs) to enhance video retrieval through text, image, and metadata queries.

Host: GitHub
URL: https://github.com/htanh2003/llm_powered_video_search
Owner: HTAnh2003
Created: 2024-11-13T06:44:25.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-01-07T08:24:47.000Z (9 months ago)
Last Synced: 2025-05-07T20:29:09.014Z (5 months ago)
Topics: clip, django, docker, faiss, multimodal, retrieval, retrieval-augmented-generation, text-image-retrieval, tf-idf, yolo
Language: Jupyter Notebook
Homepage: https://faster-united.info/
Size: 5.29 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# 🧠 LLM-Powered Video Search System for AIC2024




  An intelligent video retrieval system leveraging Large Language Models (LLMs) and multimodal search, developed for the AIC2024 competition and accepted at the international SOICT 2024 conference.



![Static Badge](https://img.shields.io/badge/python->=3.10-blue)

![Static Badge](https://img.shields.io/badge/django-3.x-blue)

![Static Badge](https://img.shields.io/badge/clip-v1.0-blue)

![Static Badge](https://img.shields.io/badge/tfidf-1.5.2-blue)

  Table of Contents

  - [📍 Overview](#-overview)

  - [🎯 Features](#-features)

  - [🤖 Tech Stack](#-tech-stack)

  - [🚀 Setup and Usage](#-setup-and-usage)

  - [🎬 Demo](#-demo)

  - [👣 Workflow](#-workflow)

  - [📐 App Structure](#-app-structure)

  - [🧑‍💻 Contributors](#-contributors)

  - [📚 Citation](#-citation)

## 📍 Overview 

The `LLM-Powered Video Search System` is a multimodal video search solution that uses Large Language Models (LLMs) to improve video retrieval through text, image, and metadata queries. It was developed for the [AIC2024](https://aichallenge.hochiminhcity.gov.vn/) competition and accepted at the international [SOICT 2024](https://soict.org/) conference. Details about the paper can be found on [Springer](https://link.springer.com/chapter/10.1007/978-981-96-4291-5_25).

## 🎯 Features

1. **Multimodal Search Capabilities**

   - **Text-based search:** Supports ASR (Automatic Speech Recognition), OCR, captions, and descriptive image queries for improved accuracy.

   - **Image-based search:** Enables users to find specific video segments based on images.

   - **Metadata-based search:** Provides a 7x7 matrix for tagging objects and color attributes for contextual search.

2. **LLM-Powered Interaction**

   - Integrates LLMs (e.g., GPT-4) to handle natural language queries and deliver relevant search results tailored to the context.

3. **User-Friendly Interface**

   - A responsive user interface allows users to view results as keyframes or full video segments and interact with detailed metadata.
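
The metadata-based search can be pictured with a small sketch. The README only states that a 7x7 matrix is used for tagging objects and colors; how the project fills that matrix is not described, so the functions below (`box_to_cell`, `build_grid`, `matches`) and the normalized bounding-box format are assumptions made for illustration. The idea shown is that each detected object (for example from a detector such as YOLO, which appears in the repository topics) is assigned to the grid cell containing its center, and a query asks for a label in a given cell.

```python
from typing import Dict, List, Set, Tuple

GRID_SIZE = 7  # the README mentions a 7x7 matrix; everything else here is assumed

# (x_min, y_min, x_max, y_max), normalized to the range [0, 1]
Box = Tuple[float, float, float, float]
Cell = Tuple[int, int]


def box_to_cell(box: Box) -> Cell:
    """Map a normalized bounding box to the (row, col) cell holding its center."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp so that a center of exactly 1.0 still lands in the last cell.
    col = min(int(cx * GRID_SIZE), GRID_SIZE - 1)
    row = min(int(cy * GRID_SIZE), GRID_SIZE - 1)
    return row, col


def build_grid(detections: List[Tuple[str, Box]]) -> Dict[Cell, Set[str]]:
    """Collect the labels found in each occupied cell of one keyframe."""
    grid: Dict[Cell, Set[str]] = {}
    for label, box in detections:
        grid.setdefault(box_to_cell(box), set()).add(label)
    return grid


def matches(grid: Dict[Cell, Set[str]], query: Dict[Cell, str]) -> bool:
    """True if every (cell -> label) constraint in the query is met."""
    return all(label in grid.get(cell, set()) for cell, label in query.items())


# Hypothetical detections for a single keyframe.
frame = build_grid([
    ("person", (0.1, 0.1, 0.3, 0.5)),
    ("car", (0.6, 0.7, 0.9, 0.95)),
])
print(matches(frame, {(2, 1): "person"}))  # True
```

Color attributes could be stored the same way, with color names as labels instead of object classes.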

## 🤖 Tech Stack

- **Back-end**: Django

- **Core Technologies**: CLIP, Faiss, TFIDF

- **Supporting Technologies**: OpenCV, PyTorch, Transformers

- **Development Tools**: Docker, Git, Jupyter Notebook
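
To make the role of CLIP and Faiss more concrete, here is a minimal sketch of embedding search written in plain Python. It is not the project's code: it assumes keyframes and queries have already been turned into vectors (by CLIP in the real system), and it uses toy 3-dimensional vectors. The exhaustive loop in `search` is what a flat inner-product index such as Faiss's `IndexFlatIP` computes, only far less efficiently. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(index: List[List[float]], query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return the k (position, score) pairs with the highest inner product."""
    q = l2_normalize(query)
    scores = [(i, sum(a * b for a, b in zip(vec, q))) for i, vec in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy "embeddings"; real CLIP vectors have hundreds of dimensions.
keyframe_vectors = [
    l2_normalize(v)
    for v in ([1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0])
]
top = search(keyframe_vectors, [0.9, 0.1, 0.0], k=2)
print([i for i, _ in top])  # [0, 1]
```

Because text and images share one embedding space in CLIP, the same search works whether the query vector comes from a sentence or from an example image.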

## 🚀 Setup and Usage

1. **Clone Repository**

   ```bash
   git clone https://github.com/HTAnh2003/LLM_Powered_Video_Search.git
   cd LLM_Powered_Video_Search
   ```

2. **Install Dependencies**

   Ensure Python (3.10 or later) and Django are installed, then install the remaining dependencies from `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure `MEDIA_ROOT`**

   Open [settings.py](./AIC/settings.py) in the `AIC/` folder and set `MEDIA_ROOT` to point to your local `media` directory:

   ```python
   MEDIA_ROOT = '/path/to/your/media'
   ```

   You can download the dataset from [Google Drive](https://drive.google.com/drive/folders/17Yab4iMAEzok0pO_czgbAkKBlaQ2ptqU) or [Kaggle](https://www.kaggle.com/datasets/tienanh2003/keyframes-v1-aic2024).

4. **Verify Paths in `viewAPI.py`**

   Check that the file paths in [app/viewAPI.py](./app/viewAPI.py) point to the correct locations on your machine.

5. **Run Migrations**

   Update the database with migrations:

   ```bash
   python manage.py migrate
   ```

6. **Run the Application**

   To start the application, use:

   ```bash
   python manage.py runserver
   ```

   The app will run by default at `http://127.0.0.1:8000/`.
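
For step 3, one way to avoid editing the path by hand on every machine is to read it from an environment variable and fail early if the directory is missing. This is only a sketch: the helper `resolve_media_root` and the variable name `AIC_MEDIA_ROOT` are hypothetical and are not defined by the project.

```python
import os
from pathlib import Path


def resolve_media_root(default: str, env_var: str = "AIC_MEDIA_ROOT") -> str:
    """Return the media directory, preferring an environment override.

    `AIC_MEDIA_ROOT` is a made-up variable name used for illustration.
    """
    path = Path(os.environ.get(env_var, default)).expanduser().resolve()
    if not path.is_dir():
        # Fail at startup rather than when the first keyframe is requested.
        raise FileNotFoundError(f"MEDIA_ROOT does not exist: {path}")
    return str(path)


# In AIC/settings.py this could replace the hard-coded assignment:
# MEDIA_ROOT = resolve_media_root("/path/to/your/media")
```

With this in place, the dataset location can be changed with `export AIC_MEDIA_ROOT=/data/keyframes` instead of a code edit.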

## 🎬 Demo

- **Screenshots**: ![image](./figs/image/demo.png)

## 👣 Workflow

![Pipeline](./figs/image/Pipeline.png)

- **Data Processing**: Audio is transcribed with ASR and keyframes are extracted via TransNetV2; the results are then converted into image features and metadata.

![Data Processing](./figs/image/data_processing.png)

- **LLM Powered Interaction**: Natural language queries are processed by the LLM and combined with image features and metadata for relevant video retrieval.

![LLM Interaction](./figs/image/LLM.png)
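
The step where text, image, and metadata signals are combined can be sketched as a weighted score fusion. The app structure lists a `utils/combine_search.py`, but its contents are not shown here, so the code below is not that file's logic. It assumes each search backend returns a score per keyframe id, rescales each set of scores to [0, 1], and mixes them with a hypothetical `text_weight` parameter. The keyframe ids are invented.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so different backends become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Combine two score tables into one ranking, best match first."""
    text_n, image_n = min_max(text_scores), min_max(image_scores)
    fused = {
        key: text_weight * text_n.get(key, 0.0)
        + (1.0 - text_weight) * image_n.get(key, 0.0)
        for key in set(text_n) | set(image_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, e.g. TF-IDF over ASR/OCR text and CLIP similarity.
text_scores = {"frame_a": 2.0, "frame_b": 1.0}
image_scores = {"frame_a": 0.30, "frame_b": 0.25, "frame_c": 0.35}

ranked = fuse(text_scores, image_scores)
print([key for key, _ in ranked])  # ['frame_a', 'frame_c', 'frame_b']
```

A keyframe missing from one backend simply contributes zero for that signal, so results found by only one kind of search still appear in the ranking.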

## 📐 App Structure

```
├── LLM_Powered_Video_Search/
│   ├── AIC/
│   │   ├── settings.py
│   ├── app/
│   │   ├── admin.py
│   │   ├── data_utils.py
│   │   ├── migrations/
│   │   ├── static/
│   │   ├── templates/
│   │   ├── viewAPI.py
│   ├── data_extraction/
│   │   ├── TransnetV2/
│   │   ├── audio/
│   │   ├── metadata/
│   ├── docker-compose.yml
│   ├── figs/
│   ├── manage.py
│   ├── requirements.txt
│   ├── utils/
│   │   ├── LLM/
│   │   ├── video_retrieval/
│   │   ├── faiss_search.py
│   │   ├── combine_search.py
│   │   ├── ...
```

## 🧑‍💻 Contributors

- [Hoàng Tiến Anh](https://github.com/HTAnh2003)

- [Trần Xuân Diện](https://github.com/xndien2004)

- [Dương Văn Tài](https://github.com/TaiDuongRepo)

## 📚 Citation

If you use this system in your research or publications, please cite it using the following format:

```bibtex
@InProceedings{10.1007/978-981-96-4291-5_25,
  author    = {Tran, Dien X. and Hoang, Anh T. and Duong, Tai V. and Nguyen, Kien C.},
  editor    = {Buntine, Wray and Fjeld, Morten and Tran, Truyen and Tran, Minh-Triet and Huynh Thi Thanh, Binh and Miyoshi, Takumi},
  title     = {LLM-Powered Video Search: A Comprehensive Multimedia Retrieval System},
  booktitle = {Information and Communication Technology},
  year      = {2025},
  publisher = {Springer Nature Singapore},
  address   = {Singapore},
  pages     = {305--315},
  isbn      = {978-981-96-4291-5}
}
```