https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai

agentic-ai agentic-rag agentic-workflow generative-ai generative-ai-model internvl2 multimodal multimodal-data multimodal-deep-learning multimodal-large-language-models multimodal-learning story-generation vision-language vision-language-learning vision-language-model vision-language-transformer

Last synced: 5 months ago
JSON representation

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Host: GitHub
URL: https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai
Owner: fork123aniket
License: mit
Created: 2025-01-29T02:35:50.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-01-29T21:13:26.000Z (5 months ago)
Last Synced: 2025-01-29T22:24:23.570Z (5 months ago)
Topics: agentic-ai, agentic-rag, agentic-workflow, generative-ai, generative-ai-model, internvl2, multimodal, multimodal-data, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning, story-generation, vision-language, vision-language-learning, vision-language-model, vision-language-transformer
Language: Python
Homepage:
Size: 94.7 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# 📖 **Agentic RAG Story Generation with Multimodal AI**

### **Transform Images into Engaging Narratives with Generative AI**

## 🚀 Overview
This project implements an **Agentic RAG (Retrieval-Augmented Generation) workflow** using **LangChain, LangGraph, and multimodal AI** to generate meaningful and engaging stories from user-uploaded images. The system seamlessly integrates **retrieval-based reasoning** with **generative AI capabilities**, leveraging **LLMs, vector search, and vision models** to produce immersive narratives.

---

## 🎯 **Why Agentic RAG?**
This project demonstrates how **LLMs, retrieval systems, and vision models** can collaborate to generate dynamic and **engaging AI-generated stories**. The **agentic workflow** makes it possible to balance **creativity and factuality**, ensuring a seamless story generation experience.

---

## ✨ Key Features
🔹 **Multimodal Input Handling**: Accepts **image uploads** and processes them for story generation.
🔹 **Retrieval-Augmented Generation (RAG)**: Enhances story coherence using **retrieved knowledge**.
🔹 **Agentic AI Workflow**: Dynamically selects **retriever** or **generator** based on available data.
🔹 **LLM-Driven Story Generation**: Generates **rich, context-aware** narratives using **InternVL2-40B**.
🔹 **Chroma Vector Search**: Stores and retrieves **image-text embeddings** for enhanced relevance.
🔹 **Cloud Integration**: Supports **Cloudinary** for image hosting and efficient handling.
🔹 **Streamlit UI**: Interactive interface for seamless user interaction.
🔹 **Speech-Enabled Narratives**: Turns AI-generated stories into spoken words for an interactive experience.

---

## 🛠 **Technology Stack**

This project leverages cutting-edge **LLMs, multimodal AI, and retrieval-based techniques** to ensure seamless and engaging story generation.

| **Category** | **Technology Used** |
|--------------------|--------------------|
| 🤖 **LLM** | InternVL2 - 40B |
| 🖼 **Vision Model** | CLIP (Contrastive Language-Image Pretraining) |
| 🔍 **Retriever** | ChromaDB Vector Search |
| 📜 **RAG Framework** | LangChain, LangGraph |
| 🌐 **Web Interface** | Streamlit |
| ☁️ **Cloud Storage** | Cloudinary API |
| 🎤 **Text-to-Speech** | pyttsx3 |

These technologies work together to **process images, retrieve relevant context, and generate rich, meaningful narratives**.

---

## 🎯 **Agentic RAG Gen AI Workflow**

This project follows an **Agentic RAG (Retrieval-Augmented Generation) framework**, meaning the system **autonomously** decides when to retrieve external knowledge and when to generate stories based on the input.

### **How It Works**
1️⃣ **Image Upload**: Users provide an image via the **Streamlit UI**.
2️⃣ **Embedding Generation**: The image is processed using **Hugging Face CLIP** embeddings.
3️⃣ **Retrieval Decision**:
- If relevant **stories exist**, the retriever fetches them for context.
- If no prior data is found, the system **generates a story from scratch**.

4️⃣ **LLM-Based Generation**: Uses **InternVL2-40B** to craft a compelling story.
5️⃣ **User Display**: The generated story is displayed in the UI.
6️⃣ **Audio Narration**: Converts generated stories into speech.

This approach ensures **optimal use of existing knowledge** while allowing for **creative story generation** when necessary.

---

## 🛠 **Workflow Architecture**

The diagram below illustrates the **modular AI workflow** used in this project.

![Visual Story Generation Workflow](https://github.com/fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI/blob/main/Images/Architecture.jpg)

### **Workflow Breakdown**
- **Start** → Initializes the workflow.
- **LLM Node** → Processes input data and decides next steps.
- **Tools Node** → Determines whether to retrieve past stories or generate new ones.
- **Retriever Node** → If prior knowledge exists, retrieves related image-text pairs.
- **Generator Node** → Uses the **LLM** to create an engaging story.
- **End** → Displays the final generated narrative to the user.

This **state-driven agentic AI approach** ensures that each decision maximizes the **relevance and creativity** of the generated content.

---

## 📌 **Installation & Usage**

### **1️⃣ Clone the Repository**
```bash
git clone https://github.com/fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI.git
cd Agentic-RAG-Story-Generation-with-Multimodal-GenAI
```

### **2️⃣ Install Dependencies**
```bash
pip install -r requirements.txt
```

### **3️⃣ Set Up Cloudinary for Image Hosting**
Replace the placeholders in `Agentic_Workflow.py`:
```python
cloudinary.config(
cloud_name='your_cloud_name',
api_key='your_api_key',
api_secret='your_secret_key'
)
```

### **4️⃣ Run the Streamlit App**
```bash
streamlit run Agentic_Workflow.py
```

### **5️⃣ Upload an Image & Enjoy AI-Powered Story Generation!**

---

## 📂 **Project Structure**

The project is structured as follows:

```
📦 Agentic_RAG_Story_Generation
├── 📜 Agentic_Workflow.py # Core agentic workflow implementation
├── 📜 requirements.txt # Dependencies list
├── 📜 README.md # Project documentation

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai

Awesome Lists containing this project

README