https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai
agentic-ai agentic-rag agentic-workflow generative-ai generative-ai-model internvl2 multimodal multimodal-data multimodal-deep-learning multimodal-large-language-models multimodal-learning story-generation vision-language vision-language-learning vision-language-model vision-language-transformer
Last synced: 5 months ago
JSON representation
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
- Host: GitHub
- URL: https://github.com/fork123aniket/agentic-rag-story-generation-with-multimodal-genai
- Owner: fork123aniket
- License: mit
- Created: 2025-01-29T02:35:50.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-29T21:13:26.000Z (5 months ago)
- Last Synced: 2025-01-29T22:24:23.570Z (5 months ago)
- Topics: agentic-ai, agentic-rag, agentic-workflow, generative-ai, generative-ai-model, internvl2, multimodal, multimodal-data, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning, story-generation, vision-language, vision-language-learning, vision-language-model, vision-language-transformer
- Language: Python
- Homepage:
- Size: 94.7 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# 📖 **Agentic RAG Story Generation with Multimodal AI**
### **Transform Images into Engaging Narratives with Generative AI**
## 🚀 Overview
This project implements an **Agentic RAG (Retrieval-Augmented Generation) workflow** using **LangChain, LangGraph, and multimodal AI** to generate meaningful and engaging stories from user-uploaded images. The system seamlessly integrates **retrieval-based reasoning** with **generative AI capabilities**, leveraging **LLMs, vector search, and vision models** to produce immersive narratives.---
## 🎯 **Why Agentic RAG?**
This project demonstrates how **LLMs, retrieval systems, and vision models** can collaborate to generate dynamic and **engaging AI-generated stories**. The **agentic workflow** makes it possible to balance **creativity and factuality**, ensuring a seamless story generation experience.---
## ✨ Key Features
🔹 **Multimodal Input Handling**: Accepts **image uploads** and processes them for story generation.
🔹 **Retrieval-Augmented Generation (RAG)**: Enhances story coherence using **retrieved knowledge**.
🔹 **Agentic AI Workflow**: Dynamically selects **retriever** or **generator** based on available data.
🔹 **LLM-Driven Story Generation**: Generates **rich, context-aware** narratives using **InternVL2-40B**.
🔹 **Chroma Vector Search**: Stores and retrieves **image-text embeddings** for enhanced relevance.
🔹 **Cloud Integration**: Supports **Cloudinary** for image hosting and efficient handling.
🔹 **Streamlit UI**: Interactive interface for seamless user interaction.
🔹 **Speech-Enabled Narratives**: Turns AI-generated stories into spoken words for an interactive experience.---
## 🛠 **Technology Stack**
This project leverages cutting-edge **LLMs, multimodal AI, and retrieval-based techniques** to ensure seamless and engaging story generation.
| **Category** | **Technology Used** |
|--------------------|--------------------|
| 🤖 **LLM** | InternVL2 - 40B |
| 🖼 **Vision Model** | CLIP (Contrastive Language-Image Pretraining) |
| 🔍 **Retriever** | ChromaDB Vector Search |
| 📜 **RAG Framework** | LangChain, LangGraph |
| 🌐 **Web Interface** | Streamlit |
| ☁️ **Cloud Storage** | Cloudinary API |
| 🎤 **Text-to-Speech** | pyttsx3 |These technologies work together to **process images, retrieve relevant context, and generate rich, meaningful narratives**.
---
## 🎯 **Agentic RAG Gen AI Workflow**
This project follows an **Agentic RAG (Retrieval-Augmented Generation) framework**, meaning the system **autonomously** decides when to retrieve external knowledge and when to generate stories based on the input.
### **How It Works**
1️⃣ **Image Upload**: Users provide an image via the **Streamlit UI**.
2️⃣ **Embedding Generation**: The image is processed using **Hugging Face CLIP** embeddings.
3️⃣ **Retrieval Decision**:
- If relevant **stories exist**, the retriever fetches them for context.
- If no prior data is found, the system **generates a story from scratch**.
4️⃣ **LLM-Based Generation**: Uses **InternVL2-40B** to craft a compelling story.
5️⃣ **User Display**: The generated story is displayed in the UI.
6️⃣ **Audio Narration**: Converts generated stories into speech.This approach ensures **optimal use of existing knowledge** while allowing for **creative story generation** when necessary.
---
## 🛠 **Workflow Architecture**
The diagram below illustrates the **modular AI workflow** used in this project.

### **Workflow Breakdown**
- **Start** → Initializes the workflow.
- **LLM Node** → Processes input data and decides next steps.
- **Tools Node** → Determines whether to retrieve past stories or generate new ones.
- **Retriever Node** → If prior knowledge exists, retrieves related image-text pairs.
- **Generator Node** → Uses the **LLM** to create an engaging story.
- **End** → Displays the final generated narrative to the user.This **state-driven agentic AI approach** ensures that each decision maximizes the **relevance and creativity** of the generated content.
---
## 📌 **Installation & Usage**
### **1️⃣ Clone the Repository**
```bash
git clone https://github.com/fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI.git
cd Agentic-RAG-Story-Generation-with-Multimodal-GenAI
```### **2️⃣ Install Dependencies**
```bash
pip install -r requirements.txt
```### **3️⃣ Set Up Cloudinary for Image Hosting**
Replace the placeholders in `Agentic_Workflow.py`:
```python
cloudinary.config(
cloud_name='your_cloud_name',
api_key='your_api_key',
api_secret='your_secret_key'
)
```### **4️⃣ Run the Streamlit App**
```bash
streamlit run Agentic_Workflow.py
```### **5️⃣ Upload an Image & Enjoy AI-Powered Story Generation!**
---
## 📂 **Project Structure**
The project is structured as follows:
```
📦 Agentic_RAG_Story_Generation
├── 📜 Agentic_Workflow.py # Core agentic workflow implementation
├── 📜 requirements.txt # Dependencies list
├── 📜 README.md # Project documentation