{"id":45199398,"url":"https://github.com/isatyamks/multimodal-rag","last_synced_at":"2026-02-20T13:31:13.545Z","repository":{"id":334915543,"uuid":"1121634982","full_name":"isatyamks/multimodal-rag","owner":"isatyamks","description":"Multimodal RAG system for generating test cases and use cases from documents using hybrid retrieval, safety guards, and LLMs.","archived":false,"fork":false,"pushed_at":"2026-01-27T14:33:50.000Z","size":2861,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-01-28T00:59:38.508Z","etag":null,"topics":["ai-testing","chromadb","hallucination-mitigation","hybrid-search","hybrid-search-technique","llm","ml","multimodal","multimodal-rag","nlp","prompt-safety","python","rag","rags","retrival-augmented-generation","test-automation","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/isatyamks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-23T09:54:28.000Z","updated_at":"2026-01-27T14:38:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/isatyamks/multimodal-rag","commit_stats":null,"previous_names":["isatyamks/multimodal-rag"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/isatyamks/multimodal-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isatyamks%2Fmulti
modal-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isatyamks%2Fmultimodal-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isatyamks%2Fmultimodal-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isatyamks%2Fmultimodal-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/isatyamks","download_url":"https://codeload.github.com/isatyamks/multimodal-rag/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/isatyamks%2Fmultimodal-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29652580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T09:27:29.698Z","status":"ssl_error","status_checked_at":"2026-02-20T09:26:12.373Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-testing","chromadb","hallucination-mitigation","hybrid-search","hybrid-search-technique","llm","ml","multimodal","multimodal-rag","nlp","prompt-safety","python","rag","rags","retrival-augmented-generation","test-automation","testing"],"created_at":"2026-02-20T13:31:13.427Z","updated_at":"2026-02-20T13:31:13.538Z","avatar_url":"https://github.com/isatyamks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multimodal RAG System for Test Case Generation\n\nAI-powered test case and use case generation using Retrieval-Augmented Generation with safety guardrails.\n\n---\n\n## Overview\n\nThis is a production-ready RAG system that generates test cases and use cases from documentation. 
It processes multimodal documents (text, PDF, DOCX, images), retrieves relevant context using hybrid search, and generates structured outputs with three-layer validation guards.\n\n### Key Features\n\n- **Context-Grounded Generation**: All outputs are checked against the source documents\n- **Three-Layer Guards**: Prompt injection detection, evidence threshold checking, hallucination prevention\n- **Multimodal Support**: Text, Markdown, PDF, DOCX, and images via OCR\n- **Hybrid Search**: Combines vector similarity (semantic) and BM25 (keyword matching)\n- **RESTful API**: Auto-generated documentation with OpenAPI/Swagger\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.10+\n- Groq API Key (free at [console.groq.com](https://console.groq.com))\n\n### Installation\n\n```bash\ngit clone \u003cyour-repo-url\u003e\ncd multimodal-rag\n\npython -m venv venv\nsource venv/bin/activate  # Windows: venv\\Scripts\\activate\n\npip install -r requirements.txt\n\ncp sample.env .env  # Windows: copy sample.env .env\n# Edit .env and add your GROQ_API_KEY\n\npython main.py\n```\n\nThe server starts at `http://localhost:8000`. 
View API docs at `http://localhost:8000/docs`.\n\n---\n\n## Architecture\n\n### System Overview\n\n```\n┌─────────────────────────────────────────────────────┐\n│                  CLIENT REQUEST                      │\n│            (HTTP POST with query/files)              │\n└────────────────────┬────────────────────────────────┘\n                     ▼\n┌─────────────────────────────────────────────────────┐\n│              FASTAPI APPLICATION LAYER               │\n│   Routes: /upload, /generate/*, /query, /stats      │\n└────────────────────┬────────────────────────────────┘\n                     │\n        ┌────────────┼────────────┬─────────────┐\n        ▼            ▼            ▼             ▼\n┌─────────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐\n│ INGESTION   │ │RETRIEVAL │ │GENERATION│ │ GUARDS  │\n│  PIPELINE   │ │  SYSTEM  │ │  ENGINE  │ │ SYSTEM  │\n└─────────────┘ └──────────┘ └──────────┘ └─────────┘\n```\n\n### Component Details\n\n#### 1. Ingestion Pipeline\n\n**Purpose**: Transform raw documents into searchable vector embeddings.\n\n**Process Flow**:\n```\nDocument → Parser → Text Extraction → Chunker → Embeddings → Vector Store\n```\n\n**Components**:\n- **File Handler**: Routes files to appropriate parser based on extension\n- **Parsers**: \n  - `TextParser`: .txt, .md files (native Python)\n  - `PDFParser`: .pdf files (PyPDF2)\n  - `DocxParser`: .docx files (python-docx)\n  - `ImageParser`: .png, .jpg files (Tesseract OCR)\n- **Chunker**: Splits text into 512-token chunks with 50-token overlap\n- **Embedding Model**: all-MiniLM-L6-v2 (384-dimensional vectors)\n- **Vector Store**: ChromaDB with HNSW index\n\n**Key Decisions**:\n- Chunk size 512: Balance between context and embedding model limits\n- Overlap 50: Prevents context loss at boundaries\n- Local embeddings: No API calls, faster processing\n\n#### 2. 
Retrieval System\n\n**Purpose**: Find the most relevant document chunks for a given query.\n\n**Hybrid Search Algorithm**:\n```\nQuery → [Vector Search] + [BM25 Search] → Score Fusion → Top-K Results\n```\n\n**Vector Search (Semantic)**:\n- Embeds the query using the same model as the documents\n- Performs cosine similarity search in ChromaDB\n- Captures semantic meaning (e.g., \"login\" matches \"authentication\")\n\n**BM25 Search (Keyword)**:\n- Tokenizes query and documents\n- Uses the Okapi BM25 ranking function\n- Captures exact keyword matches (e.g., \"API_KEY_123\")\n\n**Score Fusion**:\n```python\n# Normalize both scores to [0, 1] range\nvector_norm = vector_score / max_vector_score\nbm25_norm = bm25_score / max_bm25_score\n\n# alpha weights the vector score, (1 - alpha) the BM25 score (default alpha: 0.3)\nhybrid_score = (alpha * vector_norm) + ((1 - alpha) * bm25_norm)\n```\n\n**Why Hybrid**:\n- Pure vector search misses exact keywords\n- Pure BM25 misses semantic similarity\n- Hybrid search covers both cases\n\n#### 3. Generation Engine\n\n**Purpose**: Generate structured test cases/use cases using an LLM.\n\n**Components**:\n- **LLM Client**: Abstraction over Groq API (llama-3.3-70b-versatile)\n- **Prompt Templates**: Centralized in `src/config/prompts.py`\n- **Use Case Generator**: Generates preconditions, steps, expected results\n- **Test Case Generator**: Generates test ID, priority, type, steps\n- **Output Formatter**: Parses and validates JSON responses\n\n**Generation Flow**:\n```\n1. Retrieve top-k relevant chunks\n2. Construct prompt with context + query\n3. Call LLM with temperature=0.2 (low, for consistent output)\n4. Parse JSON response\n5. Validate structure and fields\n```\n\n**LLM Configuration**:\n- Model: llama-3.3-70b-versatile (Groq)\n- Temperature: 0.2 (low for consistency)\n- Max tokens: 2000\n- Response format: JSON\n\n#### 4. 
Guards System\n\n**Purpose**: Multi-layer validation to ensure safety and quality.\n\n**Guard 1: Prompt Injection Guard**\n- **When**: Before retrieval\n- **Purpose**: Detect malicious queries attempting to manipulate system\n- **Method**: 20+ regex patterns matching known attack vectors\n- **Action**: Block high-risk queries, return 400 error\n\n**Guard 2: Evidence Threshold Guard**\n- **When**: After retrieval, before LLM call\n- **Purpose**: Ensure sufficient context quality\n- **Method**: Weighted confidence score from retrieval results\n- **Formula**:\n  ```python\n  confidence = sum(score * weight for score, weight in zip(scores, [1.0, 0.8, 0.6, 0.4, 0.2]))\n  sufficient = confidence \u003e= MIN_EVIDENCE_CONFIDENCE\n  ```\n- **Action**: Block generation if confidence too low, provide recommendation\n\n**Guard 3: Hallucination Guard**\n- **When**: After generation\n- **Purpose**: Verify output is grounded in source documents\n- **Method**: \n  - Extract statements from generated output\n  - Calculate semantic similarity with context chunks\n  - Compute grounding ratio\n- **Action**: Flag warning if \u003c70% of statements are grounded\n\n**Guard Orchestration**:\n```\nQuery → Guard 1 ✓ → Retrieval → Guard 2 ✓ → LLM → Guard 3 ✓ → Response\n        (block)              (block)              (warn)\n```\n\n### Data Flow: Complete Request Lifecycle\n\n#### Document Upload Flow\n\n```\n1. POST /upload with multipart file\n2. Validate file type and size\n3. Save to data/uploads/\n4. Detect file type, route to parser\n5. Parse: Extract text + metadata\n6. Chunk: Split into 512-token pieces\n7. Embed: Generate 384-dim vectors (batched)\n8. Store: Add to ChromaDB collection\n9. Index: Build BM25 inverted index\n10. Return: chunks_indexed, embedding_dim\n```\n\n#### Generation Request Flow\n\n```\n1. POST /generate/test-cases with query\n2. Guard 1: Check for prompt injection\n   → If unsafe: Return 400 error\n3. Embed query using same model\n4. Hybrid Search:\n   a. 
Vector search: ChromaDB similarity query\n   b. BM25 search: Okapi BM25 ranking\n   c. Normalize scores to [0, 1]\n   d. Fuse: hybrid = 0.3*vector + 0.7*bm25\n   e. Sort by hybrid score, select top-k\n5. Guard 2: Calculate confidence\n   → If insufficient: Return error with recommendation\n6. Construct LLM prompt:\n   - System: \"You are a QA engineer...\"\n   - User: \"Context: [chunks]\\nQuery: {query}\\nOutput: JSON\"\n7. Call Groq API with llama-3.3-70b-versatile\n8. Parse JSON response\n9. Guard 3: Verify grounding\n   → If not grounded: Log warning (still return)\n10. Format response with validation metadata\n11. Return JSON to client\n```\n\n### Technical Specifications\n\n**Embedding Model**:\n- Name: sentence-transformers/all-MiniLM-L6-v2\n- Dimensions: 384\n- Max sequence length: 256 tokens\n- Performance: ~200ms for 32 queries (batch)\n\n**Vector Database**:\n- Type: ChromaDB (embedded)\n- Index: HNSW (Hierarchical Navigable Small World)\n- Distance metric: Cosine similarity\n- Persistence: SQLite backend\n\n**Search Parameters**:\n- `TOP_K`: Number of chunks to retrieve (default: 10)\n- `SIMILARITY_THRESHOLD`: Minimum cosine similarity (default: 0.5)\n- `HYBRID_ALPHA`: Weight of the vector score in fusion; BM25 gets 1 - alpha (default: 0.3)\n- `MIN_EVIDENCE_CONFIDENCE`: Guard threshold (default: 0.5)\n\n**LLM Settings**:\n- Provider: Groq\n- Model: llama-3.3-70b-versatile\n- Speed: ~300 tokens/second\n- Temperature: 0.2 (low, for consistent outputs)\n- Max tokens: 2000\n- Response format: JSON mode\n\n**Performance Characteristics**:\n- Document parsing: 200-500ms (depends on file size)\n- Embedding generation: 100-200ms per batch of 32\n- Vector search: 20-50ms\n- BM25 search: 10-30ms\n- LLM generation: 1-3s (network + processing)\n- Total end-to-end: 2-4s per request\n\n---\n\n## Configuration\n\nEdit the `.env` file:\n\n```bash\n# LLM Configuration\nGROQ_API_KEY=your_api_key_here\nGROQ_MODEL=llama-3.3-70b-versatile\nLLM_TEMPERATURE=0.2\nMAX_TOKENS=2000\n\n# 
Retrieval\nTOP_K=10\nSIMILARITY_THRESHOLD=0.5\nHYBRID_ALPHA=0.3\n\n# Guards\nMIN_EVIDENCE_CONFIDENCE=0.5\nENABLE_HALLUCINATION_GUARD=true\nENABLE_PROMPT_INJECTION_GUARD=true\nENABLE_EVIDENCE_THRESHOLD=true\n\n# Chunking\nCHUNK_SIZE=512\nCHUNK_OVERLAP=50\n\n# API\nAPI_HOST=0.0.0.0\nAPI_PORT=8000\n```\n\nSee `sample.env` for all options.\n\n---\n\n## API Reference\n\n### Base URL\n```\nhttp://localhost:8000\n```\n\n### Endpoints\n\n#### Health Check\n```http\nGET /health\n```\n\nReturns system health and statistics.\n\n#### Upload Documents\n```http\nPOST /upload\nContent-Type: multipart/form-data\n```\n\nUpload files (.txt, .md, .pdf, .docx, .png, .jpg).\n\n**Example**:\n```bash\ncurl -X POST http://localhost:8000/upload -F \"files=@document.pdf\"\n```\n\n#### Generate Use Case\n```http\nPOST /generate/use-case\n```\n\n**Parameters**:\n- `query` (required): Description of desired use case\n- `top_k` (optional): Number of context chunks (default: 5)\n- `search_mode` (optional): \"hybrid\" | \"vector\" | \"keyword\"\n\n**Example**:\n```bash\ncurl -X POST http://localhost:8000/generate/use-case \\\n  -F \"query=Create use cases for user login\" \\\n  -F \"top_k=5\"\n```\n\n**Response**:\n```json\n{\n  \"success\": true,\n  \"use_case\": {\n    \"title\": \"User Login with Email and Password\",\n    \"description\": \"...\",\n    \"preconditions\": [...],\n    \"steps\": [...],\n    \"expected_result\": \"...\",\n    \"negative_cases\": [...],\n    \"boundary_cases\": [...]\n  },\n  \"validation\": {\n    \"query_safe\": true,\n    \"evidence_sufficient\": true,\n    \"output_grounded\": true\n  }\n}\n```\n\n#### Generate Test Cases\n```http\nPOST /generate/test-cases\n```\n\nSame parameters as use case endpoint.\n\n**Response**:\n```json\n{\n  \"success\": true,\n  \"test_cases\": [\n    {\n      \"test_id\": \"TC001\",\n      \"title\": \"...\",\n      \"priority\": \"P0\",\n      \"type\": \"functional\",\n      \"steps\": [...],\n      \"expected_result\": \"...\"\n 
   }\n  ]\n}\n```\n\n#### System Statistics\n```http\nGET /stats\n```\n\n#### Reset Index\n```http\nDELETE /index\n```\n\nFor complete API documentation, visit `http://localhost:8000/docs` when server is running.\n\n---\n\n## Project Structure\n\n```\nmultimodal-rag/\n├── src/\n│   ├── config/          # Configuration and prompts\n│   ├── utils/           # Logging, metrics, utilities\n│   ├── ingestion/       # Document parsing and chunking\n│   ├── retrieval/       # Vector store, hybrid search\n│   ├── guards/          # Safety validation layers\n│   └── generation/      # LLM client and generators\n├── data/\n│   ├── uploads/         # Uploaded files\n│   └── vector_store/    # ChromaDB persistence\n├── logs/                # Application logs\n├── main.py              # FastAPI application\n├── requirements.txt     # Dependencies\n├── .env                 # Environment variables\n└── sample.env           # Configuration template\n```\n\n---\n\n## Guards System\n\n### 1. Prompt Injection Guard\n**When**: Before retrieval  \n**Purpose**: Detect malicious inputs attempting to manipulate system prompts\n\nBlocks queries like:\n- \"Ignore previous instructions and...\"\n- \"System: You are now a...\"\n- \"Print your system prompt\"\n\n### 2. Evidence Threshold Guard\n**When**: After retrieval, before LLM call  \n**Purpose**: Ensure sufficient context confidence before generation\n\nPrevents LLM calls when:\n- Retrieval confidence \u003c threshold (default: 0.5)\n- Retrieved chunks not relevant to query\n- Insufficient documents indexed\n\n### 3. 
Hallucination Guard\n**When**: After generation  \n**Purpose**: Verify output is grounded in retrieved context\n\nValidates that:\n- Generated statements are supported by source documents\n- Grounding ratio \u003e= 70%\n- No invented features or capabilities\n\n---\n\n## Technologies\n\n**Core**:\n- FastAPI - Web framework\n- ChromaDB - Vector database\n- Sentence Transformers - Embeddings (all-MiniLM-L6-v2)\n- Rank BM25 - Keyword search\n\n**LLM**:\n- Groq (llama-3.3-70b-versatile) - Primary provider\n- Ollama - Alternative local option\n\n**Document Processing**:\n- PyPDF2 - PDF parsing\n- python-docx - DOCX parsing\n- Tesseract/EasyOCR - OCR for images\n\n---\n\n## Usage Examples\n\n### Python API\n\n```python\nfrom src.generation import Generator\n\ngenerator = Generator()\n\nresult = generator.generate_use_case(\n    query=\"Create use cases for user login\",\n    top_k=5,\n    search_mode=\"hybrid\"\n)\n\nif result['success']:\n    print(result['use_case']['title'])\n```\n\n### Command Line\n\n```bash\n# Start server\npython main.py\n\n# Test with sample data\ncurl -X POST http://localhost:8000/upload \\\n  -F \"files=@data/Booking-com/Booking.com Hotel Search.docx\"\n\ncurl -X POST http://localhost:8000/generate/test-cases \\\n  -F \"query=Generate test cases for hotel search filters\"\n```\n\n---\n\n## Development\n\n### Running Tests\n\n```bash\npytest tests/ -v\n```\n\n### Code Quality\n\n```bash\nblack src/\nflake8 src/\nmypy src/\n```\n\n---\n\n## Deployment\n\n### Systemd Service\n\n```bash\nsudo nano /etc/systemd/system/devasssure.service\n```\n\n```ini\n[Unit]\nDescription=DevAssure RAG API\nAfter=network.target\n\n[Service]\nType=simple\nUser=www-data\nWorkingDirectory=/opt/multimodal-rag\nEnvironment=\"PATH=/opt/multimodal-rag/venv/bin\"\nExecStart=/opt/multimodal-rag/venv/bin/python /opt/multimodal-rag/main.py\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n```\n\n```bash\nsudo systemctl enable devasssure\nsudo systemctl start 
devasssure\n```\n\n### Docker\n\n```dockerfile\nFROM python:3.10-slim\nWORKDIR /app\nRUN apt-get update \u0026\u0026 apt-get install -y tesseract-ocr\nCOPY requirements.txt .\nRUN pip install -r requirements.txt\nCOPY . .\nEXPOSE 8000\nCMD [\"python\", \"main.py\"]\n```\n\n---\n\n## Troubleshooting\n\n### Import Errors\n```bash\npip install -r requirements.txt\n```\n\n### ChromaDB Lock\n```bash\npkill -f \"python main.py\"\nrm data/vector_store/chroma.sqlite3-wal\npython main.py\n```\n\n### Tesseract Not Found\n```bash\n# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki\n# Linux:\nsudo apt install tesseract-ocr\n\n# Update .env:\nTESSERACT_PATH=/usr/bin/tesseract\n```\n\n---\n\n## Performance\n\nTypical metrics (16GB RAM, 8-core CPU):\n\n| Operation | Time |\n|-----------|------|\n| Upload PDF (10 pages) | 2.5s |\n| Chunk + Embed (100 chunks) | 3.8s |\n| Hybrid Search | 45ms |\n| LLM Generation | 2.1s |\n| End-to-End Query | 2.2s |\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisatyamks%2Fmultimodal-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fisatyamks%2Fmultimodal-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisatyamks%2Fmultimodal-rag/lists"}