{"id":25231482,"url":"https://github.com/notshrirang/loomrag","last_synced_at":"2026-02-23T08:03:08.077Z","repository":{"id":270304843,"uuid":"908865438","full_name":"NotShrirang/LoomRAG","owner":"NotShrirang","description":"🧠 Multimodal Retrieval-Augmented Generation that \"weaves\" together text and images seamlessly. 🪡","archived":false,"fork":false,"pushed_at":"2025-03-29T13:43:06.000Z","size":16348,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T15:29:33.348Z","etag":null,"topics":["clip","data-annotation","deep-learning","embeddings","faiss","faiss-cpu","fine-tuning","huggingface","langchain","machine-learning","multimodal","multimodal-rag","multimodal-retrieval-augmented-generation","openai","python","pytorch","retrieval-augmented-generation","transformer","transformers","whisper"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/NotShrirang/LoomRAG","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NotShrirang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-27T06:45:21.000Z","updated_at":"2025-03-29T13:43:09.000Z","dependencies_parsed_at":"2024-12-30T07:25:08.966Z","dependency_job_id":"c48b72f6-5553-45b6-965c-fb18d4d954dd","html_url":"https://github.com/NotShrirang/LoomRAG","commit_stats":null,"previous_names":["notshrirang/loomrag"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/NotShrirang/LoomRAG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/NotShrirang%2FLoomRAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FLoomRAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FLoomRAG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FLoomRAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NotShrirang","download_url":"https://codeload.github.com/NotShrirang/LoomRAG/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FLoomRAG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29739760,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T07:44:07.782Z","status":"ssl_error","status_checked_at":"2026-02-23T07:44:07.432Z","response_time":90,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clip","data-annotation","deep-learning","embeddings","faiss","faiss-cpu","fine-tuning","huggingface","langchain","machine-learning","multimodal","multimodal-rag","multimodal-retrieval-augmented-generation","openai","python","pytorch","retrieval-augmented-generation","transformer","transformers","whisper"],"created_at":"2025-02-11T12:28:49.204Z","updated_at":"2026-02-23T08:03:08.060Z","avatar_url":"https://github.com/NotShrirang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌟 LoomRAG: Multimodal Retrieval-Augmented Generation for AI-Powered Search\n\n![GitHub stars](https://img.shields.io/github/stars/NotShrirang/LoomRAG?style=social)\n![GitHub forks](https://img.shields.io/github/forks/NotShrirang/LoomRAG?style=social)\n![GitHub commits](https://img.shields.io/github/commit-activity/t/NotShrirang/LoomRAG)\n![GitHub issues](https://img.shields.io/github/issues/NotShrirang/LoomRAG)\n![GitHub pull requests](https://img.shields.io/github/issues-pr/NotShrirang/LoomRAG)\n![GitHub](https://img.shields.io/github/license/NotShrirang/LoomRAG)\n![GitHub last commit](https://img.shields.io/github/last-commit/NotShrirang/LoomRAG)\n![GitHub repo size](https://img.shields.io/github/repo-size/NotShrirang/LoomRAG)\n\u003ca href=\"https://huggingface.co/spaces/NotShrirang/LoomRAG\"\u003e\u003cimg src=\"https://img.shields.io/badge/Streamlit%20App-red?style=flat-rounded-square\u0026logo=streamlit\u0026labelColor=white\"/\u003e\u003c/a\u003e\n\nThis project 
implements a Multimodal Retrieval-Augmented Generation (RAG) system, named **LoomRAG**, that leverages **OpenAI's CLIP** model for neural cross-modal image retrieval and semantic search, and **OpenAI's Whisper** model for audio processing. The system allows users to input text queries, images, or audio to retrieve multimodal responses seamlessly through vector embeddings. It features a comprehensive annotation interface for creating custom datasets and supports CLIP model fine-tuning with configurable parameters for domain-specific applications. The system also supports uploading images, PDFs, and audio files (including real-time recording) for enhanced interaction and intelligent retrieval capabilities through a Streamlit-based interface.\n\nExperience the project in action:\n\n[![LoomRAG Streamlit App](https://img.shields.io/badge/Streamlit%20App-red?style=for-the-badge\u0026logo=streamlit\u0026labelColor=white)](https://huggingface.co/spaces/NotShrirang/LoomRAG)\n\n---\n\n## 📸 Implementation Screenshots\n\n| ![Screenshot 2025-01-01 184852](https://github.com/user-attachments/assets/ad79d0f0-d200-4a82-8c2f-0890a9fe8189) | ![Screenshot 2025-01-01 222334](https://github.com/user-attachments/assets/7307857d-a41f-4f60-8808-00d6db6e8e3e) |\n| ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |\n| Data Upload Page                                                                                                 | Data Search / Retrieval                                                                                          |\n|                                                                                                                  |                                                                                                                  |\n| ![Screenshot 2025-01-01 
222412](https://github.com/user-attachments/assets/e38273f4-426b-444d-80f0-501fa9563779) | ![Screenshot 2025-01-01 223948](https://github.com/user-attachments/assets/21724a92-ef79-44ae-83e6-25f8de29c45a) |\n| Data Annotation Page                                                                                             | CLIP Fine-Tuning                                                                                                 |\n\n---\n\n## ✨ Features\n\n- 🔄 **Cross-Modal Retrieval**: Search text to retrieve both text and image results using deep learning\n- 🖼️ **Image-Based Search**: Search the database by uploading an image to find similar content\n- 🧠 **Embedding-Based Search**: Uses OpenAI's CLIP, Whisper and SentenceTransformer's Embedding Models for embedding the input data\n- 🎯 **CLIP Fine-Tuning**: Supports custom model training with configurable parameters including test dataset split size, learning rate, optimizer, and weight decay\n- 🔨 **Fine-Tuned Model Integration**: Seamlessly load and utilize fine-tuned CLIP models for enhanced search and retrieval\n- 📤 **Upload Options**: Allows users to upload images, PDFs and audio files for AI-powered processing and retrieval\n- 🎙️ **Audio Integration**: Upload audio files or record audio directly through the interface\n- 🔗 **URL Integration**: Add images directly using URLs and scrape website data including text and images\n- 🕷️ **Web Scraping**: Automatically extract and index content from websites for comprehensive search capabilities\n- 🏷️ **Image Annotation**: Enables users to annotate uploaded images through an intuitive interface\n- 🔍 **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs\n- 🌐 **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system\n\n---\n\n## 🗺️ Roadmap\n\n- [x] Fine-tuning CLIP for domain-specific datasets\n- [x] Image-based search and retrieval\n- [x] Adding support for audio modalities\n\n---\n\n## 
🏗️ Architecture Overview\n\n![LoomRAG Architecture](https://github.com/user-attachments/assets/dc2a2b8d-801e-42dc-8b07-089a8f8b5641)\n*Architecture Diagram*\n\n1. **Data Indexing**:\n\n   - Text, images, and PDFs are preprocessed and embedded using the CLIP model\n   - Embeddings are stored in a vector database for fast and efficient retrieval\n   - Support for direct URL-based image indexing and website content scraping\n\n2. **Query Processing**:\n\n   - Text queries / image-based queries are converted into embeddings for semantic search\n   - Uploaded images, audio files and PDFs are processed and embedded for comparison\n   - The system performs a nearest neighbor search in the vector database to retrieve relevant text, images, and audio\n\n3. **Response Generation**:\n\n   - For text results: Optionally refined or augmented using a language model\n   - For image results: Directly returned or enhanced with image captions\n   - For audio results: Returned with relevant metadata and transcriptions where applicable\n   - For PDFs: Extracts text content and provides relevant sections\n\n4. **Image Annotation**:\n\n   - Dedicated annotation page for managing uploaded images\n   - Support for creating and managing multiple datasets simultaneously\n   - Flexible annotation workflow for efficient data labeling\n   - Dataset organization and management capabilities\n\n5. **Model Fine-Tuning**:\n   - Custom CLIP model training on annotated images\n   - Configurable training parameters for optimization\n   - Integration of fine-tuned models into the search pipeline\n\n---\n\n## 🚀 Installation\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/NotShrirang/LoomRAG.git\n   cd LoomRAG\n   ```\n\n2. Create a virtual environment and install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n---\n\n## 📖 Usage\n\n1. 
**Running the Streamlit Interface**:\n\n   - Start the Streamlit app:\n\n     ```bash\n     streamlit run app.py\n     ```\n\n   - Access the interface in your browser to:\n     - Submit natural language queries\n     - Upload images or PDFs to retrieve contextually relevant results\n     - Upload or record audio files\n     - Add images using URLs\n     - Scrape and index website content\n     - Search using uploaded images\n     - Annotate uploaded images\n     - Fine-tune CLIP models with custom parameters\n     - Use fine-tuned models for improved search results\n\n2. **Example Queries**:\n   - **Text Query**: \"sunset over mountains\"  \n     Output: An image of a sunset over mountains along with descriptive text\n   - **PDF Upload**: Upload a PDF of a scientific paper  \n     Output: Extracted key sections or contextually relevant images\n\n---\n\n## ⚙️ Configuration\n\n- 📊 **Vector Database**: Uses FAISS for efficient similarity search\n- 🤖 **Model**: Uses OpenAI CLIP for neural embedding generation\n- ✍️ **Augmentation**: Optional LLM-based augmentation for text responses\n- 🎛️ **Fine-Tuning**: Configurable parameters for model training and optimization\n\n---\n\n## 🤝 Contributing\n\nContributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.\n\n---\n\n## 📄 License\n\nThis project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.\n\n---\n\n## 🙏 Acknowledgments\n\n- [OpenAI CLIP](https://openai.com/research/clip)\n- [OpenAI Whisper](https://github.com/openai/whisper)\n- [FAISS](https://github.com/facebookresearch/faiss)\n- [Hugging Face](https://huggingface.co/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotshrirang%2Floomrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnotshrirang%2Floomrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotshrirang%2Floomrag/lists"}