{"id":26360175,"url":"https://github.com/preethi2805/naive_rag_chat","last_synced_at":"2026-01-03T01:53:03.227Z","repository":{"id":281440066,"uuid":"945282206","full_name":"Preethi2805/naive_rag_chat","owner":"Preethi2805","description":"This project demonstrates how to build a Document Querying System using ChromaDB to store and retrieve document embeddings, along with integration with Groq's LLaMA models for question-answering tasks. ","archived":false,"fork":false,"pushed_at":"2025-03-09T04:09:08.000Z","size":2515,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-09T05:18:15.656Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Preethi2805.png","metadata":{"files":{"readme":"README.md","changelog":"news_articles/05-03-ai-powered-supply-chain-startup-pando-lands-30m-investment.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-09T03:59:30.000Z","updated_at":"2025-03-09T04:09:11.000Z","dependencies_parsed_at":"2025-03-09T05:28:21.447Z","dependency_job_id":null,"html_url":"https://github.com/Preethi2805/naive_rag_chat","commit_stats":null,"previous_names":["preethi2805/naive_rag_chat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Preethi2805%2Fnaive_rag_chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Preethi2805%2Fnaive_rag_chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/P
reethi2805%2Fnaive_rag_chat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Preethi2805%2Fnaive_rag_chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Preethi2805","download_url":"https://codeload.github.com/Preethi2805/naive_rag_chat/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243900174,"owners_count":20366118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-16T16:39:03.974Z","updated_at":"2026-01-03T01:53:03.202Z","avatar_url":"https://github.com/Preethi2805.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Document-Based Question-Answering System Using Naive RAG**\n\n## **Overview**\nThis project implements a **Document-Based Question-Answering System** using a **Naive Retrieval-Augmented Generation (RAG)** approach. The system retrieves relevant document chunks from a local database and uses **Groq's LLaMA model** to generate context-aware answers to user queries. \n\nThe system works with text data (news articles in this case) and performs the following tasks:\n1. Loads news articles stored as `.txt` files from a specified directory.\n2. Splits the content of the articles into smaller chunks to allow for efficient retrieval.\n3. Generates embeddings for each chunk using a **SentenceTransformer** model to store and query in **ChromaDB**.\n4. 
Retrieves the chunks most relevant to the user’s question from the database and passes them to **Groq’s API** to generate an answer.\n\n---\n\n## **Features**\n\n- **Document Loading**: Loads `.txt` files from a specified directory to be processed.\n- **Text Chunking**: Splits documents into smaller chunks of manageable size, which helps retrieval accuracy.\n- **Embedding Generation**: Embeds text chunks into vector representations using **SentenceTransformer** so they can be stored and searched in the database.\n- **Database Storage**: Uses **ChromaDB** for persistent storage and retrieval of embeddings.\n- **Question Answering**: Queries the database for relevant chunks based on a user-provided question and uses **Groq’s LLaMA model** to generate contextually relevant answers.\n\n---\n\n## **Explanation of Key Components**\n\n1. **Embedding Generation**:\n   - The `SentenceTransformer` model (`sentence-transformers/all-MiniLM-L6-v2`) converts text chunks into numerical vector representations (embeddings). These embeddings capture semantic meaning, so chunks can be retrieved by similarity of meaning rather than exact keyword match.\n\n2. **ChromaDB**:\n   - ChromaDB is used as a persistent vector database for storing embeddings. It enables efficient querying to find the most relevant document chunks based on a user’s query.\n\n3. **Groq’s LLaMA Model**:\n   - **Groq** is used to generate responses from the LLaMA model. The relevant chunks retrieved from ChromaDB are fed into Groq’s model to generate contextually aware answers.\n\n4. **Text Chunking**:\n   - Text is split into smaller, overlapping chunks for efficient retrieval and to keep the context passed to the model at a manageable size.\n\n5. **Persistent Storage**:\n   - The system uses Chroma’s persistent storage so that embeddings are not lost across runs. Later queries can then be answered without reprocessing the documents.\n\n---\n\n## **Project Workflow**\n\n1. 
**Document Loading**:\n   - All `.txt` files from the `news_articles` directory are loaded into memory.\n\n2. **Text Chunking**:\n   - Each document is split into smaller chunks for better retrieval performance.\n\n3. **Embedding Generation**:\n   - Each chunk is converted into an embedding using the SentenceTransformer model.\n\n4. **Document Insertion**:\n   - The embeddings are stored in **ChromaDB** for efficient retrieval during query time.\n\n5. **Querying**:\n   - When the user inputs a question, the system retrieves the most relevant chunks by comparing embeddings.\n\n6. **Response Generation**:\n   - The retrieved chunks are then passed to **Groq’s LLaMA model**, which uses them as context to generate an answer to the question.\n\n---\n\n## **Dependencies**\n\n- **ChromaDB**: Vector database for storing embeddings.\n- **Sentence-Transformers**: For generating text embeddings.\n- **Groq**: For interfacing with the LLaMA model to generate responses.\n- **python-dotenv**: For loading the API key from a `.env` file instead of hard-coding it.\n- **os**: Part of the Python standard library (no installation needed); used for file handling.\n\n---\n\n## **Future Enhancements**\n\n- **Multi-language support**: Extend the system to support documents and questions in multiple languages.\n- **Real-time news updates**: Integrate with an API to fetch real-time news articles and continuously update the document collection.\n- **Advanced user interface**: Implement a web-based front end for a more user-friendly interface.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreethi2805%2Fnaive_rag_chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpreethi2805%2Fnaive_rag_chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreethi2805%2Fnaive_rag_chat/lists"}