{"id":26319001,"url":"https://github.com/wittyicon29/qabot-with-conversational-memory","last_synced_at":"2026-04-13T08:32:34.101Z","repository":{"id":246199343,"uuid":"820387122","full_name":"wittyicon29/QABot-with-Conversational-Memory","owner":"wittyicon29","description":"Natural Language Query Agent over some web data and some pdf which has conversational memory using Groq Cloud API","archived":false,"fork":false,"pushed_at":"2024-06-26T17:57:16.000Z","size":696,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T15:17:11.926Z","etag":null,"topics":["chromadb","groq-api","langchain","llms","python","rag","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wittyicon29.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-26T11:16:23.000Z","updated_at":"2024-12-21T12:53:27.000Z","dependencies_parsed_at":"2024-06-26T14:13:21.712Z","dependency_job_id":"f979c7b3-caee-4cea-a33e-7bed387fb039","html_url":"https://github.com/wittyicon29/QABot-with-Conversational-Memory","commit_stats":null,"previous_names":["wittyicon29/qabot-with-conversational-memory"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/wittyicon29/QABot-with-Conversational-Memory","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wittyicon29%2FQABot-with-Conversational-Memory","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wittyicon29%2FQAB
ot-with-Conversational-Memory/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wittyicon29%2FQABot-with-Conversational-Memory/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wittyicon29%2FQABot-with-Conversational-Memory/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wittyicon29","download_url":"https://codeload.github.com/wittyicon29/QABot-with-Conversational-Memory/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wittyicon29%2FQABot-with-Conversational-Memory/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31746102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T06:26:45.479Z","status":"ssl_error","status_checked_at":"2026-04-13T06:26:44.645Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromadb","groq-api","langchain","llms","python","rag","streamlit"],"created_at":"2025-03-15T15:17:16.192Z","updated_at":"2026-04-13T08:32:34.084Z","avatar_url":"https://github.com/wittyicon29.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Project Overview\n\nThis project demonstrates the creation of a Natural Language Query Agent capable of answering questions based on a small set of lecture notes from Stanford's LLM lectures and a 
table of milestone LLM architectures. The system leverages LLMs and open-source vector indexing and storage frameworks to provide conversational answers, with an emphasis on follow-up queries and conversational memory. \n\n### Data Sources\n\n1. **Stanford LLMs Lecture Notes**:\n    - Introduction: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/introduction/)\n    - Capabilities: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/capabilities/)\n    - Harm-1: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/harm-1/)\n    - Harm-2: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/harm-2/)\n    - Data: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/data/)\n    - Modeling: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/modeling/)\n    - Training: [Lecture Link](https://stanford-cs324.github.io/winter2022/lectures/training/)\n   \n2. **Milestone Papers**: Table of model architectures from [Awesome LLM](https://github.com/Hannibal046/Awesome-LLM#milestone-papers).\n\n### Project Structure\n\n- **data_loading.py**: Contains functions to load data from the web and PDF.\n- **processing.py**: Functions to split text into chunks and generate embeddings.\n- **model_initialization.py**: Code to initialize the model and retrieval chain.\n- **main.py**: Streamlit application for the chatbot interface.\n\n### Intermediary Representation\n![PDF](https://github.com/wittyicon29/QABot-with-Conversational-Memory/assets/99320225/5832d0be-a092-4acb-97c0-d7fc7657942b)\n\n**Data Organization and Embedding**:\n\n1. **Raw Data Loading**: \n    - Web pages and PDF files are loaded using `WebBaseLoader` and `PyPDFLoader` respectively.\n    \n2. **Text Splitting**:\n    - Documents are split into manageable chunks using `RecursiveCharacterTextSplitter` with a chunk size of 1200 characters and an overlap of 200 characters.\n    \n3. 
**Embedding**:\n    - Text chunks are converted into embeddings using the HuggingFace model `BAAI/bge-small-en`.\n    \n4. **Vector Store**:\n    - The embeddings are stored in a Chroma vector store, making them searchable.\n\n### Detailed Steps\n\n#### Loading Data\n\n1. **WebBaseLoader**: Fetches and loads web pages.\n2. **PyPDFLoader**: Loads and parses the PDF containing milestone papers.\n3. **MergedDataLoader**: Merges the data from the web and PDF loaders.\n\n#### Processing Data\n\n1. **Text Splitting**: \n    - `RecursiveCharacterTextSplitter` divides the loaded text into smaller, overlapping chunks to ensure that context is preserved.\n    \n2. **Embedding Generation**:\n    - `HuggingFaceBgeEmbeddings` generates embeddings for the text chunks using a pre-trained model.\n    \n3. **Vector Store**:\n    - The Chroma vector store is used to store and index these embeddings, enabling efficient retrieval.\n\n#### Initializing the Model\n\n1. **LLM Initialization**:\n    - `ChatGroq` initializes the chosen LLM model using the provided API key.\n    \n2. **Prompt Templates**:\n    - Custom prompt templates are created to reformulate user queries and generate responses based on the retrieved context.\n    \n3. **Retrieval Chain**:\n    - A retrieval chain is created that uses a history-aware retriever to provide context-aware answers.\n\n### Application\n\nA Streamlit application allows users to interact with the chatbot. Key features include:\n- **Input Query**: Users can enter natural language queries.\n- **Chat History**: The system maintains context across multiple queries.\n- **Display of Sources**: The sources used to generate answers are displayed, ensuring transparency.\n\n### Workflow of the System \n![ThP - Flowchart](https://github.com/wittyicon29/QABot-with-Conversational-Memory/assets/99320225/3af7d6a5-3628-4fed-915b-59e74077b31a)\n\n### Deployment and Scaling\n\n1. 
**Deployment Plan**:\n    - Deploy directly on Streamlit Cloud for public access.\n    - Containerize the application using Docker for easy deployment.\n    - Use cloud services like AWS or GCP for scalability.\n    \n2. **Scaling**:\n    - Use GPU acceleration to reduce response-generation latency.\n    - As the number of lectures or papers grows, make retrieval more efficient through improved vector storage.\n    - Implement caching strategies to improve response times for frequently asked questions.\n\n### Improvements and Future Work\n\n- **Enhanced Conversational Memory**: Improve the system's ability to handle complex, multi-turn conversations.\n- **Citation and Reference Handling**: Add more sophisticated citation mechanisms that link to the specific sections of text used in answers.\n\n### Setup Instructions\n\n1. **Clone the Repository**:\n    ```sh\n    git clone \u003crepository-url\u003e\n    cd \u003crepository-folder\u003e\n    ```\n\n2. **Install Dependencies**:\n    ```sh\n    pip install -r requirements.txt\n    ```\n\n3. **Run the Application**:\n    ```sh\n    streamlit run main.py\n    ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwittyicon29%2Fqabot-with-conversational-memory","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwittyicon29%2Fqabot-with-conversational-memory","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwittyicon29%2Fqabot-with-conversational-memory/lists"}