{"id":29631398,"url":"https://github.com/farout101/agent-base","last_synced_at":"2026-04-11T20:44:30.674Z","repository":{"id":304853197,"uuid":"1020264301","full_name":"farout101/agent-base","owner":"farout101","description":"modified version of the original repo","archived":false,"fork":false,"pushed_at":"2025-07-15T15:47:20.000Z","size":149,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-16T10:53:41.175Z","etag":null,"topics":["streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/farout101.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-15T15:39:06.000Z","updated_at":"2025-07-15T15:47:23.000Z","dependencies_parsed_at":"2025-07-16T15:42:32.268Z","dependency_job_id":"fadbac59-1ef1-4599-ad20-d880aefd7a53","html_url":"https://github.com/farout101/agent-base","commit_stats":null,"previous_names":["farout101/agent-base"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/farout101/agent-base","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farout101%2Fagent-base","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farout101%2Fagent-base/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farout101%2Fagent-base/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farout101%2Fagent-base/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/farout101","download_url":"https://codeload.github.com/farout101/agent-base/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farout101%2Fagent-base/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266292907,"owners_count":23906609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["streamlit"],"created_at":"2025-07-21T11:37:46.682Z","updated_at":"2026-04-11T20:44:25.630Z","avatar_url":"https://github.com/farout101.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **LLM-Powered RAG Chatbot Agent**\n\nThis project demonstrates a Retrieval-Augmented Generation (RAG) chatbot agent powered by a locally hosted Large Language Model (LLM) using Ollama, LangChain, and Streamlit. The chatbot can answer questions based on a custom knowledge base (your documents), providing context-aware and grounded responses.\n\n## **🚀 Purpose \u0026 Tech Stack**\n\nThe primary purpose of this project is to provide a conceptual and extensible codebase for building RAG-based chatbots. 
It showcases how to integrate various open-source tools to create a functional AI agent that can interact with your private or domain-specific data.\n\n**Key Technologies Used:**\n\n- **Python:** The core programming language.\n- **Streamlit:** For building an interactive and user-friendly web interface.\n- **Ollama:** To run Large Language Models (LLMs) and embedding models locally.\n- **LangChain:** A powerful framework for orchestrating the RAG pipeline, handling document loading, text splitting, embeddings, and LLM interactions.\n- **ChromaDB:** A lightweight, open-source vector database for efficient storage and retrieval of document embeddings.\n- **YAML:** For externalizing and managing application configurations.\n\n## **💡 High-Level Overview**\n\nThe RAG chatbot operates on a simple yet powerful principle:\n\n1. **Data Ingestion (prep-data.py):**\n    - Your custom documents (PDFs, CSVs, text files, web pages) are loaded.\n    - These documents are split into smaller, manageable \"chunks.\"\n    - Each chunk is converted into a numerical representation called an \"embedding\" using a local embedding model (via Ollama).\n    - These embeddings and their corresponding text chunks are stored in a vector database (ChromaDB) for efficient similarity search.\n2. 
**Chatbot Interaction (app.py):**\n    - When a user asks a question, the question is also converted into an embedding.\n    - This query embedding is used to search the ChromaDB for the most \"similar\" (relevant) document chunks.\n    - The retrieved relevant chunks are then provided as \"context\" to a local LLM (via Ollama) along with the original user's question.\n    - The LLM generates an answer, grounded in the provided context, reducing hallucinations and improving factual accuracy.\n    - The conversation takes place within a Streamlit web interface.\n\n## **📁 Project Structure**\n\n```\n├── config.yaml # Centralized configuration for the entire application\n├── prep-data.py # Script to prepare and ingest data into the vector database\n├── app.py # Streamlit web application for the RAG chatbot UI\n├── src/ # Source code for modular components\n│   ├── __init__.py # Makes 'src' a Python package\n│   ├── config_loader.py # Handles loading and parsing of config.yaml\n│   ├── document_loader.py # Manages loading documents from various sources (PDF, CSV, Web, Text)\n│   ├── text_splitter.py # Encapsulates logic for splitting documents into chunks\n│   ├── embedding_model.py # Initializes and provides the Ollama embedding model\n│   ├── llm_model.py # Initializes and provides the Ollama LLM for generation\n│   ├── rag_chain.py # Builds and orchestrates the LangChain RAG pipeline\n│   └── vector_store.py # Manages ChromaDB connection and document operations\n├── data/ # Directory to store your raw source documents (create this)\n│   ├── pdfs/ # Example: Place your PDF files here\n│   ├── csvs/ # Example: Place your CSV files here\n│   └── texts/ # Example: Place your plain text files here\n└── chroma_db/ # Directory where ChromaDB will persist its data (created by prep-data.py)\n```\n\n## **⚙️ Setting Up the Project**\n\nFollow these steps to get the RAG chatbot running on your local machine.\n\n### **Prerequisites**\n\n- **Python 3.9+:** Ensure Python is 
installed on your system.\n- **Git:** For cloning the repository.\n- **Docker \u0026 Docker Compose:** For running the local LLM via Ollama. Download and install from [www.docker.com](https://www.docker.com/).\n\n### **1\\. Clone the Repository**\n\n```\ngit clone git@github.com:yett/agent-base.git\ncd agent-base\n```\n\n### **2\\. Create and Activate a Virtual Environment**\n\nIt's highly recommended to use a virtual environment to manage project dependencies.\n\n- **On Windows:**  \n```\n    python -m venv venv\n    .\\venv\\Scripts\\activate\n```\n\n- **On macOS/Linux:**  \n```\n    python3 -m venv venv\n    source venv/bin/activate\n```\n\n### **3\\. Install Python Dependencies**\n\nWith your virtual environment activated, install all required Python packages:\n\n```\npip install streamlit pyyaml langchain-community pypdf pandas beautifulsoup4 ollama chromadb\n```\n\n### **4\\. Prepare Your Data**\n\nCreate the data/ directory in your project root and place your documents inside the respective subfolders (pdfs/, csvs/, texts/).\n\n**Example data/ structure:**\n\n```\ndata/\n├── pdfs/\n│   └── my_document.pdf\n├── csvs/\n│   └── sales_data.csv\n└── texts/\n    └── faq.txt\n```\n\n**Update config.yaml:**\n\nEnsure the data_ingestion.document_sources section in your config.yaml points to the actual paths of your data files or directories.\n\n```\n# config.yaml (excerpt)\ndata_ingestion:\n  document_sources:\n    - type: \"pdf\"\n      path: \"./data/pdfs/\"\n    - type: \"csv\"\n      path: \"./data/csvs/my_data.csv\"\n    - type: \"text\"\n      path: \"./data/texts/faq.txt\"\n    # - type: \"website\" # Uncomment and configure if needed\n    #   urls:\n    #     - \"https://example.com/some_article\"\n```\n\n## **▶️ Running the Application**\n\n### **1. Start the Ollama Docker Container**\n\nOpen a terminal window and start the Ollama server using Docker Compose. 
This command will download the necessary Docker image and start the container in the background.\n\n```\ndocker-compose up -d\n```\n\nThe container is configured to automatically pull the `llama3.2` model on startup.\n\n### **2. Prepare/Refresh the RAG Data**\n\nOpen another terminal window, activate your virtual environment, and run the data preparation script. This will load your documents, chunk them, create embeddings, and store them in ChromaDB.\n\n```\npython prep-data.py\n```\n\nThis command will perform a **full refresh**: it will delete any existing ChromaDB data in `./chroma_db` and then re-ingest all documents specified in your `config.yaml`.\n\n### **3. Launch the Chatbot UI**\n\nIn the **same terminal** where you ran `prep-data.py` (with the virtual environment still active), launch the Streamlit application:\n\n```\nstreamlit run app.py\n```\n\nThis command will open a new tab in your default web browser, displaying the RAG chatbot interface.\n\n### **Stopping the Application**\n\n- **To stop the Streamlit app:** Press `Ctrl + C` in the terminal where it's running.\n- **To stop the Ollama container:** Run `docker-compose down` in your project directory.\n\nYou are now ready to interact with your LLM-powered RAG chatbot agent!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarout101%2Fagent-base","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffarout101%2Fagent-base","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarout101%2Fagent-base/lists"}