{"id":17998758,"url":"https://github.com/moashour93/construction_private_gpt","last_synced_at":"2026-02-07T14:32:42.411Z","repository":{"id":260077658,"uuid":"880216611","full_name":"MoAshour93/Construction_Private_GPT","owner":"MoAshour93","description":" A versatile document query chatbot powered by GPT-4ALL and Llama, supporting multi-format document ingestion and efficient retrieval using embeddings and ChromaDB. Ideal for transforming unstructured data into insights interactively.","archived":false,"fork":false,"pushed_at":"2024-10-29T18:08:05.000Z","size":1278,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-09T18:16:21.860Z","etag":null,"topics":["chatbot","chromadb","documents","gpt4all","llama","llama-index","python","queries","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MoAshour93.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-29T10:33:38.000Z","updated_at":"2025-01-27T03:00:50.000Z","dependencies_parsed_at":"2024-12-03T16:34:29.432Z","dependency_job_id":"9eec8d81-38e8-4860-bf32-ddab7aa6a349","html_url":"https://github.com/MoAshour93/Construction_Private_GPT","commit_stats":{"total_commits":11,"total_committers":1,"mean_commits":11.0,"dds":0.0,"last_synced_commit":"386882cc6b15f8269077428ac334d92ce2994480"},"previous_names":["moashour93/private_gpt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MoAshour93%2FConstruction_Private_GPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MoAshour93%2FConstruction_Private_GPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MoAshour93%2FConstruction_Private_GPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MoAshour93%2FConstruction_Private_GPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MoAshour93","download_url":"https://codeload.github.com/MoAshour93/Construction_Private_GPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247135147,"owners_count":20889421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","chromadb","documents","gpt4all","llama","llama-index","python","queries","rag"],"created_at":"2024-10-29T22:05:10.240Z","updated_at":"2026-02-07T14:32:42.383Z","avatar_url":"https://github.com/MoAshour93.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📄 GPT-4ALL Document Query Chatbot\n\nThis project enables users to query a wide variety of documents using an advanced chatbot powered by open-source LLMs like GPT-4ALL and Llama. Leveraging embeddings, vector databases, and data loaders, this system efficiently handles document parsing, storage, and retrieval.\n\n## 📑 Table of Contents\n- [📌 Project Overview](#-project-overview)\n- [🚀 Features](#-features)\n- [📂 Project Structure](#-project-structure)\n- [🛠 Installation](#-installation)\n- [🚀 Usage](#-usage)\n- [🔗 General Links \u0026 Resources](#-general-links--resources)\n- [⚙️ Configuration](#%EF%B8%8F-configuration)\n- [🗂️ Supported Document Formats](#%EF%B8%8F-supported-document-formats)\n- [📈 Limitations \u0026 Next Steps](#-limitations--next-steps)\n- [📄 License](#-license)\n- [📞 Support](#-support)\n\n---\n\n## 📌 Project Overview\n\nIn today’s data-intensive environments, there’s a growing need to convert unstructured data into actionable insights. This chatbot bridges that gap by allowing users to interactively query documents, with support for multiple formats including PDF, Word, PowerPoint, Markdown, and more.\n\nBuilt with `langchain` and `chromadb`, this solution processes documents by:\n- Converting them into text chunks.\n- Embedding these chunks as vectors.\n- Storing them for easy retrieval, powered by a selected LLM model.\n\n---\n\n## 🚀 Features\n- **Multi-format Document Support**: Accepts documents in `.pdf`, `.docx`, `.pptx`, `.txt`, and other formats.\n- **Embeddings with Langchain**: Uses `HuggingFaceBgeEmbeddings` for text chunk embeddings.\n- **Vector Storage with ChromaDB**: Stores text embeddings as vectors for efficient retrieval.\n- **Choice of LLMs**: Supports GPT-4ALL and Llama models for answering queries.\n- **Customizable Environment**: Easily configure model and embedding options via `.env`.\n\n---\n\n## 📂 Project Structure\n\n- `requirements.txt`: Lists necessary Python packages.\n- `.env`: Contains environment variables for model and database settings.\n- `constants.py`: Holds constants for Chroma database configuration.\n- `ingest.py`: Processes and stores documents as vectors for future querying.\n- `privateGPT.py`: Main chatbot script for querying stored documents.\n\n---\n\n## 🛠 Installation\n\n1. **Clone the repository**:\n    ```bash\n    git clone https://github.com/MoAshour93/Construction_Private_GPT.git\n    cd Construction_Private_GPT\n    ```\n\n2. **Install dependencies**:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3. **Set up environment variables**:\n   - Create a `.env` file in the root directory, using the provided template:\n     ```bash\n     PERSIST_DIRECTORY=db\n     MODEL_TYPE=GPT4All\n     MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin\n     EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2\n     MODEL_N_CTX=1000\n     MODEL_N_BATCH=8\n     TARGET_SOURCE_CHUNKS=4\n     ```\n\n---\n\n## 🚀 Usage\n\n### 1. Ingest Documents\n   Use `ingest.py` to process and store document embeddings:\n   ```bash\n   python ingest.py\n   ```\n\n### 2. Run the Chatbot\n   Start querying documents using `privateGPT.py`:\n   ```bash\n   python privateGPT.py\n   ```\n   - Enter your query at the prompt.\n   - Type `exit` to end the session.\n\n### 🔧 Customizable Options\n   - Use `--hide-source` or `-S` to hide source documents used in responses.\n   - Use `--mute-stream` or `-M` to disable streaming output from the LLM.\n\n---\n\n## 🔗 General Links \u0026 Resources\n\n- **Our Website**: [www.apcmasterypath.co.uk](https://www.apcmasterypath.co.uk)\n- **APC Mastery Path Blogposts**: [APC Blogposts](https://www.apcmasterypath.co.uk/blog-list)\n- **LinkedIn Pages**: [Personal](https://www.linkedin.com/in/mohamed-ashour-0727/) | [APC Mastery Path](https://www.linkedin.com/company/apc-mastery-path)\n\n---\n\n## ⚙️ Configuration\n\n- **Constants**: The `constants.py` file includes important settings for the ChromaDB database.\n- **Environment Variables**: Set customizable parameters in `.env`, including model path and embedding model name.\n\n---\n\n## 🗂️ Supported Document Formats\n\n| Format          | Loader                        |\n|-----------------|-------------------------------|\n| PDF             | `PyPDFLoader`                 |\n| Word Documents  | `UnstructuredWordDocumentLoader` |\n| PowerPoint      | `UnstructuredPowerPointLoader` |\n| Markdown        | `UnstructuredMarkdownLoader`  |\n| CSV             | `CSVLoader`                   |\n| Text            | `TextLoader`                  |\n\n---\n\n## 📈 Limitations \u0026 Next Steps\n\nThis initial implementation is a command-line-based chatbot, but it can be extended:\n1. **GUI Integration**: Integrate with `Streamlit` or `Chainlit` for a graphical user interface.\n2. **Multi-agent Architecture**: Develop task-specific agents for more complex queries.\n3. **Broader LLM Support**: Experiment with other open-source models from Hugging Face.\n\n---\n\n## 📄 License\nThis project is licensed under the [Apache 2.0 License](LICENSE).\n\n---\n\n## 📞 Support\nFor any questions, feel free to contact [Mohamed Ashour](https://www.linkedin.com/in/mohamed-ashour-0727/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoashour93%2Fconstruction_private_gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoashour93%2Fconstruction_private_gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoashour93%2Fconstruction_private_gpt/lists"}