{"id":26888813,"url":"https://github.com/BragAI/bRAG-langchain","last_synced_at":"2025-03-31T20:01:40.602Z","repository":{"id":263107477,"uuid":"889371307","full_name":"BragAI/bRAG-langchain","owner":"BragAI","description":"Everything you need to know to build your own RAG application","archived":false,"fork":false,"pushed_at":"2025-03-26T10:34:30.000Z","size":26927,"stargazers_count":2716,"open_issues_count":3,"forks_count":267,"subscribers_count":31,"default_branch":"main","last_synced_at":"2025-03-26T11:35:58.317Z","etag":null,"topics":["agentic-rag","ai","chatbot","llm","machine-learning","python","rag"],"latest_commit_sha":null,"homepage":"https://bragai.dev","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BragAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":"bragai","thanks_dev":null,"custom":null}},"created_at":"2024-11-16T07:41:36.000Z","updated_at":"2025-03-26T10:49:14.000Z","dependencies_parsed_at":"2025-02-03T23:20:24.915Z","dependency_job_id":"655e9213-fe36-4703-883e-537a8acfcec4","html_url":"https://github.com/BragAI/bRAG-langchain","commit_stats":null,"previous_names":["bragai/brag-langchain"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BragAI%2FbRAG-langchain","tags_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/BragAI%2FbRAG-langchain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BragAI%2FbRAG-langchain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BragAI%2FbRAG-langchain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BragAI","download_url":"https://codeload.github.com/BragAI/bRAG-langchain/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246531968,"owners_count":20792736,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-rag","ai","chatbot","llm","machine-learning","python","rag"],"created_at":"2025-03-31T20:01:39.003Z","updated_at":"2025-03-31T20:01:40.590Z","avatar_url":"https://github.com/BragAI.png","language":"Jupyter Notebook","readme":"# Retrieval-Augmented Generation (RAG) Project\n\n**_Think it. Build it. 
bRAG it._ 🚀 bRAGAI's coming soon (🤫)**\n\n**[Join the waitlist](https://bragai.dev/)** for exclusive early access, be among the first to try your AI-powered full-stack development assistant, and transform ideas into production-ready web apps in minutes.\n\n---------------------\n\nThis repository contains a comprehensive exploration of Retrieval-Augmented Generation (RAG) for various applications.\nEach notebook provides a detailed, hands-on guide to setting up and experimenting with RAG from an introductory level to advanced implementations, including multi-querying and custom RAG builds.\n\n![rag_detail_v2](assets/img/rag-architecture.png)\n\n## Project Structure\n\nIf you want to jump straight into it, check out the file `full_basic_rag.ipynb`, which provides boilerplate starter code for a fully customizable RAG chatbot.\n\nMake sure to run your files in a virtual environment (see the `Getting Started` section).\n\nThe following notebooks can be found under the directory `notebooks/`.\n\n### [1]\\_rag_setup_overview.ipynb\n\nThis introductory notebook provides an overview of RAG architecture and its foundational setup.\nThe notebook walks through: \n- **Environment Setup**: Configuring the environment, installing necessary libraries, and API setups.\n- **Initial Data Loading**: Basic document loaders and data preprocessing methods.\n- **Embedding Generation**: Generating embeddings using various models, including OpenAI's embeddings.\n- **Vector Store**: Setting up a vector store (ChromaDB/Pinecone) for efficient similarity search.\n- **Basic RAG Pipeline**: Creating a simple retrieval and generation pipeline to serve as a baseline.\n\n### [2]\\_rag_with_multi_query.ipynb\n\nBuilding on the basics, this notebook introduces multi-querying techniques in the RAG pipeline, exploring: \n- **Multi-Query Setup**: Configuring multiple queries to diversify retrieval.\n- **Advanced Embedding Techniques**: Utilizing multiple embedding models to refine 
retrieval.\n- **Pipeline with Multi-Querying**: Implementing multi-query handling to improve relevance in response generation.\n- **Comparison \u0026 Analysis**: Comparing results with single-query pipelines and analyzing performance improvements.\n\n### [3]_rag_routing_and_query_construction.ipynb\n\nThis notebook delves deeper into customizing a RAG pipeline.\nIt covers: \n- **Logical Routing:** Implements function-based routing for classifying user queries to appropriate data sources based on programming languages.\n- **Semantic Routing:** Uses embeddings and cosine similarity to direct questions to either a math or physics prompt, optimizing response accuracy.\n- **Query Structuring for Metadata Filters:** Defines structured search schema for YouTube tutorial metadata, enabling advanced filtering (e.g., by view count, publication date).\n- **Structured Search Prompting:** Leverages LLM prompts to generate database queries for retrieving relevant content based on user input.\n- **Integration with Vector Stores:** Links structured queries to vector stores for efficient data retrieval.\n\n\n### [4]_rag_indexing_and_advanced_retrieval.ipynb\n\nContinuing from the previous customization, this notebook explores:\n- **Preface on Document Chunking:** Points to external resources for document chunking techniques.\n- **Multi-representation Indexing:** Sets up a multi-vector indexing structure for handling documents with different embeddings and representations.\n- **In-Memory Storage for Summaries:** Uses InMemoryByteStore for storing document summaries alongside parent documents, enabling efficient retrieval.\n- **MultiVectorRetriever Setup:** Integrates multiple vector representations to retrieve relevant documents based on user queries.\n- **RAPTOR Implementation:** Explores RAPTOR, an advanced indexing and retrieval model, linking to in-depth resources.\n- **ColBERT Integration:** Demonstrates ColBERT-based token-level vector indexing and retrieval, which captures 
contextual meaning at a fine-grained level.\n- **Wikipedia Example with ColBERT:** Retrieves information about Hayao Miyazaki using the ColBERT retrieval model for demonstration.\n\n### [5]_rag_retrieval_and_reranking.ipynb\n\nThis final notebook brings together the RAG system components, with a focus on scalability and optimization: \n- **Document Loading and Splitting:** Loads and chunks documents for indexing, preparing them for vector storage.\n- **Multi-query Generation with RAG-Fusion:** Uses a prompt-based approach to generate multiple search queries from a single input question.\n- **Reciprocal Rank Fusion (RRF):** Implements RRF for re-ranking multiple retrieval lists, merging results for improved relevance.\n- **Retriever and RAG Chain Setup:** Constructs a retrieval chain for answering queries, using fused rankings and RAG chains to pull contextually relevant information.\n- **Cohere Re-Ranking:** Demonstrates re-ranking with Cohere’s model for additional contextual compression and refinement.\n- **CRAG and Self-RAG Retrieval:** Explores advanced retrieval approaches like CRAG and Self-RAG, with links to examples.\n- **Exploration of Long-Context Impact:** Links to resources explaining the impact of long-context retrieval on RAG models.\n\n## Getting Started\n\n### Pre-requisites\n\nEnsure **Python 3.11.11** (preferred) is installed on your system. Follow the platform-specific instructions below to install it if not already installed.\n\n#### macOS\n1. Install [Homebrew](https://brew.sh/) if not already installed:\n   ```bash\n   /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n   ```\n2. Install Python 3.11.11:\n   ```bash\n   brew install python@3.11\n   ```\n3. Verify installation:\n   ```bash\n   python3.11 --version\n   ```\n\n#### Linux\n1. Update your package manager:\n   ```bash\n   sudo apt update\n   ```\n2. 
Install Python 3.11.11:\n   ```bash\n   sudo apt install python3.11 python3.11-venv\n   ```\n3. Verify installation:\n   ```bash\n   python3.11 --version\n   ```\n\n#### Windows\n1. Download the Python 3.11.11 installer from [Python.org](https://www.python.org/downloads/).\n2. Run the installer and ensure you check the box **\"Add Python to PATH\"**.\n3. Verify installation:\n   ```cmd\n   python --version\n   ```\n---\n\n### Installation Instructions\n\n#### 1. Clone the Repository\n```bash\ngit clone https://github.com/bRAGAI/bRAG-langchain.git\ncd bRAG-langchain\n```\n\n#### 2. Create a Virtual Environment\nUse Python 3.11.11 to create a virtual environment:\n```bash\npython3.11 -m venv venv\n```\n\nActivate the virtual environment:\n- **macOS/Linux**:\n  ```bash\n  source venv/bin/activate\n  ```\n- **Windows**:\n  ```cmd\n  venv\\Scripts\\activate\n  ```\n\n#### 3. Verify and Fix Python Version\nIf the virtual environment defaults to a different Python version (e.g., Python 3.13):\n1. Verify the current Python version inside the virtual environment:\n   ```bash\n   python --version\n   ```\n2. Use Python 3.11 explicitly within the virtual environment:\n   ```bash\n   python3.11\n   ```\n3. Ensure the `python` command uses Python 3.11 by creating a symbolic link:\n   ```bash\n   ln -sf $(which python3.11) $(dirname $(which python))/python\n   ```\n4. Verify the fix:\n   ```bash\n   python --version\n   ```\n\n#### 4. Install Dependencies\nInstall the required packages:\n```bash\npip install -r requirements.txt\n```\n\n---\n\n### Additional Steps\n\n#### 5. Run the Notebooks\nBegin with `[1]_rag_setup_overview.ipynb` to get familiar with the setup process. Proceed sequentially through the other notebooks:\n\n- `[1]_rag_setup_overview.ipynb`\n- `[2]_rag_with_multi_query.ipynb`\n- `[3]_rag_routing_and_query_construction.ipynb`\n- `[4]_rag_indexing_and_advanced_retrieval.ipynb`\n- `[5]_rag_retrieval_and_reranking.ipynb`\n\n#### 6. Set Up Environment Variables\n1. 
Duplicate the `.env.example` file in the root directory and rename it to `.env`.\n2. Add the following keys (replace with your actual values):\n\n   ```env\n   # LLM Model - Get key at https://platform.openai.com/api-keys\n   OPENAI_API_KEY=\"your-api-key\"\n\n   # LangSmith - Get key at https://smith.langchain.com\n   LANGCHAIN_TRACING_V2=true\n   LANGCHAIN_ENDPOINT=\"https://api.smith.langchain.com\"\n   LANGCHAIN_API_KEY=\"your-api-key\"\n   LANGCHAIN_PROJECT=\"your-project-name\"\n\n   # Pinecone Vector Database - Get key at https://app.pinecone.io\n   PINECONE_INDEX_NAME=\"your-project-index\"\n   PINECONE_API_HOST=\"your-host-url\"\n   PINECONE_API_KEY=\"your-api-key\"\n\n   # Cohere - Get key at https://dashboard.cohere.com/api-keys\n   COHERE_API_KEY=your-api-key\n   ```\n\n---\n\nYou're now ready to use the project!\n\n## Usage\n\nAfter setting up the environment and running the notebooks in sequence, you can:\n\n1.  **Experiment with Retrieval-Augmented Generation**:\n    Use the foundational setup in `[1]_rag_setup_overview.ipynb` to understand the basics of RAG.\n\n2.  
**Implement Multi-Querying**:\n    Learn how to improve response relevance by introducing multi-querying techniques in `[2]_rag_with_multi_query.ipynb`.\n\n## Star History\n\n\u003ca href=\"https://star-history.com/#bragai/brag-langchain\u0026Date\"\u003e\n \u003cpicture\u003e\n   \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=bragai/brag-langchain\u0026type=Date\u0026theme=dark\" /\u003e\n   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=bragai/brag-langchain\u0026type=Date\" /\u003e\n   \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/svg?repos=bragai/brag-langchain\u0026type=Date\" /\u003e\n \u003c/picture\u003e\n\u003c/a\u003e\n\n## Upcoming Notebooks\n\n👨🏻‍💻 **[MistralOCR](https://mistral.ai/news/mistral-ocr) + RAG Integration** \n\n## Contact\nDo you have questions or want to collaborate? Please open an issue or email Taha Ababou at taha@bragai.dev\n\n`If this project helps you, consider buying me a coffee ☕. Your support helps me keep contributing to the open-source community!`\n\u003cp\u003e\n    \u003ca href=\"https://buymeacoffee.com/bragai\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/sponsor-30363D?style=for-the-badge\u0026logo=GitHub-Sponsors\u0026logoColor=white\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n    The notebooks and visual diagrams were inspired by Lance Martin's LangChain Tutorial.\n\n    \n","funding_links":["https://buymeacoffee.com/bragai"],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBragAI%2FbRAG-langchain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBragAI%2FbRAG-langchain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBragAI%2FbRAG-langchain/lists"}