https://github.com/abdelrahman-amen/rag_agent
This project uses LangChain agents and Google Generative AI to build a RAG system, combining LLMs with tools like Wikipedia, Arxiv, and custom retrievers for accurate, real-time answers.
https://github.com/abdelrahman-amen/rag_agent
agents api arxiv-papers dotenv embeddings faiss langchain prompt-engineering python retreival-augmented-generation tools vectorstore webloader wikipedia-api
Last synced: 3 months ago
JSON representation
This project uses LangChain agents and Google Generative AI to build a RAG system, combining LLMs with tools like Wikipedia, Arxiv, and custom retrievers for accurate, real-time answers.
- Host: GitHub
- URL: https://github.com/abdelrahman-amen/rag_agent
- Owner: Abdelrahman-Amen
- Created: 2025-02-04T15:27:08.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-05T14:33:36.000Z (8 months ago)
- Last Synced: 2025-04-05T15:46:49.120Z (6 months ago)
- Topics: agents, api, arxiv-papers, dotenv, embeddings, faiss, langchain, prompt-engineering, python, retreival-augmented-generation, tools, vectorstore, webloader, wikipedia-api
- Language: Jupyter Notebook
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 🚀 Retrieval-Augmented Generation (RAG) with Agents in LangChain and Google Generative AI 🌟

# 📚 What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) enhances the capabilities of language models by combining them with retrieval mechanisms. Instead of depending solely on the pre-trained knowledge within a model, RAG retrieves real-time, relevant data from external sources such as APIs, web pages, or document databases. This approach ensures the generated responses are more accurate, contextually rich, and grounded in reliable information.# 🤖 What is an Agent in LangChain?
An Agent is a sophisticated system within LangChain that acts as a decision-making layer. It interprets user queries, determines which tools (retrievers, APIs, or external services) are required to process the query, and then synthesizes the retrieved information into a coherent response.In this project, the agent plays a critical role in:
• Binding the Google Generative AI LLM with tools such as Wikipedia and Arxiv APIs.
• Orchestrating a seamless interaction between user inputs, external data sources, and the language model to provide informed and accurate outputs.
• Utilizing LangChain’s ability to format responses and handle intermediate steps.
# 🛠️ Project Workflow
### 1.Environment Configuration:
• The project uses google.generativeai and langchain_community modules to interact with Google Generative AI and external tools.
• Environment variables (API keys) are loaded via dotenv to securely access external services.
### 2.Document Loading and Splitting:
• A WebBaseLoader is used to fetch documents from a specific URL.
• The loaded documents are split into manageable chunks using RecursiveCharacterTextSplitter for effective retrieval and embedding.
### 3.Embeddings and Vector Store:
• Google Generative AI Embeddings are generated for the document chunks using the embedding-001 model.
• A FAISS Vector Store is created to facilitate fast and accurate retrieval of relevant information.
### 4.Tool Integration:
• Wikipedia API Wrapper: Retrieves concise summaries from Wikipedia for user queries.
• Arxiv API Wrapper: Fetches and summarizes scientific papers based on user input.
• Custom Retriever Tool: Searches the loaded documents for information about LangSmith using the FAISS retriever.
### 5.Prompt Design and LLM Binding:
• A ChatPromptTemplate is defined to format user inputs and responses into structured prompts for the language model.
• The Google Generative AI Chat model (gemini-pro) is initialized and bound to the defined tools.
### 6.Agent Execution:
• The agent uses the tools to handle queries such as:
• Retrieving details about LangSmith from the custom document loader.
• Fetching scientific paper summaries from Arxiv.
• Providing general knowledge answers using Wikipedia.
• Outputs are formatted using LangChain’s OpenAIFunctionsAgentOutputParser.
# 🌟 Key Features of This Project
• Combines Google Generative AI with retrieval-based tools for real-time, reliable information generation.
• Implements multiple data sources (Wikipedia, Arxiv, and custom web documents) for diverse query handling.
• Leverages LangChain’s Agent Framework to intelligently select tools and process user inputs.
• Provides a reusable and extensible architecture for Retrieval-Augmented Generation tasks.
#### This system demonstrates how agents can be used within RAG to build scalable, robust, and domain-specific AI solutions.