https://github.com/abdelrahman-amen/question_answering_from_any_url_with_rag

In this project, users can input any URL and ask a question related to its content. Using Retrieval-Augmented Generation (RAG) and LangChain, the app retrieves the most relevant answer from the webpage.

api dotenv embeddings genai generative-ai langchain python retreival-augmented-generation streamlit vectorstore webbased


Host: GitHub
URL: https://github.com/abdelrahman-amen/question_answering_from_any_url_with_rag
Owner: Abdelrahman-Amen
Created: 2025-01-31T16:18:12.000Z (8 days ago)
Default Branch: main
Last Pushed: 2025-01-31T16:38:07.000Z (8 days ago)
Last Synced: 2025-01-31T17:29:05.659Z (8 days ago)
Topics: api, dotenv, embeddings, genai, generative-ai, langchain, python, retreival-augmented-generation, streamlit, vectorstore, webbased
Language: Python
Homepage:
Size: 1.84 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🚀 Question-Answering from Any URL with RAG and LangChain 🌐

Welcome to the Question-Answering from Any URL project! This repository showcases a cutting-edge implementation of Retrieval-Augmented Generation (RAG) combined with the power of LangChain and Google Generative AI for seamless question-answering. 🧠✨

![Image](https://github.com/user-attachments/assets/89de9f8e-9531-422a-a2cd-dac14eed3d98)

# 🔍 What is RAG (Retrieval-Augmented Generation)?
Retrieval-Augmented Generation is an advanced AI framework that combines retrieval-based systems with generative models. Instead of relying solely on pre-trained knowledge, RAG enhances the model's capability by retrieving contextually relevant information from external sources like documents or web pages. This enables the generation of accurate, up-to-date, and context-aware answers. 💡

### How it works in this project:

1. Retrieve: Content is extracted from the provided URL.

2. Embed: The content is processed into vector representations using Google Generative AI embeddings.

3. Generate: The user's question is embedded and matched against the stored content, and the most relevant passage is presented as the answer.
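The three steps above can be pictured with a tiny, standard-library-only sketch. It is not the project's code: the `embed` function here is a toy word-count vector standing in for the Google Generative AI embeddings, sentences stand in for chunks, and the names `embed`, `cosine`, and `answer` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (an assumption, NOT a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(page_text: str, question: str) -> str:
    """Return the chunk (here: one sentence) most similar to the question."""
    chunks = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    if not chunks:
        return ""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(embed(c), q))


page = (
    "FAISS is a library for similarity search. "
    "Streamlit builds web apps in Python. "
    "LangChain connects loaders, splitters and vector stores."
)
print(answer(page, "Which library is used for similarity search?"))
# prints: FAISS is a library for similarity search.
```

A learned embedding model would also match paraphrases that share no words with the question, which this word-count stand-in cannot do.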

# What I Built in This Project 🛠️

This project is a web-based application where users can:

1. Input a URL: Provide the URL of any publicly accessible web page.

2. Ask a Question: Enter a question related to the content of the web page.

3. Receive an Answer: Get the most relevant answer, retrieved from the content and matched with the user's query.

# Core Features 🔗

• LangChain Framework: Used to streamline the integration of loaders, splitters, embeddings, and vector stores.

• Google Generative AI: Uses the embedding-001 model to create embeddings.

• Web Content Loader: The WebBaseLoader extracts and prepares data from the given URL.

• Text Splitting: Documents are divided into manageable chunks using the RecursiveCharacterTextSplitter to ensure smooth processing.

• Vector Search with FAISS: Uses FAISS for fast similarity search, matching the query to the most relevant chunk of content.

• Streamlit UI: A clean and interactive user interface for a seamless experience.
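To give a feel for the Text Splitting feature listed above, here is a simplified sketch of the idea behind recursive character splitting: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. It is a stand-in written for illustration, not LangChain's RecursiveCharacterTextSplitter; the function name `split_recursive`, the default `chunk_size`, and the separator list are assumptions, and chunk overlap is left out.

```python
def split_recursive(text, chunk_size=40, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters,
    preferring to break at the coarsest separator that works."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with the finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "RAG retrieves context.\n\n"
    "It then embeds each chunk and searches the vector store for the closest match."
)
for chunk in split_recursive(sample):
    print(repr(chunk))
```

Breaking at paragraph boundaries first tends to keep related sentences in the same chunk, which helps the later similarity search return a self-contained passage.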

# How It Works 💻

1. User Inputs:

• Enter a URL (e.g., a blog post, article, or documentation page).

• Type a question about the content of the page.

2. Processing Pipeline:

• The content is loaded using WebBaseLoader.

• It is split into smaller chunks for better analysis.

• Google Generative AI embeddings are generated for each chunk.

• A vector store is created, enabling fast similarity search.
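The loading step at the start of this pipeline amounts to turning a page's HTML into plain text. The project uses WebBaseLoader for this; the sketch below only illustrates the idea with Python's standard `html.parser`, and the names `TextExtractor` and `html_to_text` are hypothetical. Fetching the page over the network is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


html = (
    "<html><head><title>Doc</title><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(html))
```

The resulting text is what would then be split into chunks and embedded.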

3. Query Matching:

• The user's question is transformed into an embedding.

• The system searches the vector store for the most relevant content.
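Query matching boils down to a nearest-neighbour search: the question's vector is compared with every stored chunk vector and the closest ones win. The sketch below does this by brute force with squared Euclidean distance. The distance metric is an assumption (the README does not say which one the FAISS index uses), the three-dimensional vectors are made up for illustration, and `squared_l2` and `search` are hypothetical names.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index, query_vec, k=1):
    """index: list of (chunk_text, vector) pairs.
    Returns the k closest entries as (distance, chunk_text), nearest first."""
    scored = ((squared_l2(vec, query_vec), text) for text, vec in index)
    return heapq.nsmallest(k, scored)


# Made-up vectors standing in for real chunk embeddings.
index = [
    ("Pricing starts at $10 per month.", [0.9, 0.1, 0.0]),
    ("The API supports JSON.", [0.1, 0.8, 0.2]),
    ("Support is available by email.", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "How much does it cost?"

for distance, text in search(index, query_vec, k=2):
    print(f"{distance:.2f}  {text}")
```

A brute-force scan like this is exact but grows linearly with the number of chunks; a dedicated library such as FAISS exists to make the same kind of lookup fast on large collections.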

4. Results:

• The top result is displayed to the user as the answer to their question.

# Why Use This Project? 🌟

• Dynamic Information: Unlike the static knowledge in pre-trained models, this app answers from the current content of the URL you provide.

• Customizable: Easily extendable to support different types of embeddings, content sources, or models.

• User-Friendly: Intuitive Streamlit interface for easy interaction.

• Educational: A great resource to understand RAG, LangChain, and how to build end-to-end NLP applications.

# Demo 📽

Below is a demonstration of how the application works:

![Demo of the Application](https://github.com/Abdelrahman-Amen/Question_Answering_from_Any_URL_with_RAG/blob/main/Demo.gif)