https://github.com/furk4neg3/rag-chatbot-pdf-question-answering

This project demonstrates a Retrieval-Augmented Generation (RAG) chatbot: users upload a PDF document and ask questions about it. The chatbot answers from the contents of the uploaded PDF, using IBM Watsonx LLM and embedding models together with LangChain for document processing and retrieval.

Host: GitHub
URL: https://github.com/furk4neg3/rag-chatbot-pdf-question-answering
Owner: furk4neg3
Created: 2024-11-27T18:37:44.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-11-27T18:48:54.000Z (11 months ago)
Last Synced: 2025-02-01T19:12:52.865Z (9 months ago)
Language: Python
Homepage:
Size: 421 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# RAG Chatbot for PDF Question-Answering with IBM Watsonx and LangChain

## Key Features
- **Question-Answering from PDFs:** Upload a PDF, ask a question, and get an answer based on the document's content.
- **Retrieval-Augmented Generation (RAG):** Combines document retrieval with a generative model so that answers are grounded in passages retrieved from the PDF.
- **LangChain Integration:** Uses LangChain for document loading, text splitting, and vector database management.
- **Gradio Interface:** Provides a web interface for uploading a PDF and asking questions.

## Objectives
- Implement a document loader to extract text from PDFs.
- Use text splitting techniques to handle large documents.
- Employ Watsonx for embeddings and LLM inference.
- Set up a vector database to store document embeddings.
- Build a question-answering system using the retrieval-augmented generation (RAG) approach.
- Create a Gradio interface to make the chatbot user-friendly.
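
As one illustration of the text-splitting objective, the sketch below splits text recursively on a list of separators, which is the general idea behind LangChain's RecursiveCharacterTextSplitter. It is a simplified stand-in and not the library's implementation: the function name `split_text`, the `chunk_size` default, and the omission of chunk overlap are all assumptions made for the example.

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper: tries coarse separators first (paragraphs),
    then finer ones (lines, words), and finally a hard character cut.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed width.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small parts
            continue
        if current.strip():
            chunks.append(current)       # flush what has been merged so far
        current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too long: retry with finer separators.
            chunks.extend(split_text(part, chunk_size, finer))
    if current.strip():
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is also short.\n\n" + "word " * 50
pieces = split_text(sample, chunk_size=40)
```

Paragraph boundaries are kept where they fit, and only the long final paragraph is broken at word level.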

## Workflow
- **Document Loader:** The PDF file is loaded using PyPDFLoader from LangChain.
- **Text Splitter:** The document is split into smaller chunks using RecursiveCharacterTextSplitter.
- **Embedding:** Each chunk is embedded using Watsonx's embedding model (ibm/slate-125m-english-rtrvr).
- **Vector Database:** The chunks and their embeddings are stored in a Chroma vector database for efficient retrieval.
- **Retriever and QA Chain:** The retriever finds the chunks most relevant to the question, and the LLM generates an answer from them.
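
The embedding, storage, and retrieval steps can be sketched without any external service. In the example below, a word-count vector stands in for the Watsonx embedding model and an in-memory list stands in for Chroma. The names `embed`, `cosine`, and `ToyVectorStore` are hypothetical and are not part of LangChain, Chroma, or the Watsonx SDK. The sketch only shows the shape of similarity search, not the project's actual code.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database such as Chroma."""

    def __init__(self):
        self._rows = []  # list of (chunk_text, embedding) pairs

    def add(self, chunks):
        for chunk in chunks:
            self._rows.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore()
store.add([
    "The warranty lasts two years from purchase.",
    "Shipping takes five business days.",
    "Returns are accepted within thirty days.",
])
top = store.search("How long is the warranty?", k=1)
```

A real embedding model captures meaning rather than exact word overlap, so it would also match paraphrased questions that this toy version misses.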

## Example Interaction
- Upload a PDF document containing the information you want to query.
- Ask a question, such as "What is the main topic of the document?"
- Receive a generated response based on the content of the uploaded PDF.
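
Between the question and the generated response, a RAG system typically places the retrieved chunks into the prompt sent to the LLM. The sketch below shows one way this could look. The function `build_prompt` and the prompt wording are assumptions for illustration, since the exact prompt used by this project is not documented here.

```python
def build_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved chunks (hypothetical helper)."""
    # Number each chunk so the context boundaries are visible to the model.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the main topic of the document?",
    ["This report covers solar panel efficiency.", "Data was collected in 2023."],
)
```

The resulting string is what would be passed to the LLM; the model's completion after "Answer:" is the response shown to the user.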

## Requirements
- Python 3.x
- Gradio
- LangChain
- IBM Watsonx SDK
- Chroma (vector store)
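
A quick way to check whether these dependencies are importable is sketched below. The import names (`gradio`, `langchain`, `ibm_watsonx_ai`, `chromadb`) are assumptions about which packages satisfy each requirement. The repository may pin different packages or versions, so treat the mapping as a starting point.

```python
from importlib.util import find_spec

# Assumed mapping from requirement label to top-level import name.
REQUIRED = {
    "Gradio": "gradio",
    "LangChain": "langchain",
    "IBM Watsonx SDK": "ibm_watsonx_ai",
    "Chroma": "chromadb",
}


def missing_requirements(required=REQUIRED):
    """Return the labels whose module cannot be found in this environment."""
    return [label for label, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing:", ", ".join(missing) if missing else "none")
```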
