
https://github.com/hemaldholakiya12/pdfchat

A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.

ai api cors embeddings faiss fastapi groq huggingface langchain llama3 llm pdf pdf-processing pymupdf python question-answering semantic-search text-splitting transformers vector-store


# PDFChat

PDFChat is a web application that lets users upload PDF files and query them through a question-and-answer interface. It parses the PDF, generates embeddings for the text, stores them in a vector database (FAISS), and uses semantic search to retrieve relevant passages so that an AI language model can give contextual answers.

## Features

- Upload PDFs and extract text.
- Text chunking and embedding generation.
- Vector storage with FAISS for efficient similarity search.
- Answer generation using the Llama3 model hosted via Groq.
- Intuitive UI for chatting with your PDFs.
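
As a rough illustration of the chunking feature listed above, the sketch below splits text into fixed-size character windows that overlap. It is a simplified stand-in and not the project's code: the function name `chunk_text` and the `chunk_size` and `overlap` values are assumptions made for this example. LangChain's `RecursiveCharacterTextSplitter` goes further by trying paragraph, line, and word boundaries before falling back to raw characters.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Example usage: each chunk repeats the last character of the previous one.
pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(pieces)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap. This usually helps retrieval quality at the cost of some duplicated text.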

## Tech Stack

- **Frontend**: Next.js
- **Backend**: FastAPI
- **Text Processing**: PyMuPDFLoader, RecursiveCharacterTextSplitter
- **Embeddings**: HuggingFace MiniLM Model
- **Vector Search**: FAISS
- **AI Model**: Llama3 (via Groq)
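
To make the role of the embedding and vector search components more concrete, here is a dependency-free sketch of similarity-based retrieval. It makes several assumptions and does not reproduce the project's implementation: a toy bag-of-words count vector stands in for the MiniLM embeddings, a brute-force cosine ranking stands in for the FAISS index, and the names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (brute-force scan)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Example usage with made-up chunks.
chunks = [
    "FAISS stores dense vectors for similarity search",
    "FastAPI serves the upload and question endpoints",
    "The splitter breaks long text into chunks",
]
top = retrieve("how are vectors stored for search", chunks, k=1)
print(top[0])
```

In a setup like the one described here, the retrieved chunks would then be passed to the language model together with the question. A real embedding model captures meaning beyond exact word overlap, and an index such as FAISS avoids scanning every chunk on each query.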

## How It Works

The web application follows a fixed pipeline to process uploaded PDFs and respond to queries. Here’s a high-level flow of the PDF processing and question-answering pipeline:

```mermaid
flowchart TD
A[User Uploads PDF] --> B[Read PDF as Bytes using UploadFile]
B --> C[Write Bytes to Temporary File using tempfile]
C --> D[Load PDF using PyMuPDFLoader]
D --> E[Split Text into Chunks using RecursiveCharacterTextSplitter]
E --> F[Generate Embeddings using HuggingFace MiniLM]
F --> G[Store Embeddings in FAISS Vector Store]
G --> H[Create Retriever from FAISS]
H --> I[Initialize LLM - Groq LLaMA3-8B]
I --> J[Create QA Chain using RetrievalQA]

K[User Asks a Question] --> L[Use Retriever to find relevant chunks]
L --> M[Send Question and Chunks to LLaMA3]
M --> N[Generate Answer using LLM]
N --> O[Return Answer as JSON Response]