https://github.com/hemaldholakiya12/pdfchat
A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.
- Host: GitHub
- URL: https://github.com/hemaldholakiya12/pdfchat
- Owner: HemalDholakiya12
- Created: 2025-04-16T11:59:01.000Z
- Default Branch: main
- Last Pushed: 2025-04-16T13:11:43.000Z
- Last Synced: 2025-04-23T16:16:31.142Z
- Topics: ai, api, cors, embeddings, faiss, fastapi, groq, huggingface, langchain, llama3, llm, pdf, pdf-processing, pymupdf, python, question-answering, semantic-search, text-splitting, transformers, vector-store
- Language: JavaScript
- Homepage:
- Size: 119 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PDFChat
A web-based application that allows users to upload PDF files and interact with them via a question-and-answer interface. This application parses the PDF, generates embeddings for the text, stores them in a vector database (FAISS), and retrieves relevant information using semantic search to provide contextual answers with an AI language model.
## Features
- Upload PDFs and extract text.
- Text chunking and embedding generation.
- Vector storage with FAISS for efficient similarity search.
- Answer generation using the Llama3 model hosted via Groq.
- Intuitive UI for chatting with your PDFs.
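
The text chunking feature listed above can be illustrated with a minimal sketch. This is not the project's code: it is a simplified stand-in for RecursiveCharacterTextSplitter that cuts fixed-size character windows with overlap, whereas the real splitter also prefers to break on separators such as paragraphs and sentences. The function name `chunk_text` and the `chunk_size` and `overlap` defaults are hypothetical, chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative only: names and default values are assumptions,
    not taken from the PDFChat source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk made purely of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means the last characters of one chunk reappear at the start of the next, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.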
## Tech Stack
- **Frontend**: Next.js
- **Backend**: FastAPI
- **Text Processing**: PyMuPDFLoader, RecursiveCharacterTextSplitter (LangChain)
- **Embeddings**: Hugging Face MiniLM model
- **Vector Search**: FAISS
- **AI Model**: Llama3 (via Groq)
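
To make the vector search entry above more concrete, here is a rough sketch of what similarity retrieval does conceptually. It is a pure-Python stand-in, not FAISS and not the project's code: the vectors are hand-written toy values rather than MiniLM embeddings, cosine similarity is assumed as the metric (the README does not state which metric the FAISS index uses), and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          index: list[tuple[str, list[float]]],
          k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query.

    This is a brute-force scan; a library such as FAISS exists to make
    the same lookup efficient over large collections.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: (chunk text, made-up 3-dimensional "embedding").
toy_index = [
    ("chunk about billing", [1.0, 0.0, 0.0]),
    ("chunk about refunds", [0.9, 0.1, 0.0]),
    ("chunk about shipping", [0.0, 0.0, 1.0]),
]

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```

In the application itself, the retrieved chunks are passed to the language model together with the user's question; the sketch covers only the lookup step.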
## How It Works
The web application follows a structured process to handle user-uploaded PDFs and respond to queries. Here’s a high-level flow of the PDF processing and question-answering pipeline:
```mermaid
flowchart TD
A[User Uploads PDF] --> B[Read PDF as Bytes using UploadFile]
B --> C[Write Bytes to Temporary File using tempfile]
C --> D[Load PDF using PyMuPDFLoader]
D --> E[Split Text into Chunks using RecursiveCharacterTextSplitter]
E --> F[Generate Embeddings using HuggingFace MiniLM]
F --> G[Store Embeddings in FAISS Vector Store]
G --> H[Create Retriever from FAISS]
H --> I[Initialize LLM - Groq LLaMA3-8B]
I --> J[Create QA Chain using RetrievalQA]
K[User Asks a Question] --> L[Use Retriever to find relevant chunks]
L --> M[Send Question and Chunks to LLaMA3]
M --> N[Generate Answer using LLM]
N --> O[Return Answer as JSON Response]