https://github.com/hemaldholakiya12/pdfchat

A web app that lets users upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS vector store, and retrieves relevant passages to provide context-aware answers using a large language model.

Host: GitHub
URL: https://github.com/hemaldholakiya12/pdfchat
Owner: HemalDholakiya12
Created: 2025-04-16T11:59:01.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-16T13:11:43.000Z (about 1 year ago)
Last Synced: 2025-04-23T16:16:31.142Z (about 1 year ago)
Topics: ai, api, cors, embeddings, faiss, fastapi, groq, huggingface, langchain, llama3, llm, pdf, pdf-processing, pymupdf, python, question-answering, semantic-search, text-splitting, transformers, vector-store
Language: JavaScript
Homepage:
Size: 119 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# PDFChat

A web-based application that lets users upload PDF files and interact with them through a question-and-answer interface. It parses the PDF, splits the text into chunks, generates embeddings for the chunks, stores them in a vector database (FAISS), and uses semantic search to retrieve the most relevant chunks, which an AI language model then uses as context to answer the question.
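
Semantic search over the stored embeddings is a nearest-neighbour lookup. The README does not say which FAISS index type the project uses; the pure-Python sketch below assumes the simplest one, an exact flat L2 index like FAISS's `IndexFlatL2`, which compares the query with every stored vector and, like FAISS, reports squared L2 distances. `FlatL2Index` and the 2-d vectors are hypothetical stand-ins for FAISS and for real MiniLM embeddings.

```python
from typing import List, Sequence, Tuple


class FlatL2Index:
    """Hypothetical pure-Python stand-in for an exact FAISS flat L2 index."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        """Append vectors; a vector's insertion position is its id."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(list(v))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[float, int]]:
        """Return up to k (squared L2 distance, id) pairs, nearest first."""
        if len(query) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(query)}")
        # Brute force: score the query against every stored vector.
        scored = [
            (sum((q - x) ** 2 for q, x in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scored.sort()
        return scored[:k]


# Made-up 2-d "embeddings"; real MiniLM embeddings have far more dimensions.
chunks = ["intro to FAISS", "how embeddings work", "unrelated appendix"]
index = FlatL2Index(dim=2)
index.add([[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
hits = index.search([0.9, 0.1], k=2)
top_chunks = [chunks[i] for _, i in hits]
print(top_chunks)  # ['how embeddings work', 'intro to FAISS']
```

Replacing the hand-written loop with FAISS changes the speed, not the idea: both return the ids of the closest chunk vectors, which the app then maps back to chunk text.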

## Features

- Upload PDFs and extract text.

- Text chunking and embedding generation.

- Vector storage with FAISS for efficient similarity search.

- Answer generation using the Llama3 model hosted via Groq.

- Intuitive UI for chatting with your PDFs.
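
Text chunking can be sketched as follows. This is a simplified, hypothetical re-implementation of the idea behind LangChain's `RecursiveCharacterTextSplitter`, not its actual code: split on the coarsest separator that works (paragraphs, then lines, then words, then single characters), then greedily pack the pieces into chunks of at most `chunk_size` characters, carrying up to `chunk_overlap` characters of trailing pieces into the next chunk. The README does not document the project's chunk size or overlap, so the values here are illustrative.

```python
from typing import List

# Coarse-to-fine separators: paragraphs, lines, words, then single characters.
SEPARATORS = ["\n\n", "\n", " ", ""]


def _atomize(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Break text into pieces of at most chunk_size chars; "".join(result) == text."""
    if len(text) <= chunk_size:
        return [text]
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    # Re-attach the separator so no characters are lost.
    parts = [p + sep for p in parts[:-1]] + [parts[-1]]
    pieces: List[str] = []
    for part in parts:
        if part:
            # Parts that are still too long are split with the next separator.
            pieces.extend(_atomize(part, chunk_size, finer))
    return pieces


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 40) -> List[str]:
    """Greedily pack pieces into chunks of at most chunk_size characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for piece in _atomize(text, chunk_size, SEPARATORS):
        if current and length + len(piece) > chunk_size:
            chunks.append("".join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and leaves room for the new piece.
            while current and (length > chunk_overlap or length + len(piece) > chunk_size):
                length -= len(current.pop(0))
        current.append(piece)
        length += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks


print(split_text("alpha beta gamma delta epsilon zeta", chunk_size=20, chunk_overlap=6))
# ['alpha beta gamma ', 'gamma delta epsilon ', 'zeta']
```

Overlap lets text near a chunk boundary appear in both neighbouring chunks, so retrieval is less likely to miss it; the cost is some duplicated text in the index.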

## Tech Stack

- **Frontend**: Next.js

- **Backend**: FastAPI

- **Text Processing**: LangChain's PyMuPDFLoader and RecursiveCharacterTextSplitter

- **Embeddings**: HuggingFace MiniLM model

- **Vector Search**: FAISS

- **AI Model**: Llama3 (via Groq)
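
Answer generation works by sending the question together with the retrieved chunks to the model. A common way to do this, and the default `"stuff"` chain type of LangChain's `RetrievalQA`, is to paste all retrieved chunks into a single prompt. The sketch below shows only that prompt-assembly step in plain Python; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the README does not say which chain type or prompt the project uses. The resulting string is what would be sent to Llama3 through Groq; the API call itself is omitted.

```python
from typing import List

# Hypothetical template: the project's actual prompt is not documented.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say that you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate the retrieved chunks into one prompt ("stuff" strategy)."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


retrieved = [
    "FAISS stores dense vectors and supports similarity search.",
    "MiniLM produces sentence embeddings.",
]
prompt = build_prompt("What does FAISS store?", retrieved)
print(prompt)
```

Stuffing is simple but bounded by the model's context window, which is one reason the chunk size and the number of retrieved chunks matter.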

## How It Works

The application handles uploaded PDFs and user queries in two stages: indexing the document, then answering questions against that index. Here is the high-level flow of the PDF-processing and question-answering pipeline:

```mermaid

flowchart TD

    A[User Uploads PDF] --> B[Read PDF as Bytes using UploadFile]

    B --> C[Write Bytes to Temporary File using tempfile]

    C --> D[Load PDF using PyMuPDFLoader]

    D --> E[Split Text into Chunks using RecursiveCharacterTextSplitter]

    E --> F[Generate Embeddings using HuggingFace MiniLM]

    F --> G[Store Embeddings in FAISS Vector Store]

    G --> H[Create Retriever from FAISS]

    H --> I[Initialize LLM - Groq LLaMA3-8B]

    I --> J[Create QA Chain using RetrievalQA]

    K[User Asks a Question] --> L[Use Retriever to find relevant chunks]

    L --> M[Send Question and Chunks to LLaMA3]

    M --> N[Generate Answer using LLM]

    N --> O[Return Answer as JSON Response]
```
