https://github.com/aryan-coder-student/chatpdf
An interactive Streamlit app that allows users to upload documents (PDF, TXT) and chat with them using a Retrieval-Augmented Generation (RAG) model. The app leverages LangChain for document parsing and retrieval, Chroma for vector storage, and an LLM for answering queries based on document content.
- Host: GitHub
- URL: https://github.com/aryan-coder-student/chatpdf
- Owner: Aryan-coder-student
- Created: 2024-11-17T18:10:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-19T17:59:17.000Z (about 1 year ago)
- Last Synced: 2025-10-10T01:46:41.565Z (9 months ago)
- Topics: generative-ai, langchain, python, rag, streamlit, vector-database
- Language: Jupyter Notebook
- Homepage: https://bascirag-chatpdf.streamlit.app/
- Size: 41 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 📚 Document Chat Assistant 🤖
## 🌟 Project Overview
Revolutionize document interaction with our cutting-edge **Document Chat Assistant**! This intelligent application empowers users to upload, analyze, and explore multiple documents through an intuitive, AI-powered chat interface.
---
## ✨ Key Features
| Feature | Description | 🚀 Highlights |
|---------|-------------|---------------|
| 🗂️ Multi-Document Upload | Upload PDF and TXT files seamlessly | Process multiple documents simultaneously |
| 🧠 Smart Document Processing | Document chunking and embedding | Splits text into 500-character chunks (50-character overlap) and embeds them with HuggingFace models |
| 💬 RAG-Powered Interaction | Context-aware response generation | Combines retrieval and language models |
| 💾 Persistent Document Storage | Efficient embedding management | Utilizes Chroma for quick information retrieval |
| 🤝 Interactive Chat Interface | Natural language document exploration | Ask complex questions, get precise answers |
| 🔄 Flexible Reset Options | Manage chat and database | Easy reset for new document sets |
---

## 🚀 Getting Started
### Prerequisites
- 🐍 Python 3.8+
- 📦 pip package manager
### Installation
```bash
# Clone the repository
git clone https://github.com/aryan-coder-student/chatpdf.git
# Navigate to project directory
cd chatpdf
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
# Install dependencies
pip install -r requirements.txt
```
### Running the Application
```bash
# Launch Streamlit application
streamlit run app.py
```
---
## 🔍 How It Works
```mermaid
graph TD
A[Upload Documents] --> B[Preprocess Documents]
B --> C[Create Embeddings]
C --> D[Store in Chroma]
D --> E[User Query]
E --> F[Retrieve Relevant Context]
F --> G[Generate AI Response]
G --> H[Display Answer]
```
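The retrieval step in the diagram (User Query to Retrieve Relevant Context) can be illustrated without any external libraries. This is a minimal sketch, not the app's actual code: in the app, Chroma and HuggingFace embeddings do this work. The helper names `cosine_similarity` and `retrieve_top_k`, and the hand-made three-dimensional vectors, are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vector, store, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": (chunk text, hand-made embedding) pairs (hypothetical data).
store = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # stands in for the embedded user question

print(retrieve_top_k(query_vector, store, k=2))
```

The two payment-related chunks rank highest because their vectors point in nearly the same direction as the query vector. A real vector database applies the same idea to high-dimensional embeddings, usually with an approximate index.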
### Code Breakdown
#### Document Upload and Processing
**File Uploader**:
```python
uploaded_files = st.file_uploader(
    "Upload Documents",
    type=["pdf", "txt"],
    accept_multiple_files=True
)
```
**Document Processing Workflow**:
```python
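import os
import tempfile

# Import paths assume the langchain-community and langchain-text-splitters
# packages; adjust them for your installed LangChain version.
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter


def process_documents(uploaded_files, persist_directory="chroma_db"):
    """Load, chunk, embed, and store uploads; return the chunks.

    The persist_directory default is an assumed, illustrative path.
    """
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    documents = []
    for file in uploaded_files:
        # LangChain loaders read from a path, so write the upload to a temp file
        suffix = ".pdf" if file.type == "application/pdf" else ".txt"
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(file.getvalue())
        try:
            # Use appropriate loader based on file type
            loader = PyPDFLoader(tmp.name) if suffix == ".pdf" else TextLoader(tmp.name)
            # Split documents into chunks
            documents.extend(text_splitter.split_documents(loader.load()))
        finally:
            os.remove(tmp.name)
    # Create embeddings and store the chunks in Chroma
    embeddings = HuggingFaceEmbeddings()
    vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
    if documents:
        vectorstore.add_documents(documents)
    return documents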
```
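To make the `chunk_size=500` and `chunk_overlap=50` settings concrete, here is a simplified, dependency-free sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); it only shows the sliding-window idea. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 1200  # stand-in for a 1200-character document
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])  # → [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.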
#### RAG Prompt Template
```python
prompt_template = """
You are a helpful assistant. Answer the question based strictly on the provided context.
Think step by step and provide a detailed, accurate response.
Context:
{context}
Question: {question}
Helpful Answer:"""
```
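As a rough illustration of how the template might be filled before it is sent to the model, the sketch below joins retrieved chunks into the `{context}` slot with plain `str.format`. The helper name `build_prompt` and the sample chunks are assumptions; the app itself may rely on LangChain's prompt utilities instead.

```python
PROMPT_TEMPLATE = """
You are a helpful assistant. Answer the question based strictly on the provided context.
Think step by step and provide a detailed, accurate response.
Context:
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(chunks, question):
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)  # blank line between chunks keeps them distinct
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks and user question.
retrieved = ["Invoices are due within 30 days.", "Late payments incur a 2% fee."]
print(build_prompt(retrieved, "When are invoices due?"))
```

Keeping the instruction "based strictly on the provided context" ahead of the context block is what steers the model toward the uploaded documents rather than its general knowledge.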
### Key Technologies
- 🧠 **AI/ML**:
  - LangChain
  - HuggingFace Embeddings
  - ChatGroq
- 🌐 **Web Framework**: Streamlit
- 💾 **Vector Database**: Chroma
---
## 🤝 Contributing
Interested in improving the Document Chat Assistant? We welcome contributions!
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
---
## 📜 License
Distributed under the MIT License. See `LICENSE` for more information.
---
**Created with ❤️ by AI Enthusiasts**