https://github.com/jasjeev013/neuroquery-chroma-rag
NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.
- Host: GitHub
- URL: https://github.com/jasjeev013/neuroquery-chroma-rag
- Owner: jasjeev013
- Created: 2025-07-27T11:18:04.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-27T11:50:14.000Z (7 months ago)
- Last Synced: 2025-07-27T13:26:09.208Z (7 months ago)
- Topics: ai, chromadb, document-intelligence, gemini, langchain, multi-pdf-processing, nlp, pdf-analysis-python, pdf-question-answering, streamlit, vector-search
- Language: Python
- Homepage:
- Size: 1.47 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NeuroQuery: Intelligent Answers from Your Documents

A document question-answering system that answers natural-language questions about your PDFs, using Gemini for generation and Chroma for retrieval.
## Features
| Feature | Description |
|---------|-------------|
| 📄 Multi-PDF Processing | Upload and analyze up to 3 PDFs simultaneously (300 pages max each) |
| 💬 Natural Language Interface | Ask questions in plain English about your documents |
| 🧠 Smart Context Understanding | Gemini generates answers grounded in the retrieved document content |
| ⚡ Fast Retrieval | Chroma vector database enables quick information lookup |

## Technical Architecture
```mermaid
graph TD
A[PDF Upload] --> B[Text Extraction]
B --> C[Chunking]
C --> D[Vector Embeddings]
D --> E[Chroma DB Storage]
E --> F[User Query]
F --> G[Relevant Chunk Retrieval]
G --> H[Gemini Answer Generation]
H --> I[Response Display]
```
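To make the flow in the diagram concrete, here is a minimal, dependency-free sketch of the chunking and retrieval stages. It is not the project's code: the function names (`chunk_text`, `embed`, `retrieve`) are hypothetical, and the word-count "embedding" is a toy stand-in for the real embedding model and the Chroma store, which the README does not show in detail.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (the Chunking stage)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (the Retrieval stage)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "Chroma stores vector embeddings for retrieval.",
        "Streamlit renders the user interface.",
        "Gemini generates the final answer.",
    ]
    print(retrieve("Which component stores embeddings?", docs, k=1))
```

In the real pipeline the retrieved chunks would be passed to Gemini as context for answer generation; that step is omitted here because it requires an API key.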
## Technology Stack
### Core Libraries
| Category | Libraries |
|----------|-----------|
| Framework | `langchain`, `langchain_community` |
| AI Models | `langchain_google_genai` (Gemini) |
| Vector DB | `langchain_chroma` |
| PDF Processing | `pypdf`, `pdfminer.six`, `unstructured` |
| Utilities | `python-dotenv`, `nest_asyncio`, `sentence-transformers` |
| UI | `streamlit` |
## Setup Instructions
### Prerequisites
- Python 3.8+
- Google API key with Gemini access
### Installation
1. Clone the repository:
```bash
git clone https://github.com/jasjeev013/neuroquery-chroma-rag.git
cd neuroquery-chroma-rag
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # Linux/macOS
# Windows (run this instead of the line above): venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Create a `.env` file in the project root:
```env
GOOGLE_API_KEY=your_api_key_here
```
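A missing or placeholder key is a common cause of startup failures, so it can help to check for it before any Gemini call is made. The sketch below is an assumption about how such a check might look, not the app's actual code; `get_google_api_key` is a hypothetical helper. It reads from `os.environ`, which `python-dotenv`'s `load_dotenv()` populates from the `.env` file when called at startup.

```python
import os


def get_google_api_key(env=None):
    """Return GOOGLE_API_KEY from the given mapping (default: os.environ).

    Raises RuntimeError if the key is missing, blank, or still set to the
    placeholder value from the template .env file.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; add your key to the .env file"
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.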
### Running the Application
```bash
streamlit run app.py
```
## Deployment Options
| Platform | Instructions |
|----------|--------------|
| Streamlit Cloud | [Deploy Guide](https://docs.streamlit.io/streamlit-community-cloud/deploy-your-app) |
| Hugging Face | [Spaces Guide](https://huggingface.co/docs/hub/spaces) |
| AWS/Azure | Run the Streamlit server in a Docker container |
## Usage Guide
1. Upload PDF documents (max 3 files)
2. Wait for processing to complete
3. Ask questions about the document content
4. View AI-generated answers with source references
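The upload limits stated in the Features table (3 files, 300 pages each) could be enforced before processing starts. The following is an illustrative sketch only; `validate_uploads` and the constant names are hypothetical, and the page counts are assumed to have been obtained already from a PDF library such as `pypdf`.

```python
MAX_FILES = 3    # limit stated in the Features table
MAX_PAGES = 300  # per-file page limit stated in the Features table


def validate_uploads(page_counts):
    """Check a {file name: page count} mapping against the upload limits.

    Returns a list of human-readable problems; an empty list means the
    upload is within the limits.
    """
    problems = []
    if len(page_counts) > MAX_FILES:
        problems.append(
            f"Too many files: {len(page_counts)} (limit {MAX_FILES})"
        )
    for name, pages in page_counts.items():
        if pages > MAX_PAGES:
            problems.append(f"{name}: {pages} pages (limit {MAX_PAGES})")
    return problems
```

Returning a list of problems rather than raising on the first one lets the UI report every issue to the user at once.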
## Troubleshooting
- **Processing Errors**: Ensure PDFs contain selectable text (not scanned images)
- **API Errors**: Verify your Google API key has Gemini access
- **Performance**: For large documents, increase chunk size in `config.py`
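
To see why a larger chunk size helps with large documents, it is useful to estimate how many chunks must be embedded and stored. The sketch below assumes a sliding-window character splitter with a fixed overlap; the actual setting names in `config.py` are not shown in this README, so `chunk_size` and `overlap` here are illustrative parameters.

```python
import math


def estimate_chunk_count(text_length, chunk_size, overlap=0):
    """Estimate the number of chunks a sliding-window splitter produces.

    Each chunk after the first advances by (chunk_size - overlap)
    characters, so the count is ceil((length - overlap) / step).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    return math.ceil((text_length - overlap) / (chunk_size - overlap))


if __name__ == "__main__":
    # Compare two chunk sizes for a 600,000-character document.
    for size in (1000, 2000):
        print(size, estimate_chunk_count(600_000, size, overlap=200))
```

Fewer chunks mean fewer embedding calls and a smaller index, at the cost of coarser retrieval, since each retrieved chunk carries more text that may be unrelated to the question.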
## License
[MIT License](LICENSE)
---
Developed with ❤️ by Jasjeev Singh Kohli