An open API service indexing awesome lists of open source software.

https://github.com/jasjeev013/neuroquery-chroma-rag

NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.
https://github.com/jasjeev013/neuroquery-chroma-rag

ai chromadb document-intelligence gemini langchain multi-pdf-processing nlp pdf-analysis-python pdf-question-answering streamlit vector-search

Last synced: 6 months ago
JSON representation

NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.

Awesome Lists containing this project

README

          

# NeuroQuery: Intelligent Answers from Your Documents

![NeuroQuery Banner](screenshots/banner.png)

A state-of-the-art document question answering system that extracts knowledge from your PDFs using AI.

## Features

| Feature | Description |
|---------|-------------|
| 📄 Multi-PDF Processing | Upload and analyze up to 3 PDFs simultaneously (300 pages max each) |
| 💬 Natural Language Interface | Ask questions in plain English about your documents |
| 🧠 Smart Context Understanding | Gemini AI provides accurate answers based on document content |
| ⚡ Fast Retrieval | Chroma vector database enables quick information lookup |

![UI Screenshot](screenshots/interface.png)

## Technical Architecture

```mermaid
graph TD
A[PDF Upload] --> B[Text Extraction]
B --> C[Chunking]
C --> D[Vector Embeddings]
D --> E[Chroma DB Storage]
E --> F[User Query]
F --> G[Relevant Chunk Retrieval]
G --> H[Gemini Answer Generation]
H --> I[Response Display]
```

## Technology Stack

### Core Libraries

| Category | Libraries |
|----------|-----------|
| Framework | `langchain`, `langchain_community` |
| AI Models | `langchain_google_genai` (Gemini) |
| Vector DB | `langchain_chroma` |
| PDF Processing | `pypdf`, `pdfminer.six`, `unstructured` |
| Utilities | `python-dotenv`, `nest_asyncio`, `sentence-transformers` |
| UI | `streamlit` |

## Setup Instructions

### Prerequisites
- Python 3.8+
- Google API key with Gemini access

### Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/neuroquery.git
cd neuroquery
```

2. Create and activate virtual environment:
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

4. Create `.env` file:
```env
GOOGLE_API_KEY=your_api_key_here
```

### Running the Application
```bash
streamlit run app.py
```

## Deployment Options

| Platform | Instructions |
|----------|--------------|
| Streamlit Cloud | [Deploy Guide](https://docs.streamlit.io/streamlit-community-cloud/deploy-your-app) |
| Hugging Face | [Spaces Guide](https://huggingface.co/docs/hub/spaces) |
| AWS/Azure | Use Docker with Streamlit server |

## Usage Guide

1. Upload PDF documents (max 3 files)
2. Wait for processing to complete
3. Ask questions about the document content
4. View AI-generated answers with source references

## Troubleshooting

- **Processing Errors**: Ensure PDFs contain selectable text (not scanned images)
- **API Errors**: Verify your Google API key has Gemini access
- **Performance**: For large documents, increase chunk size in `config.py`

## License

[MIT License](LICENSE)

---

Developed with ❤️ by Jasjeev Singh Kohli