https://github.com/albertodiazdurana/rag-document-assistant

Production-ready RAG system with multi-provider LLM support (OpenAI, Claude, Ollama), vector database integration, FastAPI backend, and MLflow evaluation. Features German language support and Streamlit UI.
https://github.com/albertodiazdurana/rag-document-assistant

anthropic chromadb document-qa embeddings fastapi langchain langgraph llm mlflow nlp ollama openai pinecone python rag retrieval-augmented-generation streamlit vector-database

Last synced: 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/albertodiazdurana/rag-document-assistant
Owner: albertodiazdurana
Created: 2026-01-17T17:52:41.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-21T15:36:02.000Z (3 months ago)
Last Synced: 2026-03-22T05:59:31.018Z (3 months ago)
Topics: anthropic, chromadb, document-qa, embeddings, fastapi, langchain, langgraph, llm, mlflow, nlp, ollama, openai, pinecone, python, rag, retrieval-augmented-generation, streamlit, vector-database
Language: Python
Homepage:
Size: 104 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # RAG Document Assistant

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)

[![FastAPI](https://img.shields.io/badge/FastAPI-0.109+-green.svg)](https://fastapi.tiangolo.com/)

[![LangChain](https://img.shields.io/badge/LangChain-0.1+-orange.svg)](https://langchain.com/)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Production-ready RAG system with multi-provider LLM support (OpenAI, Anthropic, Ollama), ChromaDB vector database, FastAPI backend, and MLflow evaluation.

## What This Does

Imagine you have hundreds of documents; manuals, reports, contracts. Instead of reading through all of them to find an answer, you simply ask a question in plain language: *"What are the warranty terms for product X?"* or *"Summarize the key findings from last quarter."*

This application reads your documents, understands their content, and answers your questions accurately; citing exactly where it found the information. You can choose which AI provider (OpenAI, Anthropic, or a local Ollama model) answers your questions.

## Features

### Implemented (Sprint 1 & 2)

- **Multi-Provider LLM Support**: OpenAI, Anthropic, Ollama (local models)

- **Vector Database**: ChromaDB with OpenAI embeddings

- **Document Processing**: PDF, Markdown, TXT with configurable chunking

- **RAG Chain**: Retrieval-augmented generation with conversation memory

- **FastAPI Backend**: REST API with async document processing

- **MLflow Evaluation**: Experiment tracking with faithfulness/relevance metrics

- **Streamlit UI**: Interactive document upload and chat interface with source citations

### Planned (Sprint 3: Experiments & Optimization)

- **German Language Support**: Multilingual embeddings (HuggingFace e5) and prompts

- **Hybrid Search**: Semantic + keyword (BM25) retrieval with fusion

- **Performance & Robustness**: Caching, token tracking, adversarial testing

- **Docker Deployment**: Containerized production setup

## Quick Start

### Prerequisites

- Python 3.11+

- OpenAI API key (or Anthropic/Ollama for alternative providers)

### Installation

```bash

# Clone repository

git clone https://github.com/yourusername/rag-document-assistant.git

cd rag-document-assistant

# Create virtual environment

python -m venv .venv

# Activate (Windows)

.venv\Scripts\activate

# Activate (Linux/Mac)

source .venv/bin/activate

# Install dependencies

pip install -e ".[dev]"

```

### Environment Setup

```bash

# Copy example environment file

cp .env.example .env

# Edit .env with your API keys

# Required: OPENAI_API_KEY (for embeddings)

# Optional: ANTHROPIC_API_KEY (for Claude LLM)

```

### Run the Application

```bash

# Start FastAPI backend

uvicorn src.api.main:app --reload

# API docs available at http://localhost:8000/docs

```

## Project Structure

```

rag-document-assistant/

├── src/

│   ├── ingestion/          # Document loaders and chunking

│   ├── vectorstore/        # Embeddings and vector database

│   ├── retrieval/          # RAG chain with memory

│   ├── llm/                # LLM providers and prompts

│   ├── api/                # FastAPI backend

│   └── evaluation/         # MLflow metrics

├── app/

│   └── streamlit_app.py    # Streamlit UI

├── tests/                  # Unit tests (84 tests)

├── data/

│   ├── sample_docs/        # Example documents

│   ├── eval/               # Evaluation test dataset

│   └── experiments/        # Experiment test data

└── docs/

    ├── checkpoints/        # Sprint progress

    ├── decisions/          # Architecture decisions (DEC-###)

    ├── experiments/        # Capability experiments (EXP-###)

    ├── plan/               # Project and sprint plans

    └── dsm-validation-tracker.md  # DSM methodology feedback

```

## API Endpoints

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/health` | GET | Health check |

| `/api/v1/ingest` | POST | Upload and index documents |

| `/api/v1/query` | POST | Query the RAG system |

| `/api/v1/models` | GET | List available LLM providers |

| `/api/v1/documents/count` | GET | Get indexed document count |

| `/api/v1/documents` | DELETE | Clear all documents |

## Configuration

### LLM Providers

```python

# OpenAI (default)

LLM_PROVIDER=openai

OPENAI_API_KEY=sk-...

# Anthropic Claude

LLM_PROVIDER=anthropic

ANTHROPIC_API_KEY=sk-ant-...

# Ollama (local, no API key needed)

LLM_PROVIDER=ollama

OLLAMA_MODEL=llama3.2

```

### Vector Database

```python

# ChromaDB (default, local)

VECTOR_DB=chroma

CHROMA_PERSIST_DIR=./chroma_db

```

## Hardware Requirements

**No GPU required.** This project uses API-based LLMs and embeddings:

| Component | Compute Location | Local Requirements |

|-----------|------------------|-------------------|

| LLM Inference | OpenAI/Claude API | Internet connection |

| Embeddings | OpenAI API | Internet connection |

| Vector Search | Local (ChromaDB) | ~4GB RAM |

| Ollama (optional) | Local CPU/GPU | 8GB+ RAM |

Works on Windows, macOS, and Linux.

## Development

```bash

# Run tests

pytest tests/ -v --cov=src

# Format code

black src/ tests/

ruff check src/ tests/

# Type checking

mypy src/

```

## Evaluation with MLflow

```bash

# Start MLflow UI

mlflow ui

# Run evaluation pipeline

python -m src.evaluation.runner

```

Tracked metrics:

- **Faithfulness**: Answer grounded in retrieved context

- **Relevance**: Retrieved chunks match query intent

- **Latency**: End-to-end response time

## Tech Stack

| Category | Technologies |

|----------|--------------|

| **LLM Orchestration** | LangChain |

| **LLM Providers** | OpenAI, Anthropic, Ollama |

| **Vector Database** | ChromaDB |

| **Embeddings** | OpenAI text-embedding-ada-002 |

| **Backend** | FastAPI, Pydantic, uvicorn |

| **Evaluation** | MLflow |

| **Testing** | pytest, pytest-cov (84 tests, 73% coverage) |

## Experiments

Built using [Take AI Bite](https://github.com/albertodiazdurana/take-ai-bite), a framework for human-AI collaboration. Its engine, the Deliberate Systematic Methodology (DSM), provides structured guidance for experiment-driven development to validate features systematically:

| Experiment | Focus | Status | Documentation |

|------------|-------|--------|---------------|

| EXP-001 | Multi-source conflict detection | Complete | [View](docs/experiments/EXP-001_multi-source-detection.md) |

| EXP-002 | Cross-lingual retrieval | Planned | Sprint 3, Day 7 |

| EXP-003 | Retrieval strategy comparison | Planned | Sprint 3, Day 8 |

| EXP-004 | Performance & robustness | Planned | Sprint 3, Day 9 |

| EXP-005 | End-to-end validation | Planned | Sprint 3, Day 10 |

Each experiment follows the [DSM C.1.3 Capability Experiment Template](https://github.com/albertodiazdurana/take-ai-bite) with combined quantitative (RAGAS, RAGBench) and qualitative evaluation.

**DSM Validation:** This project also validates the DSM methodology itself. Feedback is tracked in [dsm-validation-tracker.md](docs/dsm-validation-tracker.md).

## Known Limitations

Limitations are tracked per [DSM C.1.5 Limitation Discovery Protocol](https://github.com/albertodiazdurana/take-ai-bite). Current limitations from [EXP-001](docs/experiments/EXP-001_multi-source-detection.md):

| Limitation | Severity | Disposition | Workaround |

|------------|----------|-------------|------------|

| Simple queries may only cite one source | Medium | Accept MVP | Ask "What do all documents say about X?" |

| No automatic version/date awareness | Low | Defer | Name files with dates (e.g., `policy_2024.md`) |

| Documents persist until manually cleared | Low | Accept MVP | Use "Clear All Documents" button in UI |

| Relies on LLM reasoning for conflict detection | Medium | Defer | Ask explicitly about differences between sources |

**Getting Better Results:**

- Ask "Compare sources on X" to see differences

- Ask "Are there different answers for X?" for comprehensive coverage

- Use specific questions for precise answers

## Roadmap

### Sprint 1: Core RAG System (Complete)

- [x] Project setup and document ingestion pipeline

- [x] Vector database integration (ChromaDB)

- [x] RAG chain with multi-provider LLM support

- [x] FastAPI backend with REST API

- [x] MLflow evaluation framework

- [x] 84 tests, 73% coverage

### Sprint 2: User Interface (Complete)

- [x] Streamlit UI with HTTP backend communication

- [x] Multi-file document upload

- [x] Chat interface with source citations

- [x] EXP-001: Multi-source conflict detection experiment

### Sprint 3: Experiments & Optimization (In Progress)

*Experiment-driven development following [DSM v1.3.1](https://github.com/albertodiazdurana/take-ai-bite) methodology.*

| Day | Feature | Experiment |

|-----|---------|------------|

| 7 | German Language Support | EXP-002: Cross-lingual retrieval |

| 8 | Hybrid Search (BM25 + semantic) | EXP-003: Retrieval strategy comparison |

| 9 | Performance & Robustness | EXP-004: Latency & adversarial testing |

| 10 | Docker Deployment | EXP-005: End-to-end validation |

See [Sprint 3 Plan](docs/plan/sprint-3-plan.md) for details.

## License

MIT License

## Author

Alberto Diaz Durana

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/albertodiazdurana/rag-document-assistant

Awesome Lists containing this project

README