https://github.com/angelospanag/document-ai

A simple FastAPI application that allows users to upload PDF or DOCX documents to a database, get a summary generated by a local LLM via Ollama, and ask natural language questions about their content.

# 📄 document-ai

This is a simple FastAPI application that allows users to:

- ✅ Upload **PDF** or **DOCX** documents, which are stored in a database
- 🧠 Get a **summary** generated by a local **LLM** (via [Ollama](https://ollama.com/))
- ❓ Ask natural language **questions** about the content of uploaded documents

The app is fully local: no API keys or cloud model usage required.

## What this is (and isn't)

While the project mimics the behavior of a RAG (Retrieval-Augmented Generation) system, it currently does not implement
full retrieval or semantic chunking. Instead, the entire document text is used as context during generation. This
approach works well for smaller documents and simple use cases.
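To make the full-context approach concrete, here is a minimal sketch. The function name `build_prompt`, the prompt wording, and the `max_chars` guard are assumptions made for illustration; they are not taken from the project's code.

```python
def build_prompt(document_text: str, question: str, max_chars: int = 20_000) -> str:
    """Build one prompt that embeds the whole document as context.

    Hypothetical helper: the name, the prompt wording, and the max_chars
    limit are illustrative assumptions, not the project's actual code.
    """
    if len(document_text) > max_chars:
        # Without retrieval there is no way to narrow a long document down,
        # so this sketch rejects it instead of silently truncating.
        raise ValueError(
            f"Document has {len(document_text)} characters; limit is {max_chars}"
        )
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}\n"
    )
```

The size guard shows why this approach suits smaller documents: everything has to fit into the model's context window at once.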

Planned enhancement: full RAG support (including chunking, embedding, and vector similarity search) will be added in
future iterations to support larger document sets and improve accuracy at scale.
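As a rough picture of what chunking and similarity search involve, here is a self-contained sketch. It is not the project's planned implementation: the function names are hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model such as the one served by Ollama.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(
        chunks, key=lambda c: cosine_similarity(embed(c), query), reverse=True
    )
    return ranked[:k]
```

In a real pipeline, only the chunks returned by `top_chunks` would be passed to the generation model, so the prompt stays small even when the document set is large.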

* [📄 document-ai](#-document-ai)
  * [What this is (and isn't)](#what-this-is-and-isnt)
  * [⚡ Features](#-features)
  * [🚀 Quick Start](#-quick-start)
    * [1. Install Python 3, uv, Docker and Ollama](#1-install-python-3-uv-docker-and-ollama)
    * [2. Create a virtual environment with all necessary dependencies](#2-create-a-virtual-environment-with-all-necessary-dependencies)
    * [3. Create a `.env` file at the root of the project](#3-create-a-env-file-at-the-root-of-the-project)
    * [4. Store models locally using Ollama](#4-store-models-locally-using-ollama)
    * [5. Run PostgreSQL using Docker and perform migrations](#5-run-postgresql-using-docker-and-perform-migrations)
  * [Run application](#run-application)
    * [Development mode](#development-mode)
    * [Production mode](#production-mode)
  * [Linting](#linting)
  * [Formatting](#formatting)

## ⚡ Features

- ๐Ÿ” **Summarization** of uploaded documents using local LLMs (like LLaMA3, Mistral, etc.)
- ๐Ÿค– **Context-aware Q&A** on document content
- ๐Ÿ›ก๏ธ Type-safe response models using pydantic
- ๐Ÿ“‚ Supports `.pdf` and `.docx` file uploads
- ๐Ÿ”ง Easily swappable LLM backend (via [Ollama](https://ollama.com/))
- ๐Ÿ› ๏ธ **Database integration** with [SQLAlchemy](https://www.sqlalchemy.org/)
and [Alembic](https://alembic.sqlalchemy.org/) for migrations
- ๐Ÿง  **LangChain** integration for chaining LLMs and handling complex document workflows
- ๐Ÿงน **Code linting and formatting** with [Ruff](https://docs.astral.sh/ruff/)
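
As an illustration of the `.pdf` / `.docx` restriction listed above, an upload handler could check the file extension before doing any parsing. This is only a sketch: `ALLOWED_EXTENSIONS` and `is_supported_upload` are hypothetical names, and the project's actual validation may differ.

```python
from pathlib import Path

# Extensions the feature list names as supported uploads.
ALLOWED_EXTENSIONS = {".pdf", ".docx"}


def is_supported_upload(filename: str) -> bool:
    """Return True if the filename ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

An extension check only filters by name; it does not confirm that the file contents are a valid PDF or DOCX document.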

---

## 🚀 Quick Start

### 1. Install Python 3, uv, Docker and Ollama

**macOS (using `brew`)**

```bash
brew install python@3.13 uv
brew install --cask docker ollama-app
```

### 2. Create a virtual environment with all necessary dependencies

From the root of the project execute:

```bash
uv sync
```

### 3. Create a `.env` file at the root of the project

```dotenv
# Models
GENERATION_MODEL_NAME=llama3.2
EMBEDDINGS_MODEL_NAME=nomic-embed-text
EMBEDDINGS_DIMENSIONS=768

# Database
DATABASE_USER=postgres
DATABASE_PASSWORD=postgres
DATABASE_NAME=postgres
DATABASE_HOST=localhost
DATABASE_PORT=5432
```
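
For orientation, the sketch below shows one way these variables could be combined into typed settings and a database URL. It is an assumption-laden illustration, not the project's configuration code: `Settings` and `load_settings` are hypothetical names, the `postgresql://` scheme is assumed (the driver the project uses is not stated here), and the variables are expected to already be present in the environment rather than read from the `.env` file directly.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the variables in the .env file."""

    generation_model_name: str
    embeddings_model_name: str
    embeddings_dimensions: int
    database_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    host = env["DATABASE_HOST"]
    port = int(env["DATABASE_PORT"])
    name = env["DATABASE_NAME"]
    return Settings(
        generation_model_name=env["GENERATION_MODEL_NAME"],
        embeddings_model_name=env["EMBEDDINGS_MODEL_NAME"],
        embeddings_dimensions=int(env["EMBEDDINGS_DIMENSIONS"]),
        # Assumed URL scheme; the project's real driver prefix may differ.
        database_url=f"postgresql://{user}:{password}@{host}:{port}/{name}",
    )
```

A missing variable raises `KeyError` immediately, which surfaces configuration mistakes at startup instead of at the first database call.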

### 4. Store models locally using [Ollama](https://ollama.com/)

Use the generation and embeddings models you referenced as environment variables above.

Example using [llama3.2](https://ollama.com/library/llama3.2)
and [nomic-embed-text](https://ollama.com/library/nomic-embed-text):

```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```
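
To confirm that both models are available before starting the app, you can query Ollama's local HTTP API, which lists installed models at `/api/tags`. The helper names below are hypothetical, and the URL assumes Ollama is listening on its default port, 11434.

```python
import json
from urllib.request import urlopen

# Assumes Ollama's default local address; adjust if yours differs.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags payload."""
    installed = set()
    for model in tags_payload.get("models", []):
        name = model.get("name", "")
        installed.add(name)  # full name, e.g. "llama3.2:latest"
        installed.add(name.split(":", 1)[0])  # base name, e.g. "llama3.2"
    return [m for m in required if m not in installed]


def fetch_tags(url: str = OLLAMA_TAGS_URL) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

With Ollama running, `missing_models(fetch_tags(), ["llama3.2", "nomic-embed-text"])` should return an empty list once both pulls have finished. Note that matching on the base name treats any installed tag of a model as sufficient.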

### 5. Run [PostgreSQL using Docker](https://hub.docker.com/_/postgres) and perform migrations

```bash
docker compose up -d db
uv run alembic upgrade head
```

## Run application

### Development mode

```bash
uv run fastapi dev app/main.py
```

### Production mode

```bash
uv run fastapi run app/main.py
```

## Linting

```bash
ruff check app/* tests/*
```

## Formatting

```bash
ruff format app/* tests/*
```