https://github.com/angelospanag/document-ai

A simple FastAPI application that allows users to upload PDF or DOCX documents to a database, get a summary generated by a local LLM via Ollama, and ask natural language questions about their content.

alembic docker fastapi langchain llm ollama pydantic python python3 ruff sqlalchemy uv


Host: GitHub
URL: https://github.com/angelospanag/document-ai
Owner: angelospanag
Created: 2025-04-20T19:52:06.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-08-25T21:17:01.000Z (10 months ago)
Last Synced: 2025-08-25T22:31:15.319Z (10 months ago)
Topics: alembic, docker, fastapi, langchain, llm, ollama, pydantic, python, python3, ruff, sqlalchemy, uv
Language: Python
Homepage:
Size: 180 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# 📄 document-ai

This is a simple FastAPI application that allows users to:

- ✅ Upload **PDF** or **DOCX** documents to a database

- 🧠 Get a **summary** generated by a local **LLM** (via [Ollama](https://ollama.com/))

- ❓ Ask natural language **questions** about the content of uploaded documents

The app runs fully locally: no API keys or cloud-hosted models are required.

## What this is (and isn't)

While the project mimics the behavior of a RAG (Retrieval-Augmented Generation) system, it currently does not implement full retrieval or semantic chunking. Instead, the entire document text is used as context during generation. This approach works well for simple use cases and for smaller documents whose full text fits within the model's context window.
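To make the "entire document as context" approach concrete, here is a minimal, hypothetical sketch of how such a prompt could be assembled, together with a rough size check. The function names, the prompt wording, the 4-characters-per-token heuristic and the 8192-token budget are all illustrative assumptions, not taken from this project; real token counts depend on the model's tokenizer and on the context size configured in Ollama.

```python
# Hypothetical sketch: the full document text goes into a single prompt,
# with no chunking or retrieval step.

CHARS_PER_TOKEN = 4  # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Crude token estimate; real counts depend on the model's tokenizer."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def build_qa_prompt(document_text: str, question: str) -> str:
    """Build a Q&A prompt that passes the entire document as context."""
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def fits_in_context(
    document_text: str,
    question: str,
    context_tokens: int = 8192,  # placeholder budget, not this project's setting
    reserve_for_answer: int = 512,  # tokens left free for the model's reply
) -> bool:
    """Return True if the full-document prompt plausibly fits the context window."""
    prompt = build_qa_prompt(document_text, question)
    return estimate_tokens(prompt) + reserve_for_answer <= context_tokens
```

When `fits_in_context` returns `False`, the model would silently see a truncated document, which is the limitation the planned RAG work targets.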

Planned enhancement: full RAG support, including chunking, embedding, and vector similarity search, will be added in future iterations to support larger document sets and improve accuracy at scale.
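The planned pipeline isn't specified beyond chunking, embedding and vector similarity search. As a rough illustration only, the sketch below splits text into overlapping fixed-size character windows and ranks chunks by cosine similarity against a query embedding. The chunk sizes and helper names are hypothetical; in practice the vectors would come from the embeddings model (e.g. `nomic-embed-text`) and would usually be stored in a vector index rather than a Python list.

```python
import math


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    chunks: list[str],
    k: int = 3,
) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]
```

Brute-force scoring like this costs one similarity computation per stored chunk for every query, which is fine for a handful of documents and is exactly what a vector index replaces at scale.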

* [📄 document-ai](#-document-ai)

    * [What this is (and isn't)](#what-this-is-and-isnt)

    * [⚡ Features](#-features)

    * [🚀 Quick Start](#-quick-start)

        * [1. Install Python 3, uv, Docker and Ollama](#1-install-python-3-uv-docker-and-ollama)

        * [2. Create a virtual environment with all necessary dependencies](#2-create-a-virtual-environment-with-all-necessary-dependencies)

        * [3. Create a `.env` file at the root of the project](#3-create-a-env-file-at-the-root-of-the-project)

        * [4. Store models locally using Ollama](#4-store-models-locally-using-ollama)

        * [5. Run PostgreSQL using Docker and perform migrations](#5-run-postgresql-using-docker-and-perform-migrations)

    * [Run application](#run-application)

        * [Development mode](#development-mode)

        * [Production mode](#production-mode)

    * [Linting](#linting)

    * [Formatting](#formatting)

## ⚡ Features

- 🔍 **Summarization** of uploaded documents using local LLMs (e.g. Llama 3, Mistral)

- 🤖 **Context-aware Q&A** on document content

- 🛡️ Type-safe response models using Pydantic

- 📂 Supports `.pdf` and `.docx` file uploads

- 🔧 Easily swappable LLM backend (via [Ollama](https://ollama.com/))

- 🛠️ **Database integration** with [SQLAlchemy](https://www.sqlalchemy.org/) and [Alembic](https://alembic.sqlalchemy.org/) for migrations

- 🧠 **LangChain** integration for chaining LLMs and handling complex document workflows

- 🧹 **Code linting and formatting** with [Ruff](https://docs.astral.sh/ruff/)
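The project's own file-handling code isn't reproduced here. As a stdlib-only sketch of what supporting `.pdf` and `.docx` involves, the snippet below checks file signatures (a PDF starts with `%PDF-`; a DOCX is a ZIP archive containing `word/document.xml`) and pulls plain paragraph text out of a DOCX. The helper names are hypothetical, and PDF text extraction is deliberately left out because it realistically needs a dedicated library.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def detect_document_type(data: bytes) -> str:
    """Return "pdf" or "docx" based on the file's signature, else raise ValueError."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):  # ZIP local file header
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                if "word/document.xml" in zf.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass  # looks like a ZIP but isn't a readable one
    raise ValueError("unsupported file: expected a PDF or DOCX document")


def extract_docx_text(data: bytes) -> str:
    """Join the text runs (w:t) of every paragraph (w:p) in document order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        text = "".join(run.text or "" for run in para.iter(f"{W_NS}t"))
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Checking the content rather than trusting the uploaded filename's extension rejects renamed files early. Note that this sketch reads only the main body part, so headers, footers, and footnotes (stored as separate parts of the archive) are ignored.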

---

## 🚀 Quick Start

### 1. Install Python 3, uv, Docker and Ollama

**macOS (using `brew`)**

```bash
brew install python@3.13 uv
brew install --cask docker ollama-app
```
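Before continuing, it can help to confirm that the tools are actually reachable on your `PATH`. A small, hypothetical check using `shutil.which`; the executable names below are assumptions about how each tool is exposed after installation.

```python
import shutil

# Executables the Quick Start relies on (assumed names on PATH).
REQUIRED_TOOLS = ("python3", "uv", "docker", "ollama")


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` with no arguments checks all four; an empty list means everything was found.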

### 2. Create a virtual environment with all necessary dependencies

From the root of the project, run:

```bash
uv sync
```

### 3. Create a `.env` file at the root of the project

```dotenv
# Models
GENERATION_MODEL_NAME=llama3.2
EMBEDDINGS_MODEL_NAME=nomic-embed-text
EMBEDDINGS_DIMENSIONS=768

# Database
DATABASE_USER=postgres
DATABASE_PASSWORD=postgres
DATABASE_NAME=postgres
DATABASE_HOST=localhost
DATABASE_PORT=5432
```
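The project's settings code isn't shown in this README. As a stdlib-only sketch of how these variables could be read and combined, the snippet below parses simple `KEY=VALUE` lines and builds a SQLAlchemy-style `dialect://user:password@host:port/database` URL. The helper names are hypothetical, and the plain `postgresql://` scheme is an assumption: the project may use a driver-specific scheme such as `postgresql+psycopg://`.

```python
from urllib.parse import quote_plus


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments (no quoting support)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def build_database_url(env: dict[str, str], scheme: str = "postgresql") -> str:
    """Combine the DATABASE_* settings into a SQLAlchemy-style connection URL."""
    user = quote_plus(env["DATABASE_USER"])
    password = quote_plus(env["DATABASE_PASSWORD"])  # escape characters like @ or /
    host = env["DATABASE_HOST"]
    port = int(env["DATABASE_PORT"])
    name = env["DATABASE_NAME"]
    return f"{scheme}://{user}:{password}@{host}:{port}/{name}"
```

Escaping the password matters as soon as it contains characters such as `@`, which would otherwise be read as the host separator. `int(env["EMBEDDINGS_DIMENSIONS"])` should also match the embeddings model's output size, which is 768 for `nomic-embed-text`.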

### 4. Store models locally using [Ollama](https://ollama.com/)

Pull the generation and embeddings models you set as environment variables in `.env` above.

Example using [llama3.2](https://ollama.com/library/llama3.2) and [nomic-embed-text](https://ollama.com/library/nomic-embed-text):

```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```
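To check that the pulled models respond before starting the app, you can call Ollama's REST API directly on its default local port, 11434. The sketch below builds standard-library requests for `POST /api/generate` and `POST /api/embed`; it is not the application's own code (which, per the features list, goes through LangChain), and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def _post_json(path: str, payload: dict, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the Ollama API."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Non-streaming text generation request."""
    return _post_json("/api/generate", {"model": model, "prompt": prompt, "stream": False})


def embed_request(model: str, text: str) -> urllib.request.Request:
    """Embedding request for a single input string."""
    return _post_json("/api/embed", {"model": model, "input": text})


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send a request to a running Ollama server and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `send(generate_request("llama3.2", "Say hello"))["response"]` holds the generated text, and `len(send(embed_request("nomic-embed-text", "hello"))["embeddings"][0])` should equal `EMBEDDINGS_DIMENSIONS`. For a quick interactive check, `ollama run llama3.2` works too.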

### 5. Run [PostgreSQL using Docker](https://hub.docker.com/_/postgres) and perform migrations

```bash
docker compose up -d db
uv run alembic upgrade head
```
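`docker compose up -d` returns once the container has started, which can be before PostgreSQL accepts connections, so running the migrations immediately afterwards may fail. A minimal, hypothetical helper that polls the database port first is shown below. An open TCP port does not guarantee that the server is ready for queries; PostgreSQL's own `pg_isready` tool is the more precise check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is accepting connections on the port
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` before running `alembic upgrade head`, using the `DATABASE_HOST` and `DATABASE_PORT` values from `.env`.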

## Run application

### Development mode

```bash
uv run fastapi dev app/main.py
```

### Production mode

```bash
uv run fastapi run app/main.py
```
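This README doesn't list the API's routes. Because FastAPI publishes an OpenAPI schema at `/openapi.json` (and interactive docs at `/docs`), one way to discover the upload and question endpoints is to read that schema once the server is running; `fastapi dev` listens on `http://127.0.0.1:8000` by default. The helper names below are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_routes(openapi_spec: dict) -> list[str]:
    """Return sorted "METHOD /path" strings from an OpenAPI document."""
    routes = []
    for path, path_item in openapi_spec.get("paths", {}).items():
        for key in path_item:
            if key in HTTP_METHODS:  # skip non-operation keys like "parameters"
                routes.append(f"{key.upper()} {path}")
    return sorted(routes)


def fetch_openapi(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the schema FastAPI serves at /openapi.json (server must be running)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

While the server is up, `print(*list_routes(fetch_openapi()), sep="\n")` prints one `METHOD /path` line per route.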

## Linting

```bash
ruff check app/* tests/*
```

## Formatting

```bash
ruff format app/* tests/*
```
