{"id":27624391,"url":"https://github.com/angelospanag/document-ai","last_synced_at":"2026-04-04T22:33:42.115Z","repository":{"id":288978801,"uuid":"969726369","full_name":"angelospanag/document-ai","owner":"angelospanag","description":"A simple FastAPI application that allows users to upload PDF or DOCX documents in a database, get a summary generated by a local LLM via Ollama, and ask natural language questions about their content.","archived":false,"fork":false,"pushed_at":"2025-08-25T21:17:01.000Z","size":184,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-25T22:31:15.319Z","etag":null,"topics":["alembic","docker","fastapi","langchain","llm","ollama","pydantic","python","python3","ruff","sqlalchemy","uv"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/angelospanag.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-20T19:52:06.000Z","updated_at":"2025-08-25T21:17:04.000Z","dependencies_parsed_at":"2025-06-08T06:15:36.208Z","dependency_job_id":null,"html_url":"https://github.com/angelospanag/document-ai","commit_stats":null,"previous_names":["angelospanag/document-ai"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/angelospanag/document-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fdocument-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fdocument-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fdocument-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fdocument-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/angelospanag","download_url":"https://codeload.github.com/angelospanag/document-ai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fdocument-ai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31416776,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alembic","docker","fastapi","langchain","llm","ollama","pydantic","python","python3","ruff","sqlalchemy","uv"],"created_at":"2025-04-23T11:38:48.269Z","updated_at":"2026-04-04T22:33:41.793Z","avatar_url":"https://github.com/angelospanag.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📄 document-ai\n\nThis is a simple FastAPI application that allows users to:\n\n- ✅ Upload **PDF** or **DOCX** documents in a database\n- 🧠 Get a **summary** generated by a local **LLM** (via [Ollama](https://ollama.com/))\n- ❓ Ask natural language **questions** about the content of uploaded documents\n\nThe app is fully local — no API keys or cloud model usage required.\n\n## What this is (and isn't)\n\nWhile the project mimics the behavior of a RAG (Retrieval-Augmented Generation) system, it currently does not implement\nfull retrieval or semantic chunking. Instead, the entire document text is used as context during generation. This\napproach\nworks well for smaller documents and simple use cases.\n\nPlanned enhancement: Full RAG support — including chunking, embedding, and vector similarity search — will be added in\nfuture iterations to support larger document sets and improve accuracy at scale.\n\n\n\u003c!-- TOC --\u003e\n\n* [📄 document-ai](#-document-ai)\n    * [What this is (and isn't)](#what-this-is-and-isnt)\n    * [⚡ Features](#-features)\n    * [🚀 Quick Start](#-quick-start)\n        * [1. Install Python 3, uv, Docker and Ollama](#1-install-python-3-uv-docker-and-ollama)\n        * [2. Create a virtual environment with all necessary dependencies](#2-create-a-virtual-environment-with-all-necessary-dependencies)\n        * [3. Create a `.env` file at the root of the project](#3-create-a-env-file-at-the-root-of-the-project)\n        * [4. Store models locally using Ollama](#4-store-models-locally-using-ollama)\n        * [5. Run PostgreSQL using Docker and perform migrations](#5-run-postgresql-using-docker-and-perform-migrations)\n    * [Run application](#run-application)\n        * [Development mode](#development-mode)\n        * [Production mode](#production-mode)\n    * [Linting](#linting)\n    * [Formatting](#formatting)\n\n\u003c!-- TOC --\u003e\n\n## ⚡ Features\n\n- 🔍 **Summarization** of uploaded documents using local LLMs (like LLaMA3, Mistral, etc.)\n- 🤖 **Context-aware Q\u0026A** on document content\n- 🛡️ Type-safe response models using pydantic\n- 📂 Supports `.pdf` and `.docx` file uploads\n- 🔧 Easily swappable LLM backend (via [Ollama](https://ollama.com/))\n- 🛠️ **Database integration** with [SQLAlchemy](https://www.sqlalchemy.org/)\n  and [Alembic](https://alembic.sqlalchemy.org/) for migrations\n- 🧠 **LangChain** integration for chaining LLMs and handling complex document workflows\n- 🧹 **Code linting and formatting** with [Ruff](https://docs.astral.sh/ruff/)\n\n---\n\n## 🚀 Quick Start\n\n### 1. Install Python 3, uv, Docker and Ollama\n\n**MacOS (using `brew`)**\n\n```bash\nbrew install python@3.13 uv\nbrew install --cask docker ollama-app\n```\n\n### 2. Create a virtual environment with all necessary dependencies\n\nFrom the root of the project execute:\n\n```bash\nuv sync\n```\n\n### 3. Create a `.env` file at the root of the project\n\n```dotenv\n# Models\nGENERATION_MODEL_NAME=llama3.2\nEMBEDDINGS_MODEL_NAME=nomic-embed-text\nEMBEDDINGS_DIMENSIONS=768\n\n# Database\nDATABASE_USER=postgres\nDATABASE_PASSWORD=postgres\nDATABASE_NAME=postgres\nDATABASE_HOST=localhost\nDATABASE_PORT=5432\n```\n\n### 4. Store models locally using [Ollama](https://ollama.com/)\n\nUse the generation and embeddings models you referenced as environment variables above.\n\nExample using [llama3.2](https://ollama.com/library/llama3.2)\nand [nomic-embed-text](https://ollama.com/library/nomic-embed-text):\n\n```bash\nollama pull llama3.2 \nollama pull nomic-embed-text\n```\n\n### 5. Run [PostgreSQL using Docker](https://hub.docker.com/_/postgres) and perform migrations\n\n```bash\ndocker compose up -d db\nalembic upgrade head\n```\n\n## Run application\n\n### Development mode\n\n```bash\nuv run fastapi dev app/main.py\n```\n\n### Production mode\n\n```bash\nuv run fastapi run app/main.py\n```\n\n## Linting\n\n```bash\nruff check app/* tests/*\n```\n\n## Formatting\n\n```bash\nruff format app/* tests/*\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangelospanag%2Fdocument-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fangelospanag%2Fdocument-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangelospanag%2Fdocument-ai/lists"}