An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-ai

A curated list of projects in awesome lists tagged with document-ai.

https://github.com/microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e

Last synced: 13 May 2025

https://github.com/clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

computer-vision document-ai eccv-2022 multimodal-pre-trained-model nlp ocr

Last synced: 13 May 2025

https://github.com/wxyhgk/retain-pdf

PDF translation that preserves layout, formulas, and structure, suited to scientific and technical documents

document-ai document-processing layout-preserving ocr pdf scientific-papers translation typst

Last synced: 31 May 2026

https://github.com/scut-dlvclab/document-ai-recommendations

Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.

document-ai document-understanding key-information-extraction table-structure-recognition visual-information-extraction

Last synced: 12 Feb 2026

https://github.com/clovaai/webvicob

Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023

document-ai icdar2023 nlp ocr

Last synced: 06 Oct 2025

https://github.com/doc-analysis/readingbank

ReadingBank: A Benchmark Dataset for Reading Order Detection

document-ai document-intelligence document-understanding natural-language-processing nlp ocr

Last synced: 04 Jan 2026

https://github.com/zeninglin/vibertgrid-pytorch

An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"

document-ai document-analysis information-extraction key-information-extraction visual-information-extraction

Last synced: 07 Oct 2025

https://github.com/whn09/table_structure_recognition

Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, giving results comparable to, or even better than, Table Transformer (TATR) with smaller models.

document-ai ocr table table-detection table-structure-recognition yolov5 yolov8

Last synced: 02 Mar 2026

https://github.com/googleapis/python-documentai-toolbox

Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.

ai document-ai gcp generative-ai google-cloud google-cloud-platform vertex-ai

Last synced: 22 Mar 2025

https://github.com/zeninglin/peneo

[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.

document-ai document-understanding key-information-extraction ocr visual-information-extraction

Last synced: 27 Mar 2025

https://github.com/scut-dlvclab/rfund

[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"

document-ai document-understanding key-information-extraction ocr visual-information-extraction

Last synced: 07 Feb 2026

https://github.com/LexStack-AI/LexReviewer

LexReviewer is an AI legal document review engine that ingests PDFs, uses RAG for hybrid retrieval, and enables grounded chat with citations and bounding-box references, delivering verifiable answers directly from the source text.

citation contract-analysis contract-review document-ai legal-ai legaltech rag

Last synced: 31 Mar 2026

https://github.com/wintermi/ocr-runner

OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.

cloud-vision cloud-vision-api document-ai google-cloud google-cloud-platform

Last synced: 05 May 2025

https://github.com/komalharshita/pocketpilot-ai

PocketPilot AI is a smart personal finance assistant that turns everyday receipts into real-time spending insights using Gemini API.

document-ai firebase-auth gemini-api hackathon-project

Last synced: 07 Mar 2026

https://github.com/bwnyasse/dart-documentai-samples

A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.

dart dartlang document-ai document-understanding google-cloud machine-learning samples

Last synced: 21 Jun 2025

https://github.com/nanonets/nanoindex

Turn PDFs into tree and graph-based RAG with Karpathy-inspired knowledge bases. Cited answers with page coordinates. Self-correcting extraction. No vectors needed.

citations document-ai graph indexing knowledge-graph llm nanonets ocr pdf question-answering rag rag-pipeline vectorless-rag vlm vlm-ocr

Last synced: 10 Apr 2026

https://github.com/qyinm/hyprduck

Local-first brain repo and trust console for AI agents. Turns local documents into source-backed wiki, graph, memory, claims, and evidence.

agent-brain agent-memory ai-agents document-ai electron evidence knowledge-graph llm-wiki local-first markdown ollama rust source-backed

Last synced: 01 Jun 2026

https://github.com/stanford-oval/sliders

Repository for paper: Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

agents document-ai long-context

Last synced: 03 Jun 2026

https://github.com/ionmich/batch-doc-vqa

Ask a question about a document collection and extract structured responses

document-ai docvqa llama local-llm ocr ollama

Last synced: 17 May 2026

https://github.com/sanikamal/gcp-ai-projects

Explore and implement powerful AI and Machine Learning solutions using Google Cloud Platform (GCP).

document-ai function-calling gemini imagens llm nlp ocr rag recommendation-system streamlit vertex-ai

Last synced: 02 May 2026

https://github.com/anshi312/financial-analyst-rag

Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.

document-ai faiss financial-analysis huggingface langchain nlp pdf-processing rag semantic-search sentence-transformers streamlit

Last synced: 05 May 2026

https://github.com/ricardolsmendes/gcp-documentai-custom-extractors

Custom data extractors that use Google Cloud's Document AI

data-extraction document-ai gcp google-cloud machine-learning

Last synced: 02 Jan 2026

https://github.com/akincenk/insightai

AI-powered semantic PDF intelligence engine. Upload PDFs, ask natural questions, get meaningful answers from inside documents using vector search and OpenAI.

ai document-ai faiss fastapi openai pdf semantic-search uvicorn vector-search

Last synced: 08 May 2026

https://github.com/oleksiilatypov/google_cloud

AI & Data, Google Cloud Skills Boost

bigquery document-ai ml vertexai

Last synced: 23 Apr 2026

https://github.com/ozcanmiraay/opsbot

AI-powered PDF extraction suite for structured insights from contracts, forms, and documents. Built with Streamlit, LangChain, GPT-4o, and PDFPlumber.

automation contracts document-ai gpt-4o langchain openai pdf-extraction streamlit structured-data

Last synced: 13 Apr 2026

https://github.com/fangyuan025/hushdoc

Chat with your documents privately, offline, on your own machine. Local-first RAG over PDFs/DOCX/images with GPU-accelerated streaming, optional voice mode, multi-conversation history, and citation-anchored sources. Bilingual (Chinese/English). FastAPI + React + llama.cpp.

bilingual chromadb document-ai fastapi llama-cpp llm local-llm offline-first pdf-chat privacy rag react typescript voice-assistant whisper

Last synced: 17 May 2026

https://github.com/yash-learnerr/pdf-chat-assistant

Build a powerful PDF Chat Assistant using Node.js, LangChain, and Google Gemini. Upload PDFs, extract content, and interact with them using natural language queries powered by Gemini LLM. Ideal for document Q&A, contract analysis, resume review, and more.

ai-chatbot ai-pdf-reader document-ai document-qa gemini-api google-gemini google-llm langchain langchain-nodejs llm-backend nodejs-llm openai-alternative pdf-chat-assistant pdf-chat-node pdf-chatbot pdf-question-answering rag semantic-search

Last synced: 14 Apr 2026

https://github.com/mj-deving/invoice-parse-agent

Turns PDF/image invoices into structured JSON: OCR, LLM extraction, schema validation, low-confidence review queue, ground-truth eval. TypeScript, Tesseract, Claude.

document-ai llm ocr rag typescript

Last synced: 02 Jun 2026

https://github.com/gafnts/agentic-kie

Agentic and single-pass Key Information Extraction (KIE) from documents using LLMs

agentic-ai document-ai key-information-extraction kie langchain pydantic

Last synced: 11 Apr 2026

https://github.com/junotb/omniparse-ai-stack

Document & image parsing full-stack demo. OCR, VLM, document layout analysis, LLM Agent — runs in browser or on server. Next.js + FastAPI.

document-ai document-parsing easyocr fastapi full-stack gemini image-analysis langchain llm machine-learning nextjs ocr python tesseract typescript vision-language-model yolo

Last synced: 04 Apr 2026

https://github.com/atithi4dev/vectraq-server

End-to-end intelligent document Q&A system using LangChain, OpenAI, and vector databases. Upload PDFs and get GPT-powered answers instantly.

document-ai langc openai pdf-cahtbot vector-search

Last synced: 28 Apr 2026

https://github.com/gafnts/agentic-kie-evals

Benchmarking agentic and single-pass extraction strategies across LLM providers on the Kleister NDA dataset

agentic-ai agentic-kie document-ai evals key-information-extraction kie langsmith

Last synced: 30 Apr 2026