Projects in Awesome Lists tagged with document-ai
A curated list of projects in awesome lists tagged with document-ai.
https://github.com/microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
beit beit-3 bitnet deepnet document-ai foundation-models kosmos kosmos-1 layoutlm layoutxlm llm minilm mllm multimodal nlp pre-trained-model textdiffuser trocr unilm xlm-e
Last synced: 13 May 2025
https://github.com/clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
computer-vision document-ai eccv-2022 multimodal-pre-trained-model nlp ocr
Last synced: 13 May 2025
https://github.com/deepdoctection/deepdoctection
A Repo For Document AI
document-ai document-image-analysis document-layout-analysis document-parser document-understanding layoutlm nlp ocr publaynet pubtabnet python pytorch table-detection table-recognition tensorflow
Last synced: 04 Jan 2026
https://github.com/wxyhgk/retain-pdf
PDF translation that preserves layout, formulas, and structure, suited to scientific research and technical documents.
document-ai document-processing layout-preserving ocr pdf scientific-papers translation typst
Last synced: 31 May 2026
https://github.com/scut-dlvclab/document-ai-recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
document-ai document-understanding key-information-extraction table-structure-recognition visual-information-extraction
Last synced: 12 Feb 2026
https://github.com/clovaai/webvicob
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
Last synced: 06 Oct 2025
https://github.com/doc-analysis/readingbank
ReadingBank: A Benchmark Dataset for Reading Order Detection
document-ai document-intelligence document-understanding natural-language-processing nlp ocr
Last synced: 04 Jan 2026
https://github.com/zeninglin/vibertgrid-pytorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
document-ai document-analysis information-extraction key-information-extraction visual-information-extraction
Last synced: 07 Oct 2025
https://github.com/whn09/table_structure_recognition
Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, reporting results equal to or better than Table Transformer (TATR) with smaller models.
document-ai ocr table table-detection table-structure-recognition yolov5 yolov8
Last synced: 02 Mar 2026
https://github.com/googleapis/python-documentai-toolbox
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
ai document-ai gcp generative-ai google-cloud google-cloud-platform vertex-ai
Last synced: 22 Mar 2025
https://github.com/zeninglin/peneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
document-ai document-understanding key-information-extraction ocr visual-information-extraction
Last synced: 27 Mar 2025
https://github.com/scut-dlvclab/rfund
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
document-ai document-understanding key-information-extraction ocr visual-information-extraction
Last synced: 07 Feb 2026
https://github.com/yapit-tts/yapit
Listen to anything. TTS for documents, papers, and web pages.
document-ai document-reader fastapi gemini gemini-api markdown-converter markdown-viewer pdf-document-processor react self-hosted text-to-speech tts tts-gui yolo
Last synced: 04 Apr 2026
https://github.com/LexStack-AI/LexReviewer
LexReviewer is an AI legal document review engine that ingests PDFs, uses RAG for hybrid retrieval, and enables grounded chat with citations and bounding-box references, delivering verifiable answers directly from the source text.
citation contract-analysis contract-review document-ai legal-ai legaltech rag
Last synced: 31 Mar 2026
https://github.com/wintermi/ocr-runner
OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.
cloud-vision cloud-vision-api document-ai google-cloud google-cloud-platform
Last synced: 05 May 2025
https://github.com/komalharshita/pocketpilot-ai
PocketPilot AI is a smart personal finance assistant that turns everyday receipts into real-time spending insights using Gemini API.
document-ai firebase-auth gemini-api hackathon-project
Last synced: 07 Mar 2026
https://github.com/bwnyasse/dart-documentai-samples
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
dart dartlang document-ai document-understanding google-cloud machine-learning samples
Last synced: 21 Jun 2025
https://github.com/nanonets/nanoindex
Turn PDFs into tree and graph-based RAG with Karpathy-inspired knowledge bases. Cited answers with page coordinates. Self-correcting extraction. No vectors needed.
citations document-ai graph indexing knowledge-graph llm nanonets ocr pdf question-answering rag rag-pipeline vectorless-rag vlm vlm-ocr
Last synced: 10 Apr 2026
https://github.com/qyinm/hyprduck
Local-first brain repo and trust console for AI agents. Turns local documents into source-backed wiki, graph, memory, claims, and evidence.
agent-brain agent-memory ai-agents document-ai electron evidence knowledge-graph llm-wiki local-first markdown ollama rust source-backed
Last synced: 01 Jun 2026
https://github.com/stanford-oval/sliders
Repository for paper: Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
agents document-ai long-context
Last synced: 03 Jun 2026
https://github.com/ionmich/batch-doc-vqa
Ask a question about a document collection and extract structured responses
document-ai docvqa llama local-llm ocr ollama
Last synced: 17 May 2026
https://github.com/sanikamal/gcp-ai-projects
Explore and implement powerful AI and Machine Learning solutions using Google Cloud Platform (GCP).
document-ai function-calling gemini imagens llm nlp ocr rag recommendation-system streamlit vertex-ai
Last synced: 02 May 2026
https://github.com/anshi312/financial-analyst-rag
Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.
document-ai faiss financial-analysis huggingface langchain nlp pdf-processing rag semantic-search sentence-transformers streamlit
Last synced: 05 May 2026
https://github.com/ricardolsmendes/gcp-documentai-custom-extractors
Custom data extractors that use Google Cloud's Document AI
data-extraction document-ai gcp google-cloud machine-learning
Last synced: 02 Jan 2026
https://github.com/akincenk/insightai
AI-powered semantic PDF intelligence engine. Upload PDFs, ask natural questions, get meaningful answers from inside documents using vector search and OpenAI.
ai document-ai faiss fastapi openai pdf semantic-search uvicorn vector-search
Last synced: 08 May 2026
https://github.com/oleksiilatypov/google_cloud
AI & Data, Google Cloud Skills Boost
bigquery document-ai ml vertexai
Last synced: 23 Apr 2026
https://github.com/marcusmonteirodesouza/google-cloud-document-ai-rest-api-demo
Create an Identity Auto-Filler API with Google Cloud Document AI
document-ai express google-cloud google-cloud-platform nextjs nodejs terraform
Last synced: 17 Jan 2026
https://github.com/ozcanmiraay/opsbot
AI-powered PDF extraction suite for structured insights from contracts, forms, and documents. Built with Streamlit, LangChain, GPT-4o, and PDFPlumber.
automation contracts document-ai gpt-4o langchain openai pdf-extraction streamlit structured-data
Last synced: 13 Apr 2026
https://github.com/fangyuan025/hushdoc
Chat with your documents privately, offline, on your own machine. Local-first RAG over PDFs/DOCX/images with GPU-accelerated streaming, optional voice mode, multi-conversation history, and citation-anchored sources. Bilingual (Chinese/English). FastAPI + React + llama.cpp.
bilingual chromadb document-ai fastapi llama-cpp llm local-llm offline-first pdf-chat privacy rag react typescript voice-assistant whisper
Last synced: 17 May 2026
https://github.com/yash-learnerr/pdf-chat-assistant
Build a powerful PDF Chat Assistant using Node.js, LangChain, and Google Gemini. Upload PDFs, extract content, and interact with them using natural language queries powered by Gemini LLM. Ideal for document Q&A, contract analysis, resume review, and more.
ai-chatbot ai-pdf-reader document-ai document-qa gemini-api google-gemini google-llm langchain langchain-nodejs llm-backend nodejs-llm openai-alternative pdf-chat-assistant pdf-chat-node pdf-chatbot pdf-question-answering rag semantic-search
Last synced: 14 Apr 2026
https://github.com/mj-deving/invoice-parse-agent
PDF/image invoices into structured JSON: OCR, LLM extraction, schema validation, low-confidence review queue, ground-truth eval. TypeScript, Tesseract, Claude.
document-ai llm ocr rag typescript
Last synced: 02 Jun 2026
https://github.com/gafnts/agentic-kie
Agentic and single-pass Key Information Extraction (KIE) from documents using LLMs
agentic-ai document-ai key-information-extraction kie langchain pydantic
Last synced: 11 Apr 2026
https://github.com/junotb/omniparse-ai-stack
Document & image parsing full-stack demo. OCR, VLM, document layout analysis, LLM Agent — runs in browser or on server. Next.js + FastAPI.
document-ai document-parsing easyocr fastapi full-stack gemini image-analysis langchain llm machine-learning nextjs ocr python tesseract typescript vision-language-model yolo
Last synced: 04 Apr 2026
https://github.com/atithi4dev/vectraq-server
End-to-end intelligent document Q&A system using LangChain, OpenAI, and vector databases. Upload PDFs and get GPT-powered answers instantly.
document-ai langc openai pdf-cahtbot vector-search
Last synced: 28 Apr 2026
https://github.com/gafnts/agentic-kie-evals
Benchmarking agentic and single-pass extraction strategies across LLM providers on the Kleister NDA dataset
agentic-ai agentic-kie document-ai evals key-information-extraction kie langsmith
Last synced: 30 Apr 2026