Projects in Awesome Lists tagged with document-understanding
A curated list of projects in awesome lists tagged with document-understanding.
https://github.com/infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
agent agents ai-search chatbot chatgpt deep-learning deepseek deepseek-r1 document-parser document-understanding genai graphrag llm nlp ollama pdf-to-text rag retrieval-augmented-generation table-structure-recognition text2sql
Last synced: 09 May 2026
https://github.com/deepdoctection/deepdoctection
A Repo For Document AI
document-ai document-image-analysis document-layout-analysis document-parser document-understanding layoutlm nlp ocr publaynet pubtabnet python pytorch table-detection table-recognition tensorflow
Last synced: 04 Jan 2026
https://github.com/x-plug/mplug-docowl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding
Last synced: 14 May 2025
https://github.com/alibabaresearch/advancedliteratemachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer
Last synced: 28 Mar 2025
https://github.com/OpenBMB/VisRAG
Parsing-free RAG supported by VLMs
document-retrieval document-understanding multi-modal multi-modality rag retrieval retrieval-augmented-generation vision-language-model
Last synced: 23 Jun 2026
https://github.com/wenwenyu/PICK-pytorch
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
document-analysis document-understanding graph-convolutional-network graph-learning graph-neural-networks key-information-extraction
Last synced: 28 Apr 2025
https://github.com/googlecloudplatform/document-ai-samples
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
document-understanding machine-learning ocr pdf python samples
Last synced: 16 May 2025
https://github.com/scut-dlvclab/document-ai-recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
document-ai document-understanding key-information-extraction table-structure-recognition visual-information-extraction
Last synced: 12 Feb 2026
https://github.com/huggingface/chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
computer-vision dataloading datasets distributed-training document-understanding multi-modal-learning pdf-document webdataset
Last synced: 14 Oct 2025
https://github.com/Alpha-Innovator/DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
document-understanding paper-annotation question-answering
Last synced: 14 Sep 2025
https://github.com/doc-analysis/readingbank
ReadingBank: A Benchmark Dataset for Reading Order Detection
document-ai document-intelligence document-understanding natural-language-processing nlp ocr
Last synced: 04 Jan 2026
https://github.com/microsoft/comphrdoc
Datasets and Evaluation Scripts for CompHRDoc
document-structure-analysis document-understanding rag-related
Last synced: 02 May 2026
https://github.com/zeninglin/peneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
document-ai document-understanding key-information-extraction ocr visual-information-extraction
Last synced: 27 Mar 2025
https://github.com/scut-dlvclab/rfund
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
document-ai document-understanding key-information-extraction ocr visual-information-extraction
Last synced: 07 Feb 2026
https://nextplusplus.github.io/TAT-DQA/
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
document-understanding question-answering vqa
Last synced: 27 Oct 2025
https://github.com/uakarsh/tilt-implementation
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
deep-learning document-understanding pytorch-implementation pytorch-lightning transformers
Last synced: 27 Feb 2025
https://github.com/jacobmarks/pytesseract-ocr-plugin
Run optical character recognition with PyTesseract from the FiftyOne App!
computer-vision document-understanding fiftyone nlp ocr plugin python tesseract tesseract-ocr
Last synced: 31 Oct 2025
https://github.com/bwnyasse/dart-documentai-samples
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
dart dartlang document-ai document-understanding google-cloud machine-learning samples
Last synced: 21 Jun 2025
https://github.com/extrievetechnologies/quickcapture_ios
QuickCapture Mobile Scanning SDK, specially designed for native iOS
document-classification document-scanner-app document-scanning-sdk document-understanding ios objective-c swift
Last synced: 15 May 2026
https://github.com/extrievetechnologies/quickcapture_android
QuickCapture Mobile Scanning SDK from Extrieve, specially designed for native Android
android document-scanner document-scanner-app document-scanning-sdk document-understanding java kotllin
Last synced: 16 Apr 2026
https://github.com/yuvaraj3855/preocr
Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.
computer-vision document-analysis document-classification document-intelligence document-processing document-understanding file-analysis image-processing layout-analysis ocr ocr-detection opencv pdf pdf-analysis pdf-parsing preprocessing python python-library text-detection text-extraction
Last synced: 16 Feb 2026
https://github.com/mycielski/textract_study
Analysing expense reports/invoices with AWS Textract and boto3.
aws aws-cli boto3 document-understanding expenses invoices script shell textract
Last synced: 06 May 2026
https://github.com/Dangocan/comfyui_glm_ocr
ComfyUI custom node to run GLM-OCR locally: text, formula, and table recognition from images
comfyui comfyui-node computer-vision document-understanding glm-ocr huggingface ocr transformers
Last synced: 18 Mar 2026
https://github.com/docling-project/docling4j
Docling4j brings the functionalities of Docling in document understanding to Java® projects
ai docling document-parser document-parsing document-understanding documents java pdf pdf-converter pdf-to-json
Last synced: 15 Jun 2025
https://github.com/ryanlinjui/menu-text-detection
Extract structured menu information from images into JSON, using either an end-to-end vision-language model fine-tuning pipeline or an LLM.
document-understanding donut fine-tuning image-text-to-text transformer
Last synced: 20 Apr 2026
https://github.com/callbacked/smoldocling256m-webgpu
Document Understanding in the Browser!
ai document-understanding llms transformersjs
Last synced: 14 Jul 2025