An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-understanding

A curated list of projects in awesome lists tagged with document-understanding.

https://github.com/x-plug/mplug-docowl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Last synced: 14 May 2025

https://github.com/X-PLUG/mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Last synced: 11 May 2025

https://github.com/wenwenyu/PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

document-analysis document-understanding graph-convolutional-network graph-learning graph-neural-networks key-information-extraction

Last synced: 28 Apr 2025

https://github.com/googlecloudplatform/document-ai-samples

Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud

document-understanding machine-learning ocr pdf python samples

Last synced: 16 May 2025

https://github.com/scut-dlvclab/document-ai-recommendations

Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.

document-ai document-understanding key-information-extraction table-structure-recognition visual-information-extraction

Last synced: 12 Feb 2026

https://github.com/huggingface/chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

computer-vision dataloading datasets distributed-training document-understanding multi-modal-learning pdf-document webdataset

Last synced: 14 Oct 2025

https://github.com/Alpha-Innovator/DocGenome

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

document-understanding paper-annotation question-answering

Last synced: 14 Sep 2025

https://github.com/alpha-innovator/docgenome

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

document-understanding paper-annotation question-answering

Last synced: 05 Mar 2026

https://github.com/doc-analysis/readingbank

ReadingBank: A Benchmark Dataset for Reading Order Detection

document-ai document-intelligence document-understanding natural-language-processing nlp ocr

Last synced: 04 Jan 2026

https://github.com/microsoft/comphrdoc

Datasets and Evaluation Scripts for CompHRDoc

document-structure-analysis document-understanding rag-related

Last synced: 02 May 2026

https://github.com/zeninglin/peneo

[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.

document-ai document-understanding key-information-extraction ocr visual-information-extraction

Last synced: 27 Mar 2025

https://github.com/scut-dlvclab/rfund

[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"

document-ai document-understanding key-information-extraction ocr visual-information-extraction

Last synced: 07 Feb 2026

https://nextplusplus.github.io/TAT-DQA/

TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning

document-understanding question-answering vqa

Last synced: 27 Oct 2025

https://github.com/uakarsh/tilt-implementation

Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.

deep-learning document-understanding pytorch-implementation pytorch-lightning transformers

Last synced: 27 Feb 2025

https://github.com/jacobmarks/pytesseract-ocr-plugin

Run optical character recognition with PyTesseract from the FiftyOne App!

computer-vision document-understanding fiftyone nlp ocr plugin python tesseract tesseract-ocr

Last synced: 31 Oct 2025

https://github.com/bwnyasse/dart-documentai-samples

A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.

dart dartlang document-ai document-understanding google-cloud machine-learning samples

Last synced: 21 Jun 2025

https://github.com/extrievetechnologies/quickcapture_android

QuickCapture Mobile Scanning SDK from Extrieve, designed specifically for native Android

android document-scanner document-scanner-app document-scanning-sdk document-understanding java kotllin

Last synced: 16 Apr 2026

https://github.com/mycielski/textract_study

Analysing expense reports/invoices with AWS Textract and boto3.

aws aws-cli boto3 document-understanding expenses invoices script shell textract

Last synced: 06 May 2026

https://github.com/Dangocan/comfyui_glm_ocr

ComfyUI custom node to run GLM-OCR locally — text, formula, and table recognition from images

comfyui comfyui-node computer-vision document-understanding glm-ocr huggingface ocr transformers

Last synced: 18 Mar 2026

https://github.com/docling-project/docling4j

Docling4j brings the functionalities of Docling in document understanding to Java® projects

ai docling document-parser document-parsing document-understanding documents java pdf pdf-converter pdf-to-json

Last synced: 15 Jun 2025

https://github.com/ryanlinjui/menu-text-detection

Extract structured menu information from images into JSON using either an end-to-end vision-language model fine-tuning pipeline or an LLM.

document-understanding donut fine-tuning image-text-to-text transformer

Last synced: 20 Apr 2026

https://github.com/callbacked/smoldocling256m-webgpu

Document Understanding in the Browser!

ai document-understanding llms transformersjs

Last synced: 14 Jul 2025