An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-intelligence

A curated list of projects in awesome lists tagged with document-intelligence.

https://github.com/PaddlePaddle/PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

bert compression distributed-training document-intelligence embedding ernie information-extraction llama llm neural-search nlp paddlenlp pretrained-models question-answering search-engine semantic-analysis sentiment-analysis transformers uie

Last synced: 18 Mar 2025

https://github.com/kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

document-intelligence elixir ffi golang java metadata-extraction node pdf-extraction pdfium php python rag ruby rust table-extraction tesseract text-extraction wasm

Last synced: 07 Feb 2026

https://github.com/Goldziher/kreuzberg

Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

async document-intelligence mcp metadata-extraction ocr pandoc pdf-extraction pdfium python rag table-extraction tesseract text-extraction

Last synced: 21 Oct 2025

https://github.com/enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

ai document-image-analysis document-intelligence document-parsing document-processing langchain llm machine-learning nlp ocr openai pdf pdf-to-text python

Last synced: 04 Apr 2025

https://github.com/enoch3712/extractthinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

ai document-image-analysis document-intelligence document-parsing document-processing langchain llm machine-learning nlp ocr openai pdf pdf-to-text python

Last synced: 14 May 2025

https://github.com/azure/ai-in-a-box

AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.

ai azd azd-templates azure chat-bot chatbot chatgpt custom-vision document-intelligence edge-ai edge-computing langchain machine-learning openai semantic-kernel

Last synced: 14 Apr 2025

https://github.com/Azure/AI-in-a-Box

AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.

ai azd azd-templates azure chat-bot chatbot chatgpt custom-vision document-intelligence edge-ai edge-computing langchain machine-learning openai semantic-kernel

Last synced: 25 Mar 2025

https://github.com/doc-analysis/readingbank

ReadingBank: A Benchmark Dataset for Reading Order Detection

document-ai document-intelligence document-understanding natural-language-processing nlp ocr

Last synced: 04 Jan 2026
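Reading order detection asks in which sequence a reader would visit the words on a page. A minimal sketch of the naive top-to-bottom, left-to-right heuristic is shown here, assuming each OCR word comes with a bounding box; the `Word` type and `naive_reading_order` function are hypothetical and not part of ReadingBank. Heuristics like this break down on multi-column and irregular layouts, which is the kind of case a reading-order benchmark is meant to measure.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical OCR word with a bounding box (x0, y0) top-left, (x1, y1) bottom-right.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_reading_order(words, line_tolerance=5.0):
    """Group words into lines by their top edge, then read each line left to right."""
    # Visit words roughly top-to-bottom so lines are built in page order.
    ordered = sorted(words, key=lambda w: (w.y0, w.x0))
    lines = []
    for w in ordered:
        # Join the current line if the top edge is within tolerance of the line's first word.
        if lines and abs(w.y0 - lines[-1][0].y0) <= line_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, order words by their left edge.
    return [w for line in lines for w in sorted(line, key=lambda w: w.x0)]
```

The tolerance absorbs small vertical jitter from OCR so that words on the same visual line are not split into separate lines.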

https://github.com/jamesmcroft/azure-document-intelligence-markdown-to-openai-data-extraction-sample

This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as an invoice, into Markdown, and then use GPT-3.5 Turbo through the Azure OpenAI Service to extract structured JSON data.

azure document-intelligence gpt openai

Last synced: 26 Oct 2025

https://github.com/jamesmcroft/document-intelligence-user-feedback-processor

An experiment to provide the template-training capabilities of Azure AI Document Intelligence Studio as part of a feedback loop.

ai azure document-intelligence mlops

Last synced: 14 Jun 2025

https://github.com/joinalahmed/invoiceparsingwithaoai

Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.

aoai azure document-intelligence invoice-parser

Last synced: 20 Aug 2025

https://github.com/jasjeev013/neuroquery-chroma-rag

NeuroQuery is an AI-powered PDF question-answering system that lets you upload and interact with documents using natural language. Built with LangChain, Gemini AI, and Chroma, it delivers fast, context-aware answers from your files.

ai chromadb document-intelligence gemini langchain multi-pdf-processing nlp pdf-analysis-python pdf-question-answering streamlit vector-search

Last synced: 01 Aug 2025

https://github.com/msaleh1888/azure-serverless-invoice-extraction

Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.

ai-engineering architecture azure-ai azure-document-intelligence azure-functions backend cloud-engineering cloud-functions document-intelligence form-recognizer http-trigger invoice-processing microservice ocr pdf-processing pdf-to-json python rest-api serverless serverless-architecture

Last synced: 13 Jan 2026
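"Normalized JSON output" typically means turning raw extracted strings (currency-formatted totals, locale-specific dates) into consistent machine-readable values. A minimal sketch of that normalization step follows, assuming the extracted fields have already been flattened into a plain dict. The keys `VendorName`, `InvoiceDate`, `InvoiceTotal`, and `Items`, the US-style `MM/DD/YYYY` date format, and the helper names are illustrative assumptions, not the linked project's actual code or the Azure SDK's result objects.

```python
import json
from datetime import datetime
from decimal import Decimal


def parse_amount(raw: str) -> str:
    # Assumption: "$" currency symbol and "," thousands separators; keep exact decimal text.
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return str(Decimal(cleaned))


def parse_date(raw: str) -> str:
    # Assumption: US-style MM/DD/YYYY input; emit an ISO 8601 date string.
    return datetime.strptime(raw.strip(), "%m/%d/%Y").date().isoformat()


def normalize_invoice(fields: dict) -> str:
    """Map hypothetical extracted invoice fields to a normalized JSON document."""
    normalized = {
        "vendor": fields["VendorName"].strip(),
        "invoice_date": parse_date(fields["InvoiceDate"]),
        "total": parse_amount(fields["InvoiceTotal"]),
        "line_items": [
            {
                "description": item["Description"].strip(),
                "amount": parse_amount(item["Amount"]),
            }
            for item in fields.get("Items", [])
        ],
    }
    return json.dumps(normalized, sort_keys=True)
```

Amounts are kept as decimal strings rather than floats so that values like `1234.50` survive the round trip through JSON without binary rounding.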