Projects in Awesome Lists tagged with document-extraction
A curated list of projects in awesome lists tagged with document-extraction.
https://github.com/harishdeivanayagam/rowfill
Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers.
document document-extraction document-parsing image-ocr langgraph llama llm nextjs ocr ocr-javascript ollama openai pdf pdfs unstructured unstructured-data vision vision-api
Last synced: 13 Apr 2025
https://github.com/alephdata/ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
document-extraction documents email-forensics excel forensics forensics-investigations metadata-extraction ocr
Last synced: 07 May 2025
https://github.com/konfuzio-ai/konfuzio-sdk
Run OCR, extract information from documents, and classify them. In addition, annotate documents and build custom NLP and computer vision models tailored to your specific use cases. Find code examples in the Tutorials section of dev.konfuzio.com and get inspiration from the Use Cases section of our blog: https://konfuzio.com/en/category/marketplace
computer-vision document-annotate document-annotation document-annotation-tool document-extraction nlp ocr python text-classification text-processing
Last synced: 08 Aug 2025
https://github.com/xyntopia/pydoxtools
Effortlessly extract information from unstructured data with this library, which uses advanced AI techniques. Compose AI into customizable pipelines that draw on diverse sources for your projects.
chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python
Last synced: 11 May 2025
https://github.com/tammilore/ai-contract-analyzer
AI-powered contract analysis tool
ai document-extraction llms open-source
Last synced: 17 Jul 2025
https://github.com/jamesmcroft/ai-document-data-extraction-evaluation
This project demonstrates how to evaluate the use of LLMs and SLMs for extracting structured data from documents using .NET.
azure document-extraction gpt llms openai phi slms
Last synced: 28 Oct 2025
https://github.com/jamesmcroft/document-data-extraction-prompt-flow-evaluation
This sample demonstrates how to use GPT-4o with Vision to extract structured JSON data from PDF documents and evaluate them with Azure AI Studio and Prompt Flow.
azure document-extraction evaluation gpt-4o llms openai prompt-flow
Last synced: 07 Aug 2025
https://github.com/jamesmcroft/azure-ai-document-pipeline-python-sample
Python sample project for building a scalable document data extraction pipeline with containerized Durable Functions and Azure AI Services on Azure Container Apps.
ai-services azure container-apps document-extraction durable-functions gpt-4o openai
Last synced: 28 Oct 2025
https://github.com/dashroshan/data-extractor
Extract and download key-value pairs, tables, and paragraphs from your scanned PDF, JPG, and PNG documents as CSV files.
document-extraction form-analysis key-value-pairs ocr-python table-extraction
Last synced: 06 Apr 2025
https://github.com/jamesmcroft/azure-ai-document-pipeline-sample
.NET sample project for building a scalable document data extraction pipeline with containerized Durable Functions and Azure AI Services on Azure Container Apps.
ai-services azure container-apps document-extraction durable-functions gpt-4o openai
Last synced: 17 Jul 2025
https://github.com/ilejuxepwaduzd/structured-data-extractor
🛠️ Extract structured data from messy texts using Chain-of-Thought prompting to improve processing of customer support and technical issues.
cdp chrome-fetcher data document-extraction ecommerce golang-library headless metadata-extraction ocr open-source pdf pdf-converter pdf-extractor ruby scraper shopify spider structured-data
Last synced: 09 Oct 2025
https://github.com/subratamondal1/document-extraction
Document extraction from PDFs and images with OpenCV.
computer-vision document-extraction image-processing opencv py python3 pytorch
Last synced: 24 Jan 2026
https://github.com/hreikin/pdf-toolbox
Extract content from PDFs and convert it or create new documents from it in multiple output formats.
adobe document-conversion document-converter document-creation document-creator document-extraction image-extraction pandoc pymupdf pypandoc python python3 scrapy text-extraction
Last synced: 09 Jul 2025
https://github.com/sensible-hq/tutorial-pdf-to-excel
Converts a PDF file to Excel.
document-extraction excel extraction pdf python
Last synced: 03 Apr 2025
https://github.com/pmthetechguy/document-entity-extractor
AI-powered document extractor for names, emails, and organizations.
ai automation data-extraction document-extraction entity-recognition fastapi gpt openai pandas portfolio-project python uvicorn web-app
Last synced: 30 Apr 2025