Projects in Awesome Lists tagged with document-parsing
A curated list of projects in awesome lists tagged with document-parsing.
https://docling-project.github.io/docling/
Get your documents ready for gen AI
ai convert document-parser document-parsing documents docx html markdown pdf pdf-converter pdf-to-json pdf-to-text pptx tables xlsx
Last synced: 26 Jun 2025
https://github.com/docling-project/docling
Get your documents ready for gen AI
ai convert document-parser document-parsing documents docx html markdown pdf pdf-converter pdf-to-json pdf-to-text pptx tables xlsx
Last synced: 09 Sep 2025
https://github.com/ds4sd/docling
Get your documents ready for gen AI
ai convert document-parser document-parsing documents docx html markdown pdf pdf-converter pdf-to-json pdf-to-text pptx tables xlsx
Last synced: 08 Mar 2025
https://ds4sd.github.io/docling/
Get your documents ready for gen AI
ai convert document-parser document-parsing documents docx html markdown pdf pdf-converter pdf-to-json pdf-to-text pptx tables xlsx
Last synced: 07 Sep 2025
https://github.com/unstructured-io/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 09 Sep 2025
https://github.com/Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 26 Mar 2025
https://github.com/run-llama/llama_cloud_services
Knowledge Agents and Management in the Cloud
document document-parser document-parsing docx-to-markdown parsing pdf pdf-document-processor pdf-to-excel pdf-to-json pdf-to-markdown pdf-to-text ppt-to-json ppt-to-markdown pptx structured-data tables
Last synced: 15 May 2025
https://github.com/enoch3712/ExtractThinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
ai document-image-analysis document-intelligence document-parsing document-processing langchain llm machine-learning nlp ocr openai pdf pdf-to-text python
Last synced: 04 Apr 2025
https://github.com/enoch3712/extractthinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
ai document-image-analysis document-intelligence document-parsing document-processing langchain llm machine-learning nlp ocr openai pdf pdf-to-text python
Last synced: 14 May 2025
https://github.com/edenai/edenai-apis
Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
aggregator ai ai-as-a-service api computer-vision document-parsing image-processing machine-translation natural-language-processing nlp ocr optical-character-recognition pre-trained-model python speech-recognition speech-to-text text-to-speech video-recognition
Last synced: 07 Apr 2025
https://github.com/harishdeivanayagam/rowfill
Open-source platform for processing unstructured data (PDFs, images, audio files), built for knowledge workers
document document-extraction document-parsing image-ocr langgraph llama llm nextjs ocr ocr-javascript ollama openai pdf pdfs unstructured unstructured-data vision vision-api
Last synced: 13 Apr 2025
https://github.com/j-sephb-lt-n/pdf-bank-statement-parser
Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data
bank banking document-parsing financial-analysis first-national-bank fnb pdf-parser pdf-parsing python
Last synced: 31 Aug 2025
https://github.com/ziming/laravel-docparser
Docparser OCR Package for PHP Laravel
doc-parser docparser document-parsing laravel ocr php
Last synced: 04 May 2025
https://github.com/cr4yfish/docling-js
Parses documents into a single data type (TypeScript port of Docling)
document-parser document-parsing genai pdf-converter pdf-to-text
Last synced: 31 Aug 2025
https://github.com/baughmann/tikara
The metadata and text content extractor for almost every file type.
apache-tika content-extraction document-parsing document-processing docx image-to-text java language-detection llm metadata metadata-extraction ml natural-language-processing ocr pdf-to-text retrieval-augmented-generation text-extraction text-mining
Last synced: 03 Oct 2025
https://github.com/setiaafandi/anyparser_crewai
Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.
anyparser cache-augmented-generation cag crew-ai crew-ai-rag crewai-rag document-parser document-parsing kag knowledge-graph python rag retrieval-augmented-generation typescript
Last synced: 07 Mar 2025
https://github.com/docling-project/docling4j
Docling4j brings Docling's document-understanding functionality to Java® projects
ai docling document-parser document-parsing document-understanding documents java pdf pdf-converter pdf-to-json
Last synced: 15 Jun 2025
https://github.com/anyparser/anyparser_crewai
Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.
anyparser artificial-intelligence cache-augmented-generation cag crew-ai crew-ai-rag crewai crewai-rag document-parser document-parsing kag knowledge-graph python rag retrieval-augmented-generation typescript
Last synced: 04 Oct 2025
https://github.com/imnotamr/english-to-french-app-using-streamlit-
An interactive Streamlit app that translates English text and documents to French, featuring Google Translate API integration and text-to-speech functionality. Includes PDF and Word document translation.
ai-projects cloud-deployment deep-learning document-parsing machine-translation nlp openai-projects python-tools speech-synthesis streamlit streamlit-application streamlit-cloud streamlit-webapp text-analysis text-to-speech translation voice-output
Last synced: 02 Apr 2025