An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-parser

A curated list of projects in awesome lists tagged with document-parser.

https://github.com/marker-inc-korea/autorag

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

analysis automl benchmarking document-parser embeddings evaluation llm llm-evaluation llm-ops open-source ops optimization pipeline python qa rag rag-evaluation retrieval-augmented-generation

Last synced: 12 May 2025

https://github.com/iamarunbrahma/vision-parse

Parse PDFs into markdown using Vision LLMs

document-parser pdf-parser pdf-to-markdown text-extraction

Last synced: 13 Dec 2025

https://github.com/lianjiatech/bella-domify

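Document Parser: supports PDF, TXT, DOC, DOCX, Markdown, and other file formats. It efficiently extracts and parses content and generates a standard document tree structure. It includes a built-in PDF Parser, Text Parser, and Word Parser, supporting intelligent applications such as RAG, knowledge bases, and full-text search.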

document-parser parser pdf-parser

Last synced: 04 Oct 2025

https://github.com/decisionfacts/semantic-ai

An open-source framework for Retrieval-Augmented Generation (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

approximate-nearest-neighbor-search deep-neural-networks document-parser docx fastapi inference-api llama2 llm machine-learning ocr openai openai-api pdf rag retrieval-augmented-generation semantic-search vector-database

Last synced: 27 Jul 2025

https://github.com/urbanclap-engg/smart-docs-parser

An OCR-based document parser that extracts information from identity document images.

aadhaar auto-fill document-parser google-vision nodejs ocr pancard typescript user-onboarding

Last synced: 05 Oct 2025

https://github.com/has-abi/docparser

Extract text from your DOCX documents.

doc-parser document-parser docx-parser text-parser

Last synced: 14 Dec 2025

https://github.com/hrbrmstr/docparser

🧰 Tools to Upload/Parse Documents to 'docparser' and Retrieve Extracted Results

docparser document-parser r rstats

Last synced: 29 Oct 2025

https://github.com/gyanvir/drparser

Dr.Parser 🩸📊 – AI-powered blood report parser that extracts and analyzes medical data from images/PDFs. Built with React, FastAPI, EasyOCR, and Gemini AI. 🚀 🔹 Local Setup Available | 🔹 Future Enhancements Planned | 🔹 Hackathon Project 👉 Clone, run, and explore the future of AI-driven healthcare!

ai-ml blood-report-analysis document-parser easyocr fastapi hackathon-project healthcare medical-ai ocr reactjs team-euphoria

Last synced: 06 Oct 2025

https://github.com/vetrivel07/ai-powered-resume-evaluator

An AI-powered resume evaluation app that compares a candidate’s resume with a job description using Google’s Gemini 1.5 Flash model to provide HR-style feedback and ATS-style match scoring through a simple, interactive Streamlit interface.

ats document-parser evaluator gemini-api gemini-flash genai python-library resume-analysis streamlit streamlit-application

Last synced: 01 Sep 2025

https://github.com/coderosh/docpa

A simple library that I use for web scraping. Uses htmlparser2 to parse the DOM.

docpa document-parser dom html-parser

Last synced: 12 Jul 2025

https://github.com/cr4yfish/docling-js

Parses documents into a single data type (TypeScript port of Docling).

document-parser document-parsing genai pdf-converter pdf-to-text

Last synced: 31 Aug 2025

https://github.com/connectaman/deepseek-ocr-multigpu-infer

Efficient multi-GPU OCR inference framework leveraging parallel processes for accelerated token throughput and faster batch processing. Designed for scalable, high-performance optical character recognition workloads using PyTorch. Supports dynamic GPU assignment, optimized resource utilization, and easy integration for large-scale image datasets.

agentic-extraction data deepseek document-parser extraction extractor gpu image-parser llm multigpu nvidia ocr parallel-computing parser pdf-parser vlm

Last synced: 22 Jan 2026

https://github.com/revankumard/llamarker

Your ultimate tool for effortlessly converting and parsing documents into clean, well-structured Markdown—fast, reliable, and 100% local! 💻✨

document-parser llama-ai llamarker local-parsing-tool marker

Last synced: 22 Mar 2025

https://github.com/docling-project/docling4j

Docling4j brings Docling's document-understanding functionality to Java® projects.

ai docling document-parser document-parsing document-understanding documents java pdf pdf-converter pdf-to-json

Last synced: 15 Jun 2025

https://github.com/shijincai/fast360

The industry's first "Open Source OCR Arena," a free, no-login utility for one-click benchmarking of 7 top-tier models (Marker, MinerU, MonkeyOCR, Docling, Dolphin, OCRFlux, PP-StructureV3) on your PDF/image files, specializing in PDF-to-Markdown conversion.

benchmark computer-vision data-extraction docling document-analysis document-parser evaluation latex latex-document machine-learning markdown-converter marker monkeyocr ocr ocr-service paddleocr pdf-converter pdf-to-markdown rag

Last synced: 30 Aug 2025

https://github.com/buren/document_parser

Small Rails API app to parse documents.

document-parser rails-api yomu

Last synced: 31 Aug 2025

https://github.com/midhunterx/scholar-cap

🎓 A set of powerful tools designed to streamline the extraction, parsing, and clean-up of data from DOCX and PDF forms. Saves time and eliminates manual data entry by automating the processing of structured data.

bank-details-validtion bulk-neft-generator dbms document-parser form-management multithreading

Last synced: 05 Mar 2025

https://github.com/setiaafandi/anyparser_crewai

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

anyparser cache-augmented-generation cag crew-ai crew-ai-rag crewai-rag document-parser document-parsing kag knowledge-graph python rag retrieval-augmented-generation typescript

Last synced: 07 Mar 2025

https://github.com/dills122/shamwow

Who likes lawyers? Me neither. Scrub your PII with ShamWow.

attributes document-parser document-scrubber pii poco reflection scrub scrubber verify

Last synced: 31 Mar 2025

https://github.com/akandindajunior/cloud-services

If it’s not documented, it never happened. 📝 Please check my README.md for more details. 🔍

alibaba aws-amplify azure cats-over-dogs docker document document-parser google hacktoberfest java parsing pdf pdf-to-text rocket-ships

Last synced: 23 Jul 2025

https://github.com/anyparser/anyparser_crewai

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

anyparser artificial-intelligence cache-augmented-generation cag crew-ai crew-ai-rag crewai crewai-rag document-parser document-parsing kag knowledge-graph python rag retrieval-augmented-generation typescript

Last synced: 04 Oct 2025