An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pdf-parser

A curated list of projects in awesome lists tagged with pdf-parser.

https://github.com/opendatalab/mineru

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 06 Jan 2026

https://github.com/opendatalab/MinerU

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 24 Mar 2025

https://github.com/py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

help-wanted pdf pdf-documents pdf-manipulation pdf-parser pdf-parsing pypdf2 python

Last synced: 11 Dec 2025

https://github.com/py-pdf/PyPDF2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

help-wanted pdf pdf-documents pdf-manipulation pdf-parser pdf-parsing pypdf2 python

Last synced: 17 Aug 2025

https://github.com/mstamy2/PyPDF2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

help-wanted pdf pdf-documents pdf-manipulation pdf-parser pdf-parsing pypdf2 python

Last synced: 02 Apr 2025

https://github.com/dromara/yft-design

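An open-source version of Gaoding Design (稿定设计) built on fabric.js. A beautiful and powerful online design tool with poster design and image editing features, suited to scenarios such as poster generation, e-commerce product images, long-form article images, and video or WeChat official account cover editing.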

canvas-editor clipper element-plus fabric-editor fabricjs image-crop online-design online-editor pdf-editor pdf-parser poster-design psd-editor psd-parse text2path vue3-fabric

Last synced: 15 May 2025

https://github.com/adithya-s-k/marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

api fastapi marker pdf-converter pdf-files pdf-parser pdf-parsing rest-api

Last synced: 16 May 2025

https://github.com/drmingler/docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.

api fastapi markdown-parser pdf-chatbot pdf-conversion pdf-converter pdf-parser pdf-parsing pdf-to-markdown

Last synced: 06 Oct 2025

https://github.com/iamarunbrahma/vision-parse

Parse PDFs into markdown using Vision LLMs

document-parser pdf-parser pdf-to-markdown text-extraction

Last synced: 13 Dec 2025

https://github.com/titipata/scipdf_parser

Python PDF parser for scientific publications: content and figures

grobid parser pdf pdf-parser python-parser scipdf-parser

Last synced: 16 May 2025

https://github.com/lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

chunking document-analysis llm nlp ocr pdf-parser pdfparser rag text-chunking

Last synced: 01 Apr 2025

https://github.com/ispras/dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)

doc document-analysis document-content-extraction documents docx docx-parser excel html html-parser logical-structure-extraction ocr odt pdf pdf-parser scanned-documents table-of-contents table-recognition txt

Last synced: 15 May 2025

https://github.com/drmingler/smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

chatbot chunking claude gemini langchain llama-index markdown openai pdf-converter pdf-parser pdf-to-markdown rag

Last synced: 31 Jul 2025

https://github.com/genbs/poste-italiane-parser

A Python tool to parse PDF statements from Poste Italiane (Postepay, BancoPosta) and extract data as structured JSON.

bancoposta fintech pdf-parser personal-finance poste-italiane postepay

Last synced: 31 Oct 2025

https://github.com/ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

cython pdf pdf-converter pdf-parser pdfparser pdftohtml pdftopng pdftotext python xpdf xpdf-reader

Last synced: 13 Jul 2025

https://github.com/SimpleApp/PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser

pdf-parser swift truetype

Last synced: 21 Jul 2025

https://github.com/lianjiatech/bella-domify

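Document Parser supporting PDF, TXT, DOC, DOCX, Markdown, and other file formats. It efficiently extracts and parses content and generates a standard document tree structure. The built-in PDF Parser, Text Parser, and Word Parser support intelligent applications such as RAG, knowledge bases, and full-text search.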

document-parser parser pdf-parser

Last synced: 04 Oct 2025

https://github.com/sylphxltd/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

ai-agent llm-tool mcp model-content-protocol nodejs pdf pdf-parse pdf-parser pdf-reader stdio typescript

Last synced: 17 Jun 2025

https://github.com/tarfin-labs/easy-pdf

PDF wrapper for Laravel

laravel pdf pdf-merge pdf-parser php tcpdf

Last synced: 17 Mar 2025

https://github.com/adrienjoly/hsbcstatementparser

Transforms PDF bank statements from HSBC into a list of operations in JSON or TSV format.

bank-statement conversion csv-export json-export pdf-converter pdf-parser tsv-format

Last synced: 13 Jul 2025

https://github.com/nlitsme/pypdfcrack

Investigation into PDF encryption

file-format pdf-encryption pdf-parser reverse-engineering

Last synced: 02 Aug 2025

https://github.com/vishwagauravin/pdf-parser-client-side

A lightweight, easy-to-use package to parse text from PDF files on the client side without any server dependency.

client-side pdf pdf-parser pdf-reader pdfjs

Last synced: 08 Apr 2025

https://github.com/easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

azure-openai chroma chromadb embedding-models embedding-vectors embeddings langchain langchain-python pdf pdf-document-processor pdf-parser pdf-parsing python word-embeddings

Last synced: 25 Jun 2025

https://github.com/ashutoshvarma/libxpdf

Static library built from the sources of www.xpdfreader.com with most dependencies built in

cplusplus cpp-library pdf pdf-parser pdf-viewer-component xpdf xpdf-reader

Last synced: 12 Apr 2025

https://github.com/sidmishraw/cs-267-project

PDF parser plus Apriori and Simplicial Complex algorithm implementations

apriori-algorithm data-mining-algorithms pdf pdf-json pdf-parser text-mining

Last synced: 12 Apr 2025

https://github.com/j-sephb-lt-n/pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

bank banking document-parsing financial-analysis first-national-bank fnb pdf-parser pdf-parsing python

Last synced: 31 Aug 2025

https://github.com/cschen1205/spring-pdf-search-engine

PDF Search Engine implemented in Java and Spring Boot

elastic-search pdf-parser pdf-upload search-engine spring-boot

Last synced: 12 Oct 2025

https://github.com/sylphlab/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

ai-agent llm-tool mcp model-content-protocol nodejs pdf pdf-parse pdf-parser pdf-reader stdio typescript

Last synced: 11 Apr 2025

https://github.com/pspdfkit/nutrient-pdf-mcp-server

A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration

ai-integration llm-tools mcp pdf pdf-parser pdf-tools python

Last synced: 05 Sep 2025

https://github.com/flazefy/gudangku-laravel

GudangKu helps you manage your belongings, from home supplies and food stock to furniture. Set reminders for cleaning or for restocking your home supplies. The app can also generate reports to create shopping or maintenance lists. Start organizing your inventory with GudangKu’s features. Created using Laravel.

api-testing cronjob csv-export firebase firebase-storage integration-testing laravel mailer migrations mysql pdf pdf-parser php rest-api seeding statistics swagger task-scheduler telegram-bot unit-testing

Last synced: 01 Jul 2025

https://github.com/hfrewreeft/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

ai-agent llm-tool mcp model-content-protocol nodejs pdf pdf-parse pdf-parser pdf-reader stdio typescript

Last synced: 29 Jun 2025

https://github.com/shtse8/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

ai-agent llm-tool mcp model-content-protocol nodejs pdf pdf-parse pdf-parser pdf-reader stdio typescript

Last synced: 05 Apr 2025

https://github.com/bratergit/hacktoberfest2020

Hacktoberfest 2020 - Build a desktop program that runs in the terminal and, given a PDF from Toro Investimentos with the day's brokerage notes, shows the income tax calculation for day trading mini dollar and mini index contracts on Bovespa.

bovespa hacktoberfest hacktoberfest2020 javascript nodejs pdf-parser toroinvestimentos

Last synced: 05 Apr 2025

https://github.com/siddhantsingh1230/snapcv

A Simple NLP Web App to create summaries of your CVs

nlp node pdf-parser react summarizer

Last synced: 12 Oct 2025

https://github.com/saviobatista/vitae

AI-powered résumé transformer: match your CV to any job and export in LaTeX PDF.

ai-resume career-tools document-processing job-applications latex openai oss pdf-parser resume-builder tailored-resume typescript vercel

Last synced: 07 Sep 2025

https://github.com/aqiftekhar/openaichatbot

A healthcare chatbot implemented using OpenAI that also receives PDF documents and images and prescribes based on a summary.

nextjs openai openai-api pdf-parser react tesseract vision

Last synced: 31 Dec 2025

https://github.com/luccahirae/invoice-extract-server

API for extracting data from invoices

express jest multer nodejs pdf-parser prisma

Last synced: 09 Apr 2025

https://github.com/petermosmans/apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

annotations calendar pdf pdf-converter pdf-extractor pdf-parser planner

Last synced: 16 Mar 2025

https://github.com/jasoncobra3/floorplan-dimractor

A sophisticated Python pipeline for automatically extracting dimensions and cabinet codes from architectural floorplan PDFs. This tool converts various dimension formats into standardized measurements and provides structured output with visualization capabilities.

architecture-tools automation-tools blueprint-analysis cad-automation computer-vision dimension-extraction document-processing document-processing-pipeline floorplan-analysis image-processing measurement-tools opencv pdf-parser pdf-processing pdfplumber pymupdf streamlit text-detection

Last synced: 08 Oct 2025

https://github.com/siddhantsingh1230/snapcv_backend

A Node Backend Server for SnapCV

expressjs node-nlp nodejs pdf-parser react

Last synced: 12 Oct 2025

https://github.com/jogemu/pdf2tree

Parse PDF and group elements based on enclosing lines. A node.js module that promisifies the pdf2json parser and structures the data in a way that is suitable for tables with merged cells.

data-table hierarchical-data merged-table-cells pdf-parser tree-structure

Last synced: 13 Oct 2025

https://github.com/byerlikaya/smartrag

SmartRAG is a production-ready .NET 9.0 library that provides a complete Retrieval-Augmented Generation (RAG) solution. Features include multi-provider AI support (OpenAI, Anthropic, Gemini), enterprise vector storage (Qdrant, Redis, SQLite), and intelligent document processing (PDF, Word, Text).

ai anthropic csharp document-processing document-qa dotnet enterprise-ai gemini llm machine-learning natural-language-processing openai pdf-parser qdrant rag redis retrieval-augmented-generation vector-database word-parser

Last synced: 27 Dec 2025

https://github.com/vinayaksandilya/notebook-front-end

Turn any PDF into a structured online course with modules, summaries, and key takeaways — powered by Node.js, MySQL, and AI models like GPT-4 & Claude.

ai claude course-generator education-tech fullstack gpt openai pdf-parser

Last synced: 08 Jul 2025

https://github.com/patrixshah/resumescreening

Resume Screening: An AI Driven User Profile Screening Tool

chatgpt3 express jest mammoth multer nodejs openai pdf-parser typescript

Last synced: 30 Dec 2025

https://github.com/dills122/cardboard-crack

Web app for parsing/viewing Soccer Card Checklists

angular pdf-parser primeng soccer sports-cards

Last synced: 23 Mar 2025

https://github.com/devleejb/pdf-parser

PDF to JSON on my computer!

pdf pdf-parser

Last synced: 14 May 2025

https://github.com/souravupadhyay7/morvs_chat_bot

🤖 MORVS AI - An intelligent chat interface powered by Groq's LLaMA 3 model with PDF processing capabilities. Built with Next.js, React, TypeScript, and modern UI components.

ai-assistant ai-chatbot chat-interface conversational-ai cyberpunk-ui framer-motion groq nextjs pdf-parser pdf-processing real-time-chat shadcn-ui tailwindcss typescript

Last synced: 18 Aug 2025

https://github.com/sankeer28/pdf-searcher

Live website that parses multiple PDFs using PDF.js

pdf-parser pdf-search

Last synced: 27 Dec 2025

https://github.com/saniyaacharya04/resume-scanner-using-nlp

A live resume scanning and ranking tool built with Python, Streamlit, and NLP. Upload resumes, match them to job descriptions, and generate analytics dashboards and PDF reports.

dashboard job-matching nlp pdf-parser resume-scanner scikit-learn spacy streamlit transformers

Last synced: 31 Oct 2025

https://github.com/chinmaymisra/personal-finance-tracker

Upload Axis Bank statements as PDFs, automatically parse transactions, and view them cleanly in a modern UI. Handles invalid files and non-supported banks gracefully. Built using React (Vite) and FastAPI.

axis-bank bank-statement fastapi financial-application fullstack pdf-parser python react typescript vite

Last synced: 30 Dec 2025

https://github.com/sourik-10/prismai

QuickAI is a full-stack AI web application built with a modular client–server architecture. The project is primarily developed in JavaScript, with the frontend and backend kept in separate folders for better structure and scalability. It leverages modern web technologies and integrates AI-powered features to deliver intelligent interactions.

axios clerk clipdrop-api cloudinary cors dotenv expre gemini-api multer neon nodejs pdf-parser react react-router-dom tailwindcss toaster

Last synced: 30 Dec 2025

https://github.com/fayazk/document-metadata-extractor

A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.

content-indexing data-extraction document-management document-processing docx-parser excel-export gemini-ai-project generative-ai json-output metadata-extraction nlp pdf-parser python-automation text-analysis

Last synced: 01 Apr 2025