An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pdf-document-processor

A curated list of projects in awesome lists tagged with pdf-document-processor.

https://github.com/wmjordan/pdfpatcher

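PDF补丁丁 (PDFPatcher): a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more.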

pdf pdf-converter pdf-document-processor pdf-generation

Last synced: 12 May 2025

https://github.com/qpdf/qpdf

qpdf: A content-preserving PDF document transformer

pdf pdf-document-processor

Last synced: 11 May 2025

https://github.com/pdf2htmlEX/pdf2htmlEX

Convert PDF to HTML without losing text or format.

html pdf pdf-document-processor pdf-viewer

Last synced: 24 Mar 2025

https://github.com/abarker/pdfCropMargins

pdfCropMargins -- a program to crop the margins of PDF files

crop cropper pdf pdf-converter pdf-document-processor python

Last synced: 26 Mar 2025

https://github.com/sailist/chatgpt-enhancement-extension

An all-in-one plugin to improve your ChatGPT experience!

chatgpt chatgpt-chrome-extension markdown pdf-document-processor

Last synced: 07 Apr 2025

https://github.com/hellerbarde/stapler

A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk

pdf pdf-converter pdf-document-processor python

Last synced: 06 Apr 2025

https://github.com/michaelrsweet/pdfio

PDFio is a simple C library for reading and writing PDF files.

c pdf pdf-document pdf-document-api pdf-document-processor pdf-generation

Last synced: 16 May 2025

https://github.com/svenssonaxel/pdf-sign

A tool to sign PDF files, with Linux support.

pdf pdf-document-processor pdf-sign pdf-signer pdf-signing

Last synced: 06 Apr 2025

https://github.com/gurpreetkaurjethra/multi-pdfs_chatapp_ai-agent

Meet MultiPDF 📚 Chat AI App! 🚀 Chat seamlessly with Multiple PDFs using Langchain, Google Gemini Pro & FAISS Vector DB with Seamless Streamlit Deployment. Get instant, accurate responses from Awesome Google Gemini OpenSource language Model. 📚💬 Transform your PDF experience now! 🔥✨

chat-application chatbot-application chatgpt gemini gemini-api gemini-pro generative-ai google instructor-embeddings langchain langchain-python large-language-models llm open-source openai pdf-document-processor python-3 streamlit-application

Last synced: 13 Jul 2025

https://github.com/naivehobo/pdfviewer

PDFViewer is a GUI tool, written using python3 and tkinter, which lets you view PDF documents.

pdf pdf-document pdf-document-processor pdf-files pdf-viewer tkinter tkinter-graphic-interface tkinter-gui tkinter-library tkinter-python

Last synced: 30 Apr 2025

https://github.com/lovasoa/pagelabels-py

Python library to manipulate PDF page labels

labels page pdf pdf-document-processor

Last synced: 12 May 2025

https://github.com/onedoclabs/onedoc

The first developer-oriented document platform. Generate, host and track PDFs with a single API, beautifully.

api document document-generator html nodejs pdf pdf-document-processor pdf-generation pdf-library pdf-reader pdf-reports pdf-viewer react react-print-pdf sdk ycombinator

Last synced: 06 Apr 2025

https://github.com/pspdfkit/nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

ocr-python pdf-converter pdf-document-processor pdf-generation pdf-processing python

Last synced: 05 Sep 2025

https://github.com/siddhantsadangi/pdf-workdesk

A Streamlit-powered application that provides a user-friendly interface for editing PDF documents.

pdf pdf-document pdf-document-processor pdf-files pdf-viewer pdfkit python streamlit webapp

Last synced: 16 Mar 2025

https://github.com/pankajr141/pdf2jpg

Utility to convert PDF into JPG files

pdf-converter pdf-document-processor

Last synced: 05 Mar 2025

https://github.com/hoehermann/pypdf_strreplace

Search and replace text in PDF files with PyPDF.

pdf pdf-document-processor pypdf

Last synced: 30 Apr 2025

https://github.com/taseikyo/backup-utils

:sparkles: A batch of useful code/scripts: run commands automatically, automate repetitive, tedious operations, perform format conversions, etc.

backup-utils backups bash bilibili pdf-document-processor python3 scripts-collection srt-subtitles zhihu

Last synced: 05 Oct 2025

https://github.com/bobld/pdfpigmlnetblockclassifier

Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as a title, text, list, table, or image.

classifier csharp document-layout document-layout-analysis layout-analysis lightgbm machine-learning ml-net pdf pdf-document pdf-document-processor pdfpig publaynet

Last synced: 14 Apr 2025

https://github.com/ptyadana/python-projects-dojo

A collection of Python projects, including machine learning projects, image and PDF processing, password checkers, sending emails and SMS, web scraping, a Flask web app, Selenium automation testing, etc.

csv-files db-api email fileio flask http http-requests image-processing json jsonpickle notebook-jupyter pdf-document-processor pickle python python-dojo requests selenium selenium-webdriver twilio-sms webscrpaing

Last synced: 14 Jun 2025

https://github.com/utkarsh212/react-pdftotext

Lightweight, memory-safe client library for extracting plain text from PDF files.

npm package pdf-document-processor pdf-to-text pdfjs react typescript

Last synced: 05 Jul 2025

https://github.com/ksharindam/pdfcook

Prepress preparation tool and PDF editor

pdf pdf-document-processor pdf-editor prepress

Last synced: 04 Apr 2025

https://github.com/josee9988/compress-pdfs

A Python CLI script to compress 📦 all the PDF files recursively in a directory using the iLovePDF technology 🥰

compress compress-files compress-pdf compressed compression compressor compressors pdf pdf-compression pdf-converter pdf-document-processor pdf-files python python-3 python-compressing python-pdf python3 python3-script python3-scripts size-reduction

Last synced: 30 Oct 2025

https://github.com/vivekweb2013/pdf-utils

An Android app for performing various operations on PDF files

android android-app app free no-ads pdf pdf-document-processor pdf-utilities

Last synced: 25 Oct 2025

https://github.com/easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

azure-openai chroma chromadb embedding-models embedding-vectors embeddings langchain langchain-python pdf pdf-document-processor pdf-parser pdf-parsing python word-embeddings

Last synced: 25 Jun 2025

https://github.com/prithivsakthiur/rag-pdf-chatbot

(PDF) Information and Inference, Retrieval-Augmented Generation [ RAG ]

llm packages-manager pdf-document-processor pdf-viewer streamlit

Last synced: 06 May 2025

https://github.com/ammirsm/automatic-pancake

Active-learning, agent-based simulation for systematic reviews and other types of technology-assisted review (TAR). It handles PDF documents and other metadata and is based on both full-text screening decisions and title screening decisions.

active-learning agent-based-simulation machine-learning pdf-document-processor python scikit-learn systematic-review systematic-reviews technology-assisted-review

Last synced: 12 Apr 2025

https://github.com/moindalvs/resume_screening_and_parser

Business objective: the document classification solution should significantly reduce the manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 23 Apr 2025

https://github.com/orchetect/pdfgadget

Batch PDF operations for Swift

pdf pdf-document-processor pdf-files pdf-merger swift

Last synced: 23 Apr 2025

https://github.com/vrajvyas11/pdf-manipulator

A comprehensive PDF tool that allows you to effortlessly edit, merge, split, compress, and convert PDFs. It supports adding pages, extracting images, and viewing PDFs directly within the app. With a user-friendly drag-and-drop interface, it’s fully responsive across all devices, streamlining document management for everyone.

add-page image-to-pdf pdf pdf-compressor pdf-document-processor pdf-download pdf-editor pdf-image-extractor pdf-manipulation pdf-merger pdf-viewer split-pdf

Last synced: 26 Oct 2025

https://github.com/xiejiss/pdf-tools

Useful PDF tools to work with PDF translation platforms.

pdf pdf-document-processor

Last synced: 05 Sep 2025

https://github.com/mavaddat/jpdfbookmarks

Create and edit bookmarks on existing PDF files.

pdf pdf-document-processor

Last synced: 28 Oct 2025

https://github.com/robinmillford/cortex-ai-multi-model-insights-hub

Cortex AI: Multi-Model Insights Hub is an advanced platform that leverages cutting-edge AI to empower your research, analysis, and data exploration. By integrating multiple Large Language Models (LLMs) with a sophisticated Retrieve-and-Generate (RAG) system

article-extractor chatbot data-analysis data-visualization deepseek-chat deepseek-r1 llama3 llm pdf-document-processor rag streamlit-webapp summarizer vector-database

Last synced: 28 Oct 2025

https://github.com/marcbuch/tr-pdf-parser

Parses invoice PDF files from the German brokerage Trade Republic

automation pdf-document-processor python python3 trade-republic traderepublic-statements

Last synced: 04 Sep 2025

https://github.com/subhangisati/langchat-explorer

LangChat Explorer: Your intuitive document companion. Effortlessly explore vast information with natural language conversations. Simplify queries, gain insights, and embark on a seamless journey of knowledge discovery. Unleash the power of language with LangChat Explorer.

api deep-learning document-retrieval generative-ai llms machine-learning pdf-document-processor python3 q-and-a-bot

Last synced: 21 Feb 2025

https://github.com/thomasvanholder/browserless

A Ruby wrapper for the Browserless PDF API with support for modern CSS such as TailwindCSS

pdf pdf-converter pdf-document pdf-document-processor pdf-files pdf-generation

Last synced: 25 Apr 2025

https://github.com/rvbcldud/focus-study

A collection of FOCUS Bible studies in booklet format.

bible-study catholic pdf-document-processor

Last synced: 06 May 2025

https://github.com/juniortorresmtj/galeroouleai

GaleroouleAi is an innovative application that combines cutting-edge artificial intelligence technologies to offer a robust, interactive solution for document analysis. Using Google's Gemini AI LLM, integrated with the embedding model, our application is able to understand and process PDFs

alura gemini-api gemini-pro google llm pdf-document-processor python react tailwind-css

Last synced: 25 Mar 2025

https://github.com/jmfeck/python-pdf-tools

Python PDF Tools is a Python-based collection of ready-to-use applications designed for various PDF manipulations. Each tool is set up as an independent app that can be triggered by running a batch file located in the root of its folder. This project is under active development.

pdf pdf-converter pdf-document-processor pdf-generation pdfkit python

Last synced: 08 Apr 2025

https://github.com/vortexv7/papergpt

PaperGPT is a web application powered by the GPT-3 model and HuggingFaceHub AI Model that allows users to upload PDF documents and ask questions related to the content of those documents. It leverages advanced natural language processing techniques to provide accurate and contextually relevant answers to user queries.

ai bot gpt-3 huggingface pdf-document-processor render streamlit streamlit-webapp

Last synced: 06 Apr 2025

https://github.com/leomsgit/extrator-de-parametros-analise-hemograma-e-bioquimico

Python software that scans PDF files and extracts parameters directly into an Excel file

analysis data excel excel-export google-colab hemogram jupyter-notebook pdf pdf-document-processor pdf-viewer python python3

Last synced: 06 Apr 2025

https://github.com/rayyan9477/pdf-chatbot

This is a Streamlit-based web application that allows users to upload PDF files and ask questions about their content. The application uses a combination of natural language processing techniques and vector-based text retrieval to provide answers to the user's questions.

chatbot machine-learning machine-learning-algorithms natural-language-processing pdf pdf-document-processor python

Last synced: 08 Jul 2025

https://github.com/mechadragonx/l

2020 Computer Science HL IA・An application that parses multiple different types of resumes and puts the data, in a sorted fashion, on a database. Name based on the "Death Note" character. (http://bit.ly/l-dn)

aws-lambda csharp database dotnetcore3-1 dynamodb nodejs pdf pdf-document-processor resume-parser s3-bucket word-documents

Last synced: 11 Mar 2025

https://github.com/trogon/otus-pdf

Object oriented PDF document generation library (for PHP).

composer-package library pdf-document pdf-document-processor pdf-generation php php54 php55 php56 php7 php71 php72 php73

Last synced: 03 Oct 2025

https://github.com/nicky-nn/pdf_combiner

A Python tool with a graphical interface for combining multiple PDFs into one.

pdf pdf-conversion pdf-converter pdf-document-processor pdfs-python-pdf-tools-gui-application

Last synced: 21 Jul 2025

https://github.com/jeff-tian/doc-rotary-fc

Alibaba Cloud Function Compute version of doc-rotary (https://github.com/Jeff-Tian/doc-rotary)

ali aliyun document-pdf fc functions-as-a-service libreoffice nodejs officedocs pdf pdf-document-processor pdf-generation typescript

Last synced: 05 Mar 2025

https://github.com/abdolalixx/pdf

Generate PDFs effortlessly with Cloudflare's Browser Rendering API. Clone the repository and deploy your custom solution today! 🚀📄

chatpdf chatwithpdf chinese document image-processing java korean latex modify ocr pdf-converter pdf-document-processor pdf-editor pdf-merger pdf-tools pdfgpt translate zotero

Last synced: 23 Jun 2025

https://github.com/akari2600/pdf_analyzer_poc

PDF layout analysis with OpenCV

kivy opencv pdf-document-processor

Last synced: 05 Dec 2025

https://github.com/johngodoi/brokersnoteloader

This application converts brokerage notes into formatted text that can easily be imported into a spreadsheet.

brokerage parser pdf-document-processor scala

Last synced: 17 Jun 2025

https://github.com/bigdaddykir0/ai-chat-app

A full-stack chat application with a React frontend and Python FastAPI backend, featuring real-time messaging and AI-powered responses.

ai-chat chatbot chatbot-application chatgpt google javascript kotlin langchain llm nlp openai pdf-document-processor react typescript

Last synced: 10 Apr 2025

https://github.com/maqeel019/ats

A powerful Python-based ATS that parses and ranks PDF resumes on recruiter-defined filters like skills, education, and experience. Handles scanned and complex resumes with detailed scoring and Excel output.

data-science excel model pandas pdf-document-processor pyhton text-classification text-processing

Last synced: 05 Oct 2025

https://github.com/m32/pdfium

cffi bindings to the PDFium library

pdf-document-processor pdfium

Last synced: 01 Apr 2025

https://github.com/18311-claire-xinjin/pdf-chat-app

📄 Upload PDFs and chat with AI for instant insights, all through a sleek, user-friendly interface that works on any device.

chatgpt embeddings gemini gemini-pro generative-ai instructor-embeddings langchain next-auth pdf-document-processor pinecone prisma python react-query shadcn-ui streamlit streamlit-application tailwind vector-database

Last synced: 10 Oct 2025

https://github.com/basdjay/docgen-ai-

📄 Generate professional documentation for your GitHub repositories using AI. Streamline your workflow with auto-generated README files, API docs, and more.

ai api-documentation chat-with-pdf chrome-extension code-analysis developer-tools docu gemini-ai github javascript langchain openai pdf pdf-document-processor pdf-generation pptx productivity readme-generator

Last synced: 10 Oct 2025

https://github.com/RobinMillford/Cortex-AI-Multi-Model-Insights-Hub

This project creates a Retrieve-and-Generate (RAG) powered chatbot for summarizing and interacting with articles. The system processes articles provided as PDFs or URLs, extracts text, splits the content into chunks, generates embeddings, and stores them in a vector database

article-extractor chatbot llama3 llm pdf-document-processor rag streamlit summarizer vector-database

Last synced: 11 Oct 2025

https://github.com/vin0x/pdf-to-vehicle-data-etl

This project extracts car data from a PDF file published on a website, manipulates the data, stores it in AWS RDS, creates an Apache Airflow pipeline that refreshes it automatically, and builds a Power BI dashboard.

database-schema etl jupyter manipulate-data pdf-document-processor

Last synced: 15 Oct 2025

https://github.com/ctadeodev/spark-word-counter

A Dockerized PySpark application for counting word frequencies in an input PDF document

docker pdf-document-processor pyspark python spark

Last synced: 07 Sep 2025

https://github.com/shihjen/pdf_merger

A lightweight PDF merging application built with Python using PyPDF and Streamlit.

pdf-document-processor streamlit

Last synced: 30 Aug 2025

https://github.com/mostasirmahim/mahimai

This is a Gemini-powered PDF Analyzer web application that lets users upload PDF files and interact with them.

gemini-api mahimai pdf-document-processor

Last synced: 04 Oct 2025

https://github.com/ronierisonmaciel/askrag

This project lets you ask natural-language questions about the content of PDF files. It uses the RAG (Retrieval-Augmented Generation) approach.

aiagent faiss-cpu pdf-document-processor python rag-chatbot

Last synced: 11 Aug 2025

https://github.com/a6b8/documents-with-footer-to-pdf-for-ruby

Add a footer to each document and create a single .pdf file all in one command.

footer pdf-converter pdf-document pdf-document-processor pdf-generation prawn-pdf

Last synced: 13 Mar 2025

https://github.com/andruhovski/aspose-pdf-js

Aspose.PDF for JavaScript via C++

pdf pdf-converter pdf-document pdf-document-processor

Last synced: 15 Jul 2025

https://github.com/panuozzo77/dumbassacronymgenerator

It searches for words inside the PDFs you load in order to generate acronyms. Work has started on grammatically consistent generation (for added fun).

acronyms customizable generator pdf-document-processor

Last synced: 31 Mar 2025

https://github.com/mohith202/ema-chatbot

A public chatbot that can use a PDF supplied by the user or a preloaded dataset to answer user queries while displaying its sources.

embeddings-word2vec lama llm pdf-document-processor python3 voice-to-text

Last synced: 31 Mar 2025

https://github.com/poacosta/pdf-2-chroma

A production-ready Python script that transforms PDF document collections into locally-stored, semantically searchable vector databases using ChromaDB's persistent storage.

chromadb embeddings openai pdf-document-processor python rag sentence-transformers vector-database

Last synced: 26 Jun 2025

https://github.com/anonfaded/pdf-merger

This tool allows you to merge PDF files through a graphical user interface (GUI) or a command-line interface (CLI) on Windows, Linux, and Mac.

pdf-document-processor pypdf2 python

Last synced: 30 Mar 2025

https://github.com/florian-berger/pdfcombiner

Small tool for Microsoft Windows that makes it easy to combine multiple PDF documents into one

csharp netframework pdf pdf-converter pdf-document pdf-document-processor

Last synced: 06 Apr 2025

https://github.com/azuregray/anywhereprintmachine

A platform for a futuristic Anywhere Print Machine: a software concept for print machines placed locally, like ATMs, where users can get prints at any time, with extended functionality.

anytime anywhere atm idea operating-system pdf pdf-document pdf-document-processor pdf-viewer print real-time real-time-data software software-architecture

Last synced: 01 Apr 2025

https://github.com/uni-creator/rag-multifile-qa

A RAG (Retrieval-Augmented Generation) AI chatbot that allows users to upload multiple document types (PDF, DOCX, TXT, CSV) and ask questions about the content. Built using LangChain, Hugging Face embeddings, and Streamlit, it enables efficient document search and question answering using vector-based retrieval. 🚀

ai chatbot embeddings huggingface langchain nlp pdf-document-processor rag search streamlit vector

Last synced: 12 Sep 2025

https://github.com/mohiteamit/ai-pdf-summarizer

AI-Powered PDF Summarizer: Upload PDFs, extract insights, generate customizable summaries with GPT-4. Web app & API integration.

openai openai-api pdf-document-processor pydantic python streamlit

Last synced: 03 Mar 2025

https://github.com/imrandil/pdf_merger_python

PDF merger project in Python

pdf-document-processor project python3

Last synced: 13 Jun 2025

https://github.com/vordimous/vue-pdf-splitter

Building a PDF page splitter using vue-pdf and vueUse.

pdf-document-processor vuejs

Last synced: 05 Apr 2025

https://github.com/sachnaror/pdf-question-answering-web-app-using-ai-

PDF Question Answering Web App (Using AI)

pdf-document-processor

Last synced: 20 Feb 2025