An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pdf-processing

A curated list of projects in awesome lists tagged with pdf-processing.

https://github.com/dissorial/doc-chatbot

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

chat chatbot document-embedding gpt-3 gpt-4 langchain mongoose nextjs openai openai-api pdf-processing pinecone reactjs tailwindcss typescript vectorization

Last synced: 04 Apr 2025

https://github.com/pspdfkit/nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

ocr-python pdf-converter pdf-document-processor pdf-generation pdf-processing python

Last synced: 05 Sep 2025

https://github.com/inc44/matools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

application audio-processing code-formatting file-management gui image-processing ocr pdf-processing productivity python qt rust speech-recognition video-processing youtube-downloader

Last synced: 14 Oct 2025

https://github.com/pdftl/pdftl

pdftl: A modern Python-based successor to pdftk. A CLI tool for PDF manipulation, merging, and geometry (crop/chop) powered by pikepdf and qpdf.

cli-app pdf pdf-manipulation pdf-processing pdf-python pdftk-alternative pdftl pikepdf python-pdf qpdf

Last synced: 08 Feb 2026

https://github.com/arsath-eng/rag1-nvidia-genai

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

document-analysis embeddings faiss langchain llama-models llm nvidia-ai-faundry pdf-processing question-answering rag streamlit vector-store

Last synced: 27 Oct 2025

https://github.com/wesellis/tech-adobe-enterprise-automation-powershell-python-portfolio

[40% Complete] Architecture template for Adobe CC enterprise automation. Well-structured skeleton with directories, configs, and infrastructure. README describes features not yet implemented. Only ~2 PS scripts exist. Reference architecture, not production suite.

adobe-creative-cloud azure docker enterprise-automation kubernetes license-management machine-learning nodejs pdf-processing powershell python rest-api servicenow-integration terraform user-provisioning

Last synced: 06 Oct 2025

https://github.com/al-shwaib/book-preparation-for-printing

A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.

a3-printing arabic-books book-preparation commercial-printing flask-application offset-printing order-to-print pdf-processing pymupdf rtl-support

Last synced: 03 Feb 2026

https://github.com/mohamedelareeg/imageautomaticcroppingwatcher

Image Automatic Cropping Watcher: A tool that automatically detects PDF files, converts them to images, corrects perspective distortion, and compiles them back into PDFs.

ai autoskew itextsharp json opencv pdf pdf-generation pdf-processing

Last synced: 02 Mar 2025

https://github.com/diocrafts/ai-book-summarizer

📚 AI-Powered Book PDF Knowledge Extractor & Summarizer: transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study

ai ai-powered-tools automation book-summary document-analysis educational-tools knowledge-extraction machine-learning markdown natural-language-processing openai pdf pdf-processing pdf-summarization pymupdf python study-materials text-analysis text-summarization

Last synced: 14 Jul 2025

https://github.com/hemaldholakiya12/pdfchat

A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.

ai api cors embeddings faiss fastapi groq huggingface langchain llama3 llm pdf pdf-processing pymupdf python question-answering semantic-search text-splitting transformers vector-store

Last synced: 30 Oct 2025

https://github.com/noorjotk/local-rag-engine

Local RAG app with zero-config Docker setup. FastAPI + Streamlit + Qdrant + Ollama. Just run `docker-compose up --build`! 🚀

docker fastapi llm local-ai local-llm local-ollama ollama pdf-processing privacy-focused python qdrant qdrant-vector-database rag semantic-search streamlit vector-database

Last synced: 09 Sep 2025

https://github.com/furqanhun/textnomnom-py

Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍

automated-conversion automation cross-platform document-conversion image-text-extraction linux pdf-processing pdf-to-text ppt ppt-to-text pptx pptx-to-text text-extraction windows

Last synced: 23 Mar 2025

https://github.com/farhaj499/rag_with_weaviate_db

This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on the PDF document. It utilizes Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.

agentic-ai embeddings huggingface-transformers langchain pdf-processing python rag retrieval-augmented-generation semantic-search vector-database weaviate

Last synced: 23 Mar 2025

https://github.com/king04aman/pdf-extractor-api

PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.

api-security docker-compose doker fastapi invoice-management invoice-pdf jwt-auth jwt-authentication jwt-token pdf-processing pdf-processor python python3 rate-limiting sap

Last synced: 18 Jan 2026

https://github.com/omritriki/biu-points-calculator

A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.

api biu credit-points css education fastapi gpa-calculation-tool html pdf-processing python render university-tools web-application

Last synced: 09 Oct 2025

https://github.com/khushirajurkar/medical-document-summarizer

An interactive pipeline that extracts and summarizes clinical trial PDFs using advanced NLP models

clinical-trials document-summarization medical-nlp pdf-processing transformers

Last synced: 19 Jan 2026

https://github.com/uzairsayyed-005/docuchat-ai

DocuChat-AI is an AI-powered document interaction assistant that transforms static PDFs into conversational partners. It leverages Retrieval-Augmented Generation (RAG), history-aware memory, and advanced NLP to enable natural language Q&A, contextual dialogue, and secure local document processing.

ai conversational-ai document-analysis fiass groq history-aware huggingface langchain local-processing pdf-processing rag streamlit

Last synced: 28 Mar 2025

https://github.com/souravupadhyay7/morvs_chat_bot

🤖 MORVS AI - An intelligent chat interface powered by Groq's LLaMA 3 model with PDF processing capabilities. Built with Next.js, React, TypeScript, and modern UI components.

ai-assistant ai-chatbot chat-interface conversational-ai cyberpunk-ui framer-motion groq nextjs pdf-parser pdf-processing real-time-chat shadcn-ui tailwindcss typescript

Last synced: 18 Aug 2025

https://github.com/khushikumarigupta14/pdf-mcq-extractor

PDF MCQ Extractor – Quickly extract multiple-choice questions from PDFs and export them as structured JSON. Perfect for educators, students, and study apps.

educational-tools machine-readable-mcqs mcq-extraction pdf-processing pdf-to-json study-materials

Last synced: 16 Sep 2025

https://github.com/faerque/pdf_scraper

PDF Scraper with Automation - A CLI tool for extracting text from PDFs and storing it in an SQLite database for structured querying. Supports digitally generated PDFs and enables efficient document processing.

automation cli-tool document-management document-management-system natural-language-processing pdf-processing sqlite text-extraction

Last synced: 12 Apr 2025

https://github.com/nathania-rachael/chat-with-multiple-pdfs

An AI-powered chatbot that lets users upload multiple PDFs and ask questions based on their content. It extracts text, processes it with FAISS, and retrieves answers using Google Generative AI (Gemini Pro) through a simple Streamlit interface.

faiss gemini-pro google-generative-ai langchain pdf-chat-bot pdf-processing python streamlit

Last synced: 03 Mar 2025

https://github.com/jcaperella29/ai_llm_set_up

AI-powered research paper summarization using local LLMs (Ollama). Extracts, processes, and summarizes PDFs with structured insights. Ideal for scientific papers & bioinformatics

ai llm machine-learning nlp ollama pdf-processing python research

Last synced: 22 Feb 2025

https://github.com/fujiba/pdf-chunker

LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and downsample images for RAG/Bedrock.

aws-bedrock chunking claude cli image-optimization llm pdf pdf-chunker pdf-processing pdf-splitting python rag token-optimization

Last synced: 13 Jan 2026

https://github.com/adudhe01/pdfreaderchatbot

A chatbot app using Streamlit, LangChain, and OpenAI to interact with uploaded PDFs, extract text, and answer questions based on the document content.

chatbot conversational-ai faiss langchain openai pdf-processing streamlit text-extraction vector-store

Last synced: 30 Dec 2025

https://github.com/omkarspace/synthesis

AI-powered research paper generation and analysis platform. Automatically generates comprehensive research papers from uploaded documents using multi-agent orchestration with Gemini AI, featuring PDF/DOCX export, concept visualization, and intelligent Q&A.

academic-writing agent-orchestration ai gemini-api generation machine-learning next-js nlp pdf-processing research-paper research-tool typescript

Last synced: 27 Dec 2025

https://github.com/ungerik/pdfzig

A fast, cross-platform PDF utility tool written in Zig, powered by PDFium

pdf pdf-processing pdf-rendering pdfium zig

Last synced: 22 Jan 2026

https://github.com/abishek7952/resume-classifier

An end-to-end machine learning web app that classifies PDF resumes into job-fit categories. Built with FastAPI, Streamlit & Docker. Deployed on Render.

docker end-to-end-ml-workflows fastapi machine-learning nlp pdf-processing portfolio-project render resume-classfier streamlit

Last synced: 18 Apr 2025

https://github.com/ojas1584/contract_analysis

Automated legal contract analysis using a local LLM and a 4-call, high-accuracy RAG pipeline with semantic validation.

clause-extraction contract-analysis embeddings faiss langchain legal-ai legal-tech llm ollama pdf-processing pdf-processing-nlp-ai-python-pdf-search-automation rag semantic-search

Last synced: 22 Nov 2025

https://github.com/terry-li-hm/prometheus

PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude

ai-tools claude-code document-processing fastmcp mcp-server pdf-processing pdf-splitter prometheus pymupdf python text-extraction

Last synced: 03 Sep 2025

https://github.com/anshi312/financial-analyst-rag

Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.

document-ai faiss financial-analysis huggingface langchain nlp pdf-processing rag semantic-search sentence-transformers streamlit

Last synced: 03 Jul 2025

https://github.com/ujjwalsaini07/ollamamulti-rag

OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes contributions—features, bug fixes, or optimizations—to advance practical multimodal AI research and development collaboratively.

ai-chatbot audio-transcription chat-application image-understanding knowledge-retrieval langchain llava-llama3 llm multimodal-ai ollama openai pdf-processing rag trend trending-topics vector-database whisper

Last synced: 30 Dec 2025

https://github.com/sigmakib2/express-pdf-watermark-api

A simple Express.js API for embedding watermarks on PDF files using pdf-lib and multer. This project demonstrates how to apply forensic watermarks with user details (or unique identifiers) to each page of a PDF, helping deter unauthorized distribution while maintaining user privacy.

anti-piracy api digital-watermarking document-security expressjs file-upload javascript multer nodejs pdf-lib pdf-processing pdf-watermark

Last synced: 05 Jul 2025

https://github.com/bhavana1312/smart-doc-assistant

Smart Document Assistant uses RAG and AI to turn PDF documents into interactive quizzes, Q&A, and personalized learning insights.

ai-chatbot edtech fastapi groq learning-analytics llm mongodb pdf-processing qdrant quiz-generator rag react sentence-transformers tailwindcss

Last synced: 04 Jul 2025

https://github.com/msaleh1888/azure-serverless-invoice-extraction

Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.

ai-engineering architecture azure-ai azure-document-intelligence azure-functions backend cloud-engineering cloud-functions document-intelligence form-recognizer http-trigger invoice-processing microservice ocr pdf-processing pdf-to-json python rest-api serverless serverless-architecture

Last synced: 13 Jan 2026

https://github.com/leo-valle/n8n-exam-evaluator

Automated PDF exam grader using n8n + OpenAI. Extracts questions and answers from PDFs, evaluates responses with configurable grading modes (exact/lenient), and generates structured feedback and scores.

ai-grading autograder automation education exam-grader javascript n8n openai pdf-processing systems-engineering workflow

Last synced: 05 Dec 2025

https://github.com/guiss-guiss/scriptumai

RAG Application: ScriptumAI is an advanced Retrieval-Augmented Generation platform designed for document ingestion, semantic search, and query processing.

ai document-ingestion document-processing file-upload flask language-model llama llm machine-learning multi-language nlp offline ollama pdf-processing private python rag retrieval-augmented-generation semantic-search text-analysis

Last synced: 28 Mar 2025

https://github.com/jasoncobra3/floorplan-dimractor

A sophisticated Python pipeline for automatically extracting dimensions and cabinet codes from architectural floorplan PDFs. This tool converts various dimension formats into standardized measurements and provides structured output with visualization capabilities.

architecture-tools automation-tools blueprint-analysis cad-automation computer-vision dimension-extraction document-processing document-processing-pipeline floorplan-analysis image-processing measurement-tools opencv pdf-parser pdf-processing pdfplumber pymupdf streamlit text-detection

Last synced: 08 Oct 2025

https://github.com/suraj-k-gupta/pdf-wizard

AI-powered PDF Assistant: Upload PDFs and ask questions about the content with intelligent answers powered by FastAPI and LangChain. Users can check the Better Answer option for enhanced responses.

ai-chatbot enhanced-answers fastapi langchain natural-language-processing pdf-assistant pdf-processing pdf-wizard question-answering reactjs

Last synced: 27 Oct 2025

https://github.com/nattapolch/work-order-pdf-extractor

AI-powered Work Order PDF Extractor with OpenAI GPT-4 Vision integration for automated text extraction and file organization

ai automation document-processing gui ocr openai pdf-processing python tkinter work-orders

Last synced: 19 Jun 2025

https://github.com/anonymo2239/secure-document-anonymization-system

An academic article review system that anonymizes submissions using NLP and computer vision to ensure fair and unbiased evaluations.

anonymization django nlp opencv pdf-processing spacy

Last synced: 11 Apr 2025

https://github.com/bekohub/llmgenerativeaiopenai

This project is a PDF-based Information Retrieval System powered by LangChain, OpenAI, and Streamlit. The application allows users to upload PDF files, process their contents, and interact with the extracted data using a conversational AI interface. It leverages FAISS for vector-based similarity searches and ChatGPT models (e.g., gpt-4-turbo)

ai-chatbots conversational-ai faiss gpt4 information-retrieval langchain nlp openai pdf-processing python streamlit vector-search

Last synced: 16 Jul 2025

https://github.com/adityaadaki21/fastapi-rag

FASTapi-RAG is a FastAPI-based Retrieval-Augmented Generation system that lets users query PDF documents via an AI-powered chatbot. It integrates Ollama for language generation and ChromaDB for document indexing, offering features like document upload, natural language querying, and an interactive web interface.

chromadb fastapi llm llms ollama pdf-processing rag retrieval-augmented-generation

Last synced: 30 Dec 2025

https://github.com/ahmed-ai-01/multimodal-rag

An AI-powered chat application using text, audio, and images for context-aware responses. It integrates language models and vector databases to enhance retrieval-augmented generation (RAG) capabilities, making it a versatile tool for intelligent conversations.

ai audio-processing chatbot image-processing language-model multimodal pdf-processing pinecone rag streamlit

Last synced: 29 Mar 2025

https://github.com/dagmawi-22/qelem-web

Client-side AI-powered study tool that processes PDFs to generate quizzes and flashcards using Gemini AI.

ai pdf-processing question-answering svelte sveltekit

Last synced: 12 May 2025

https://github.com/denesepro/question-extractor

An end-to-end automation tool to extract quiz questions from PDF files using Gemini AI and automatically upload them to biazmoon.com with Selenium.

automation gemini-api pdf-processing pdf-to-json python question-extractor quiz-automation selenium web-automation

Last synced: 10 Oct 2025

https://github.com/valido-app/valido-app.github.io

Official website and download page for Valido - Professional PDF validation and data extraction tool for Windows

automation data-extraction-from-pdf document-processing document-validation pdf pdf-parser pdf-processing pdf-tools

Last synced: 19 Jan 2026

https://github.com/sanafagal/pdf-processor-for-eps-files

A tool designed to process and rename PDF files based on specific EPS configurations, utilizing exact and fuzzy matching techniques to identify file types efficiently.

eps file-rename fuzzy-matching ocrmypdf pdf-processing pypdf2 text-extraction

Last synced: 03 Apr 2025

https://github.com/barandev/emurpg-backend

The 🐲EMU RPG API🐲 supports the EMU RPG Club’s events by managing game tables, players, and D&D character data. Built with FastAPI, it includes features like table/character management, real-time WebSocket updates, data validation, API monitoring, and secure access, providing an organized backend for tabletop RPG sessions.

api asyncio cors csv-processing dungeons-and-dragons fastapi matplotlib medieval-theme moesif mongodb openai pdf-processing pydantic python restful-api websockets

Last synced: 17 Mar 2025