Projects in Awesome Lists tagged with pdf-processing
A curated list of projects in awesome lists tagged with pdf-processing.
https://github.com/dissorial/doc-chatbot
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
chat chatbot document-embedding gpt-3 gpt-4 langchain mongoose nextjs openai openai-api pdf-processing pinecone reactjs tailwindcss typescript vectorization
Last synced: 04 Apr 2025
https://github.com/allenai/papermage
Library supporting NLP and CV research on scientific papers.
computer-vision machine-learning multimodal natural-language-processing pdf-processing python scientific-papers
Last synced: 13 Oct 2025
https://github.com/ahmedkhemiri95/PDFs-TextExtract
Text extraction from multiple and large PDF documents.
data-science extract-text parser pdf pdf-document pdf-processing pdfminer pdfs pdfs-textextract pypdf2 python text-analytics
Last synced: 04 Apr 2025
https://github.com/pspdfkit/nutrient-dws-client-python
Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion
ocr-python pdf-converter pdf-document-processor pdf-generation pdf-processing python
Last synced: 05 Sep 2025
https://github.com/inc44/matools
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
application audio-processing code-formatting file-management gui image-processing ocr pdf-processing productivity python qt rust speech-recognition video-processing youtube-downloader
Last synced: 14 Oct 2025
https://github.com/pdftl/pdftl
pdftl: A modern Python-based successor to pdftk. A CLI tool for PDF manipulation, merging, and geometry (crop/chop) powered by pikepdf and qpdf.
cli-app pdf pdf-manipulation pdf-processing pdf-python pdftk-alternative pdftl pikepdf python-pdf qpdf
Last synced: 08 Feb 2026
https://github.com/rithulkamesh/docproc
Opinionated and Sophisticated Document Region Analyzer.
content-extraction data-extraction document-analysis document-parsing equation-detection layout-analysis machine-learning mathematical-symbols ocr pdf-processing pdf-text-extraction python region-detection text-classification text-extraction
Last synced: 18 Jan 2026
https://github.com/arsath-eng/rag1-nvidia-genai
A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.
document-analysis embeddings faiss langchain llama-models llm nvidia-ai-faundry pdf-processing question-answering rag streamlit vector-store
Last synced: 27 Oct 2025
https://github.com/wesellis/tech-adobe-enterprise-automation-powershell-python-portfolio
[40% Complete] Architecture template for Adobe CC enterprise automation. Well-structured skeleton with directories, configs, and infrastructure. README describes features not yet implemented. Only ~2 PS scripts exist. Reference architecture, not production suite.
adobe-creative-cloud azure docker enterprise-automation kubernetes license-management machine-learning nodejs pdf-processing powershell python rest-api servicenow-integration terraform user-provisioning
Last synced: 06 Oct 2025
https://github.com/al-shwaib/book-preparation-for-printing
A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.
a3-printing arabic-books book-preparation commercial-printing flask-application offset-printing order-to-print pdf-processing pymupdf rtl-support
Last synced: 03 Feb 2026
https://github.com/fahadelahikhan/ghostscript-pdf-compression
Python Scripts to compress PDF files using Ghostscript.
filecompression ghostscript pdf pdf-compression pdf-compression-and-optimization pdf-compressor pdf-processing python
Last synced: 15 Jul 2025
https://github.com/mohamedelareeg/imageautomaticcroppingwatcher
Image Automatic Cropping Watcher: A tool that automatically detects PDF files, converts them to images, corrects perspective distortion, and compiles them back into PDFs.
ai autoskew itextsharp json opencv pdf pdf-generation pdf-processing
Last synced: 02 Mar 2025
https://github.com/diocrafts/ai-book-summarizer
📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study
ai ai-powered-tools automation book-summary document-analysis educational-tools knowledge-extraction machine-learning markdown natural-language-processing openai pdf pdf-processing pdf-summarization pymupdf python study-materials text-analysis text-summarization
Last synced: 14 Jul 2025
https://github.com/hemaldholakiya12/pdfchat
A web app that allows users to upload PDFs and interact with them through a Q&A interface. The application extracts text from PDFs, generates embeddings, stores them in a FAISS database, and retrieves relevant information to provide context-aware answers using a large language model.
ai api cors embeddings faiss fastapi groq huggingface langchain llama3 llm pdf pdf-processing pymupdf python question-answering semantic-search text-splitting transformers vector-store
Last synced: 30 Oct 2025
https://github.com/9-5/chromium-intelligence
A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.
ai-assistant browser-automation browser-tools chrome-extension content-analysis custom-prompts gemini-api image-analysis manifest-v3 natural-language-processing pdf-processing productivity proofreading text-processing text-summarization tone-adjustment
Last synced: 12 May 2025
https://github.com/noorjotk/local-rag-engine
Local RAG app with zero-config Docker setup. FastAPI + Streamlit + Qdrant + Ollama. Just run `docker-compose up --build`! 🚀
docker fastapi llm local-ai local-llm local-ollama ollama pdf-processing privacy-focused python qdrant qdrant-vector-database rag semantic-search streamlit vector-database
Last synced: 09 Sep 2025
https://github.com/furqanhun/textnomnom-py
Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍
automated-conversion automation cross-platform document-conversion image-text-extraction linux pdf-processing pdf-to-text ppt ppt-to-text pptx pptx-to-text text-extraction windows
Last synced: 23 Mar 2025
https://github.com/farhaj499/rag_with_weaviate_db
This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on a PDF document. It uses Weaviate as a vector database for efficient retrieval of relevant information and Gemini to generate natural language responses.
agentic-ai embeddings huggingface-transformers langchain pdf-processing python rag retrieval-augmented-generation semantic-search vector-database weaviate
Last synced: 23 Mar 2025
https://github.com/king04aman/pdf-extractor-api
PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.
api-security docker-compose doker fastapi invoice-management invoice-pdf jwt-auth jwt-authentication jwt-token pdf-processing pdf-processor python python3 rate-limiting sap
Last synced: 18 Jan 2026
https://github.com/omritriki/biu-points-calculator
A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.
api biu credit-points css education fastapi gpa-calculation-tool html pdf-processing python render university-tools web-application
Last synced: 09 Oct 2025
https://github.com/khushirajurkar/medical-document-summarizer
An interactive pipeline that extracts and summarizes clinical trial PDFs using advanced NLP models
clinical-trials document-summarization medical-nlp pdf-processing transformers
Last synced: 19 Jan 2026
https://github.com/uzairsayyed-005/docuchat-ai
DocuChat-AI is an AI-powered document interaction assistant that transforms static PDFs into conversational partners. It leverages Retrieval-Augmented Generation (RAG), history-aware memory, and advanced NLP to enable natural language Q&A, contextual dialogue, and secure local document processing.
ai conversational-ai document-analysis fiass groq history-aware huggingface langchain local-processing pdf-processing rag streamlit
Last synced: 28 Mar 2025
https://github.com/souravupadhyay7/morvs_chat_bot
🤖 MORVS AI - An intelligent chat interface powered by Groq's LLaMA 3 model with PDF processing capabilities. Built with Next.js, React, TypeScript, and modern UI components.
ai-assistant ai-chatbot chat-interface conversational-ai cyberpunk-ui framer-motion groq nextjs pdf-parser pdf-processing real-time-chat shadcn-ui tailwindcss typescript
Last synced: 18 Aug 2025
https://github.com/khushikumarigupta14/pdf-mcq-extractor
PDF MCQ Extractor – Quickly extract multiple-choice questions from PDFs and export them as structured JSON. Perfect for educators, students, and study apps.
educational-tools machine-readable-mcqs mcq-extraction pdf-processing pdf-to-json study-materials
Last synced: 16 Sep 2025
https://github.com/faerque/pdf_scraper
PDF Scraper with Automation - A CLI tool for extracting text from PDFs and storing it in an SQLite database for structured querying. Supports digitally generated PDFs and enables efficient document processing.
automation cli-tool document-management document-management-system natural-language-processing pdf-processing sqlite text-extraction
Last synced: 12 Apr 2025
https://github.com/nathania-rachael/chat-with-multiple-pdfs
An AI-powered chatbot that lets users upload multiple PDFs and ask questions based on their content. It extracts text, processes it with FAISS, and retrieves answers using Google Generative AI (Gemini Pro) through a simple Streamlit interface.
faiss gemini-pro google-generative-ai langchain pdf-chat-bot pdf-processing python streamlit
Last synced: 03 Mar 2025
https://github.com/jcaperella29/ai_llm_set_up
AI-powered research paper summarization using local LLMs (Ollama). Extracts, processes, and summarizes PDFs with structured insights. Ideal for scientific papers & bioinformatics
ai llm machine-learning nlp ollama pdf-processing python research
Last synced: 22 Feb 2025
https://github.com/fujiba/pdf-chunker
LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and downsample images for RAG/Bedrock.
aws-bedrock chunking claude cli image-optimization llm pdf pdf-chunker pdf-processing pdf-splitting python rag token-optimization
Last synced: 13 Jan 2026
https://github.com/adudhe01/pdfreaderchatbot
A chatbot app using Streamlit, LangChain, and OpenAI to interact with uploaded PDFs, extract text, and answer questions based on the document content.
chatbot conversational-ai faiss langchain openai pdf-processing streamlit text-extraction vector-store
Last synced: 30 Dec 2025
https://github.com/omkarspace/synthesis
AI-powered research paper generation and analysis platform. Automatically generates comprehensive research papers from uploaded documents using multi-agent orchestration with Gemini AI, featuring PDF/DOCX export, concept visualization, and intelligent Q&A.
academic-writing agent-orchestration ai gemini-api generation machine-learning next-js nlp pdf-processing research-paper research-tool typescript
Last synced: 27 Dec 2025
https://github.com/ungerik/pdfzig
A fast, cross-platform PDF utility tool written in Zig, powered by PDFium
pdf pdf-processing pdf-rendering pdfium zig
Last synced: 22 Jan 2026
https://github.com/abishek7952/resume-classifier
An end-to-end machine learning web app that classifies PDF resumes into job-fit categories. Built with FastAPI, Streamlit & Docker. Deployed on Render.
docker end-to-end-ml-workflows fastapi machine-learning nlp pdf-processing portfolio-project render resume-classfier streamlit
Last synced: 18 Apr 2025
https://github.com/ojas1584/contract_analysis
Automated legal contract analysis using a local LLM and a 4-call, high-accuracy RAG pipeline with semantic validation.
clause-extraction contract-analysis embeddings faiss langchain legal-ai legal-tech llm ollama pdf-processing pdf-processing-nlp-ai-python-pdf-search-automation rag semantic-search
Last synced: 22 Nov 2025
https://github.com/terry-li-hm/prometheus
PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude
ai-tools claude-code document-processing fastmcp mcp-server pdf-processing pdf-splitter prometheus pymupdf python text-extraction
Last synced: 03 Sep 2025
https://github.com/anshi312/financial-analyst-rag
Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.
document-ai faiss financial-analysis huggingface langchain nlp pdf-processing rag semantic-search sentence-transformers streamlit
Last synced: 03 Jul 2025
https://github.com/ujjwalsaini07/ollamamulti-rag
OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes contributions—features, bug fixes, or optimizations—to advance practical multimodal AI research and development collaboratively.
ai-chatbot audio-transcription chat-application image-understanding knowledge-retrieval langchain llava-llama3 llm multimodal-ai ollama openai pdf-processing rag trend trending-topics vector-database whisper
Last synced: 30 Dec 2025
https://github.com/sigmakib2/express-pdf-watermark-api
A simple Express.js API for embedding watermarks on PDF files using pdf-lib and multer. This project demonstrates how to apply forensic watermarks with user details (or unique identifiers) to each page of a PDF, helping deter unauthorized distribution while maintaining user privacy.
anti-piracy api digital-watermarking document-security expressjs file-upload javascript multer nodejs pdf-lib pdf-processing pdf-watermark
Last synced: 05 Jul 2025
https://github.com/bhavana1312/smart-doc-assistant
Smart Document Assistant uses RAG and AI to turn PDF documents into interactive quizzes, Q&A, and personalized learning insights.
ai-chatbot edtech fastapi groq learning-analytics llm mongodb pdf-processing qdrant quiz-generator rag react sentence-transformers tailwindcss
Last synced: 04 Jul 2025
https://github.com/itvincent-git/invoice_renamer
A tool to rename invoices based on their content
chinese-ocr document-automation file-renaming invoice-management invoice-processing ocr paddleocr pdf-processing pdf-to-image python
Last synced: 11 Apr 2025
https://github.com/msaleh1888/azure-serverless-invoice-extraction
Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.
ai-engineering architecture azure-ai azure-document-intelligence azure-functions backend cloud-engineering cloud-functions document-intelligence form-recognizer http-trigger invoice-processing microservice ocr pdf-processing pdf-to-json python rest-api serverless serverless-architecture
Last synced: 13 Jan 2026
https://github.com/leo-valle/n8n-exam-evaluator
Automated PDF exam grader using n8n + OpenAI. Extracts questions and answers from PDFs, evaluates responses with configurable grading modes (exact/lenient), and generates structured feedback and scores.
ai-grading autograder automation education exam-grader javascript n8n openai pdf-processing systems-engineering workflow
Last synced: 05 Dec 2025
https://github.com/guiss-guiss/scriptumai
RAG Application: ScriptumAI is an advanced Retrieval-Augmented Generation platform designed for document ingestion, semantic search, and query processing.
ai document-ingestion document-processing file-upload flask language-model llama llm machine-learning multi-language nlp offline ollama pdf-processing private python rag retrieval-augmented-generation semantic-search text-analysis
Last synced: 28 Mar 2025
https://github.com/jasoncobra3/floorplan-dimractor
A sophisticated Python pipeline for automatically extracting dimensions and cabinet codes from architectural floorplan PDFs. This tool converts various dimension formats into standardized measurements and provides structured output with visualization capabilities.
architecture-tools automation-tools blueprint-analysis cad-automation computer-vision dimension-extraction document-processing document-processing-pipeline floorplan-analysis image-processing measurement-tools opencv pdf-parser pdf-processing pdfplumber pymupdf streamlit text-detection
Last synced: 08 Oct 2025
https://github.com/suraj-k-gupta/pdf-wizard
AI-powered PDF Assistant: Upload PDFs and ask questions about the content with intelligent answers powered by FastAPI and LangChain. Option to check Better Answer for enhanced responses.
ai-chatbot enhanced-answers fastapi langchain natural-language-processing pdf-assistant pdf-processing pdf-wizard question-answering reactjs
Last synced: 27 Oct 2025
https://github.com/nattapolch/work-order-pdf-extractor
AI-powered Work Order PDF Extractor with OpenAI GPT-4 Vision integration for automated text extraction and file organization
ai automation document-processing gui ocr openai pdf-processing python tkinter work-orders
Last synced: 19 Jun 2025
https://github.com/anonymo2239/secure-document-anonymization-system
An academic article review system that anonymizes submissions using NLP and computer vision to ensure fair and unbiased evaluations.
anonymization django nlp opencv pdf-processing spacy
Last synced: 11 Apr 2025
https://github.com/bekohub/llmgenerativeaiopenai
This project is a PDF-based Information Retrieval System powered by LangChain, OpenAI, and Streamlit. The application allows users to upload PDF files, process their contents, and interact with the extracted data using a conversational AI interface. It leverages FAISS for vector-based similarity searches and ChatGPT models (e.g., gpt-4-turbo)
ai-chatbots conversational-ai faiss gpt4 information-retrieval langchain nlp openai pdf-processing python streamlit vector-search
Last synced: 16 Jul 2025
https://github.com/adityaadaki21/fastapi-rag
FASTapi-RAG is a FastAPI-based Retrieval-Augmented Generation system that lets users query PDF documents via an AI-powered chatbot. It integrates Ollama for language generation and ChromaDB for document indexing, offering features like document upload, natural language querying, and an interactive web interface.
chromadb fastapi llm llms ollama pdf-processing rag retrieval-augmented-generation
Last synced: 30 Dec 2025
https://github.com/ahmed-ai-01/multimodal-rag
An AI-powered chat application using text, audio, and images for context-aware responses. It integrates language models and vector databases to enhance retrieval-augmented generation (RAG) capabilities, making it a versatile tool for intelligent conversations.
ai audio-processing chatbot image-processing language-model multimodal pdf-processing pinecone rag streamlit
Last synced: 29 Mar 2025
https://github.com/dagmawi-22/qelem-web
Client-side AI-powered study tool that processes PDFs to generate quizzes and flashcards using Gemini AI.
ai pdf-processing question-answering svelte sveltekit
Last synced: 12 May 2025
https://github.com/denesepro/question-extractor
An end-to-end automation tool to extract quiz questions from PDF files using Gemini AI and automatically upload them to biazmoon.com with Selenium.
automation gemini-api pdf-processing pdf-to-json python question-extractor quiz-automation selenium web-automation
Last synced: 10 Oct 2025
https://github.com/valido-app/valido-app.github.io
Official website and download page for Valido - Professional PDF validation and data extraction tool for Windows
automation data-extraction-from-pdf document-processing document-validation pdf pdf-parser pdf-processing pdf-tools
Last synced: 19 Jan 2026
https://github.com/sanafagal/pdf-processor-for-eps-files
A tool designed to process and rename PDF files based on specific EPS configurations, utilizing exact and fuzzy matching techniques to identify file types efficiently.
eps file-rename fuzzy-matching ocrmypdf pdf-processing pypdf2 text-extraction
Last synced: 03 Apr 2025
https://github.com/barandev/emurpg-backend
The 🐲EMU RPG API🐲 supports the EMU RPG Club’s events by managing game tables, players, and D&D character data. Built with FastAPI, it includes features like table/character management, real-time WebSocket updates, data validation, API monitoring, and secure access, providing an organized backend for tabletop RPG sessions.
api asyncio cors csv-processing dungeons-and-dragons fastapi matplotlib medieval-theme moesif mongodb openai pdf-processing pydantic python restful-api websockets
Last synced: 17 Mar 2025