Projects in Awesome Lists tagged with pdf2image
A curated list of projects in awesome lists tagged with pdf2image.
https://github.com/mehmet-kozan/pdf-parse
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run 🤗 directly in your browser or in Node.js
pdf pdf-parse pdf-parser pdf-screenshot pdf-table pdf-thumbnail pdf-to-image pdf-to-text pdf-tools pdf-utils pdf-viewer pdf2image pdf2json pdf2pic pdf2text pdfjs pdfjs-dist turkey
Last synced: 31 Jan 2026
https://github.com/icaropires/pdf2dataset
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features
data-science distributed-computing distributed-systems ocr pandas-dataframe parallel parquet pdf pdf2image pdftotext pyarrow pytesseract pytesseract-ocr python python3 ray tesseract tesseract-ocr
Last synced: 13 Apr 2025
https://github.com/deathking/pico
Convert PDFs to images with a simple API and progress bar support.
cmd golang golang-library pdf-to-image pdf2image
Last synced: 14 Apr 2025
https://github.com/sdpdas/document-layout-generator-and-segmentation-tool
Lists all parts of a PDF document; highly scalable, with robust code.
analysis document-classification numpy opencv-python pdf2image python
Last synced: 15 Apr 2025
https://github.com/chandansoren/medical-data-extraction
Medical data extraction from medical documents, such as prescriptions and patient-details documents, using Python and regex
fastapi ocr-recognition opencv-python pdf2image pytesseract python regular-expression
Last synced: 06 May 2026
https://github.com/thearchitector/styled-prose
Generate images and thumbnails based on bitmap transformations of rendered prose
art fonts image pdf pdf2image pillow prose render text tgf the-glass-files typography writing
Last synced: 10 May 2026
https://github.com/csoren66/medical-data-extraction
Medical data extraction from medical documents, such as prescriptions and patient-details documents, using Python and regex
fastapi ocr-recognition opencv-python pdf2image pytesseract python regular-expression
Last synced: 30 Dec 2025
https://github.com/hansputera/ilovepdf-freedom
Use the ilovepdf.com API without an API key
ilovepdf ilovepdf-api pdf2image word2pdf word2pdf-converter
Last synced: 04 Apr 2025
https://github.com/yjg30737/pyqt-pdf2text
Convert PDFs or images into a text file from PyQt with Tesseract and PyPDF2
ocr pdf-converter pdf2image pypdf2 pyqt pyqt5 pytesseract tesseract
Last synced: 10 Jul 2025
https://github.com/erthium/converter
Conversions of all kinds, to all formats
moviepy pdf2image pillow pillow-library pypdf2 python python3 reportlab reportlab-pdf
Last synced: 04 Mar 2025
https://github.com/ascender1729/vodafone-financial-analysis
Automated financial table extraction and standardization from Vodafone's annual report using GPT-4o-mini
automation balance-sheet crediflow-ai csv financial-analysis financial-tables gpt-4o-mini machine-learning ocr openai pandas pdf-extraction pdf2image pypdf2 pytesseract standardization striprtf vodafone
Last synced: 19 May 2026
https://github.com/thekartikeyamishra/automated-invoice
Welcome to Automated Invoice Processing! This project is designed to help small businesses and MSMEs efficiently process invoices by extracting key information (e.g., invoice number, date, total amount) using OCR (Optical Character Recognition) technology.
artificial-intelligence machine-learning ocr opencv-python pdf2image pillow pytesseract python
Last synced: 17 May 2026
https://github.com/hase3b/scprag
This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.
beautifulsoup4 embedding-models huggingface langchain legal-corpus llama llm mistral nlp ocr pdf2image pinecone pymupdf pytesseract regex retreival retrieval-augmented-generation selenium vectorstore
Last synced: 06 Apr 2026
https://github.com/shamspias/dify-pdf-image-converter
Convert PDF pages into high-quality images with customizable format, DPI, and quality settings.
dify dify-plugins pdf2image pdftoimage
Last synced: 16 Oct 2025
https://github.com/homebackend/pdf-title-page-splitter
Splits a PDF at identified title pages using a trained ML model
machine-learning opencv pdf-splitter pdf2image pypdf2 scikit-learn tensorflow
Last synced: 04 May 2026