An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pdf2image

A curated list of projects in awesome lists tagged with pdf2image.

https://github.com/mehmet-kozan/pdf-parse

Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run 🤗 directly in your browser or in Node.js

pdf pdf-parse pdf-parser pdf-screenshot pdf-table pdf-thumbnail pdf-to-image pdf-to-text pdf-tools pdf-utils pdf-viewer pdf2image pdf2json pdf2pic pdf2text pdfjs pdfjs-dist turkey

Last synced: 31 Jan 2026

https://github.com/icaropires/pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

data-science distributed-computing distributed-systems ocr pandas-dataframe parallel parquet pdf pdf2image pdftotext pyarrow pytesseract pytesseract-ocr python python3 ray tesseract tesseract-ocr

Last synced: 13 Apr 2025

https://github.com/deathking/pico

Convert PDFs to images with a simple API and progress bar support.

cmd golang golang-library pdf-to-image pdf2image

Last synced: 14 Apr 2025

https://github.com/sdpdas/document-layout-generator-and-segmentation-tool

Lists all parts of a PDF document; highly scalable, with robust code.

analysis document-classification numpy opencv-python pdf2image python

Last synced: 15 Apr 2025

https://github.com/chandansoren/medical-data-extraction

Medical data extraction from medical documents, such as prescriptions and patient details documents, using Python and regex

fastapi ocr-recognition opencv-python pdf2image pytesseract python regular-expression

Last synced: 06 May 2026

https://github.com/thearchitector/styled-prose

Generate images and thumbnails based on bitmap transformations of rendered prose

art fonts image pdf pdf2image pillow prose render text tgf the-glass-files typography writing

Last synced: 10 May 2026

https://github.com/csoren66/medical-data-extraction

Medical data extraction from medical documents, such as prescriptions and patient details documents, using Python and regex

fastapi ocr-recognition opencv-python pdf2image pytesseract python regular-expression

Last synced: 30 Dec 2025

https://github.com/hansputera/ilovepdf-freedom

Use the ilovepdf.com API without an API key

ilovepdf ilovepdf-api pdf2image word2pdf word2pdf-converter

Last synced: 04 Apr 2025

https://github.com/yjg30737/pyqt-pdf2text

Convert PDFs or images into text files from PyQt with Tesseract and PyPDF2

ocr pdf-converter pdf2image pypdf2 pyqt pyqt5 pytesseract tesseract

Last synced: 10 Jul 2025

https://github.com/thekartikeyamishra/automated-invoice

Welcome to Automated Invoice Processing! This project is designed to help small businesses and MSMEs efficiently process invoices by extracting key information (e.g., invoice number, date, total amount) using OCR (Optical Character Recognition) technology.

artificial-intelligence machine-learning ocr opencv-python pdf2image pillow pytesseract python

Last synced: 17 May 2026

https://github.com/hase3b/scprag

This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.

beautifulsoup4 embedding-models huggingface langchain legal-corpus llama llm mistral nlp ocr pdf2image pinecone pymupdf pytesseract regex retreival retrieval-augmented-generation selenium vectorstore

Last synced: 06 Apr 2026

https://github.com/shamspias/dify-pdf-image-converter

Convert PDF pages into high-quality images with customizable format, DPI, and quality settings.

dify dify-plugins pdf2image pdftoimage

Last synced: 16 Oct 2025

https://github.com/homebackend/pdf-title-page-splitter

Splits a PDF at identified title pages using a trained ML model

machine-learning opencv pdf-splitter pdf2image pypdf2 scikit-learn tensorflow

Last synced: 04 May 2026