An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pdf-extractor

A curated list of projects in awesome lists tagged with pdf-extractor.

https://github.com/torakiki/pdfsam

PDFsam, a desktop application to split, merge, mix, and rotate PDF files and extract pages

combine extract java javafx merge merge-pdf merger pdf pdf-combiner pdf-extractor pdf-manipulation pdf-merge pdf-mix pdf-rotate pdf-split rotate split split-pdf splitter

Last synced: 13 May 2025

https://github.com/deep-diver/neurips2024

Read and Listen to NeurIPS 2024 Papers

artificial-intelligence gemini llm pdf-extractor vertex-ai

Last synced: 04 Oct 2025

https://github.com/codad5/pdfz

Your Rust PDF Document Text Extractor

pdf pdf-extractor pdfextraction rabbitmq rust

Last synced: 06 Jul 2025

https://github.com/hrbrmstr/fish-stocking-pdf-data-wrangling

🐠A fishy example of how to do PDF data wrangling in R

data-wrangling pdf pdf-extractor r rs

Last synced: 29 Oct 2025

https://github.com/serkodev/camelot-docker

Docker setup of Camelot: PDF Table Extraction

camelot csv docker pdf pdf-converter pdf-extractor

Last synced: 22 Jun 2025

https://github.com/drmccoy/pdftextorizer

Interactively extract text from multi-column PDFs

gui pdf pdf-extractor pdf-files pdf2text pdftotext pyqt5 qt5

Last synced: 07 Jan 2026

https://github.com/guilhermestracini/poc-dotnet-extractpdfcontent

🔬 Proof of Concept of extracting content from PDF files using multiple PDF libraries

docnet dotnet dotnetcore itextsharp pdf-extractor pdf-reader pdfextraction pdfpig pdfsharp poc prdreader proof-of-concept

Last synced: 09 Oct 2025

https://github.com/renan-siqueira/python-pdf-tool

This project extracts text from PDF files using various Python libraries. It is designed to be flexible: you can choose among different text extraction libraries, and it supports both a single PDF file and a directory containing multiple PDF files.

mit-license pdf pdf-extractor pdf-to-text pdfminer pdfplumber pymupdf pypdf2 python

Last synced: 01 Apr 2025

https://github.com/ilejuxepwaduzd/structured-data-extractor

🛠️ Extract structured data from messy texts using Chain-of-Thought prompting to improve processing of customer support and technical issues.

cdp chrome-fetcher data document-extraction ecommerce golang-library headless metadata-extraction ocr open-source pdf pdf-converter pdf-extractor ruby scraper shopify spider structured-data

Last synced: 09 Oct 2025

https://github.com/cllspy/pypte

The PDF Text Extractor API allows users to upload PDF files and receive the extracted text from those files. This API is built using FastAPI and leverages the PyMuPDF library for efficient text extraction.

fastapi pdf-extractor pdf-to-text postman python

Last synced: 15 Mar 2025

https://github.com/merrvve/pdf-image-extract

Command-line tool to extract and save images (JPEG, PNG) from a PDF file, or from all PDFs in a directory, based on their byte signatures.

command-line-tool pdf-extractor pdf-image-extractor python

Last synced: 20 Jul 2025

https://github.com/hermesroot/doceru-pdf-extractor

Lightweight, practical extension to extract and download PDFs from Doceru.com with one click!

browser chrome-extension extension extraction javascript pdf pdf-extractor web-tool

Last synced: 17 Mar 2025

https://github.com/arthur-mdn/extractjsflippdfplusprotopdf

JavaScript extraction tool for FlipPDFPlusPro/FlipBuilder digital flipbooks. Converts interactive page-by-page viewers into downloadable PDF documents.

book digital-magazine extract flipbooks flipbuilder flippdfpluspro jsbook pdf pdf-extractor pdfbook

Last synced: 30 Oct 2025

https://github.com/petermosmans/apdfhelper

Fix links in PDF files, rewrite links, extract text annotations, remove pages

annotations calendar pdf pdf-converter pdf-extractor pdf-parser planner

Last synced: 16 Mar 2025

https://github.com/psilvautomata/automated_pdf_data_processing

Data automation and processing tool designed to streamline the extraction and analysis of data from PDF documents using MS Power Automate Desktop and Excel VBA.

pdf pdf-data-extraction pdf-extractor powerautomate powerautomatedesktop vba vba-excel

Last synced: 03 Jan 2026

https://github.com/xiaoyao9184/docker-magic

Docker implementation of the MinerU PDF-to-Markdown converter

cuda-support docker-image markdown-export mineru pdf-extractor

Last synced: 22 Feb 2025

https://github.com/eccenca/cmem-plugin-pdf-extract

Extract text and tables from PDF files

corporate-memory eccenca pdf-extractor plugin

Last synced: 28 Jul 2025

https://github.com/xiaoyao9184/docker-marker

Docker implementation of the Marker PDF-to-Markdown converter

cuda-support docker-image markdown-export marker ocr pdf-extractor

Last synced: 13 Nov 2025