Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jalan/pdftotext
Simple PDF text extraction
- Host: GitHub
- URL: https://github.com/jalan/pdftotext
- Owner: jalan
- License: mit
- Created: 2017-04-21T21:50:25.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-05-06T01:32:20.000Z (8 months ago)
- Last Synced: 2024-05-22T05:03:00.771Z (8 months ago)
- Topics: pdf, python
- Language: Python
- Homepage:
- Size: 216 KB
- Stars: 828
- Watchers: 18
- Forks: 100
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE
Awesome Lists containing this project
- awesome-for-oneliner - pdftotext - Simple PDF text extraction (PDF / Open USP Tsukubai)
README
# pdftotext
[![PyPI](https://img.shields.io/pypi/v/pdftotext.svg)](https://pypi.python.org/pypi/pdftotext)
[![Tests](https://github.com/jalan/pdftotext/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/jalan/pdftotext/actions)
[![Downloads](https://pepy.tech/badge/pdftotext)](https://pepy.tech/project/pdftotext)

Simple PDF text extraction

```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```

## OS Dependencies
These instructions assume you're using Python 3 on a recent OS. Package names
may differ for Python 2 or for an older OS.

### Debian, Ubuntu, and friends
```
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
```

### Fedora, Red Hat, and friends
```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
```

### macOS
```
brew install pkg-config poppler python
```

### Windows
Currently tested only when using conda:
- Install the Microsoft Visual C++ Build Tools
- Install poppler through conda:
```
conda install -c conda-forge poppler
```

## Install
```
pip install pdftotext
```
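
Once the package is installed, the usage shown earlier can be wrapped in small helpers. The sketch below is illustrative only and not part of the library: `pages_containing` and `search_pdf` are hypothetical names, and the file path passed to `search_pdf` is an assumption. It relies solely on the behavior shown in the README, namely that a `pdftotext.PDF` object can be iterated to yield one string per page.

```python
from typing import Iterable, List


def pages_containing(pages: Iterable[str], needle: str) -> List[int]:
    """Return zero-based indexes of pages whose text contains `needle`.

    Matching is case-insensitive. `pages` can be any iterable of strings,
    including a pdftotext.PDF object.
    """
    needle = needle.lower()
    return [i for i, text in enumerate(pages) if needle in text.lower()]


def search_pdf(path: str, needle: str) -> List[int]:
    """Hypothetical helper: open the PDF at `path` and search its pages."""
    # Imported lazily so pages_containing() stays usable without pdftotext.
    import pdftotext

    with open(path, "rb") as f:
        pdf = pdftotext.PDF(f)
        return pages_containing(pdf, needle)


if __name__ == "__main__":
    # Plain strings stand in for extracted pages here.
    sample = ["Lorem ipsum dolor", "sit amet", "IPSUM again"]
    print(pages_containing(sample, "ipsum"))  # prints [0, 2]
```

Keeping the search logic separate from the file handling means it can be exercised with ordinary lists of strings, without a PDF or the compiled extension on hand.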