https://github.com/oelin/straw

Extract images and other streams from PDF files 🐍.

image-extractor pdf python

Host: GitHub
URL: https://github.com/oelin/straw
Owner: oelin
License: mit
Created: 2023-01-25T14:22:35.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-01-25T14:29:12.000Z (over 3 years ago)
Last Synced: 2025-03-12T05:33:01.877Z (about 1 year ago)
Topics: image-extractor, pdf, python
Language: Python
Homepage:
Size: 4.88 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Straw

Extract images and other streams from PDF files 🐍.

## Introduction

This library implements extraction procedures for binary streams in PDF files. Currently it supports only images, but we hope to expand to other stream types and to other file formats.

## Installation

```sh
$ pip install straw
```

## API

```py
>>> import straw
>>> images = straw.extract_images('./file.pdf')
```

The `extract_images` function returns an array of image instances, each with a `data` attribute that holds the image's raw binary data. To save an image to disk, write this data to a file.
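
As a minimal sketch of the saving step described above, the helper below writes each image's `data` bytes to its own file. The names `save_images`, `out_dir`, and `extension` are hypothetical and not part of Straw. The README does not say which image format the raw data is in, so the default `.bin` extension is only a placeholder assumption.

```python
from pathlib import Path


def save_images(images, out_dir="extracted", extension=".bin"):
    """Write each image's raw bytes to out_dir and return the paths written.

    `images` is assumed to be an iterable of objects exposing a bytes-like
    `data` attribute, as the README describes for `extract_images`.
    """
    out_path = Path(out_dir)
    # Create the output directory (and any missing parents) if needed.
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, image in enumerate(images):
        # Hypothetical naming scheme: image_0.bin, image_1.bin, ...
        target = out_path / f"image_{index}{extension}"
        target.write_bytes(image.data)
        written.append(target)
    return written
```

With Straw installed, this could be called as `save_images(straw.extract_images('./file.pdf'))`. Pass a different `extension` if you know the format of the embedded images.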

## Future

* Pillow integration

* Support for stream formats other than images

* Support for other file formats with embedded streams
