https://github.com/pixelastic/pietro
π Utilities to split PDF files into smaller files, generate thumbnails and extract textual content.
https://github.com/pixelastic/pietro
Last synced: 11 months ago
JSON representation
π Utilities to split PDF files into smaller files, generate thumbnails and extract textual content.
- Host: GitHub
- URL: https://github.com/pixelastic/pietro
- Owner: pixelastic
- License: mit
- Created: 2018-10-19T07:35:00.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2025-02-26T13:10:39.000Z (over 1 year ago)
- Last Synced: 2025-02-26T14:24:17.144Z (over 1 year ago)
- Language: JavaScript
- Homepage:
- Size: 3.57 MB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# π Pietro
Set of utilities to deal with PDF files: splitting into individual pages,
extracting text and images.
This modules relies on CLI binaries (qpdf, imagemagick, poppler-utils, etc), so
it provide good interoperability, it runs every command from inside a Docker
container.
It will be slower than other implementation, but it is portable.
_This package has been created out of the need to centralize common pattern of
PDF handling I needed for various projects. The way it names files and folders
makes sense to me, they might not suit you._
```js
import { extractPages, extractImages, extractText } from 'pietro';
// Extract each page of the pdf into its own file
await extractPages('./bigFile.pdf', './pages');
// ./pages/
// ./pages/001.pdf
// ./pages/...
// ./pages/401.pdf
// Extract the text of a given pdf into a txt file
await extractText('./pages/001.pdf', './text/001.txt');
// ./text/
// ./text/001.txt
// Extract all images from a given pdf file.
// ./raw contains ALL images, including presentational ones and masks
// ./illustrations only contains "real" illustration images
await extractImages('./pages/001.pdf', ./images');
// ./images/
// ./images/001/
// ./images/001/raw/
// ./images/001/raw/001.png
// ./images/001/raw/...
// ./images/001/raw/227.png
// ./images/001/illustrations/
// ./images/001/illustrations/1.png
// ./images/001/illustrations/...
// ./images/001/illustrations/5.png
```