https://github.com/sckott/pdfimager
Extract images from pdfs using the pdfimages tool from poppler https://poppler.freedesktop.org/
- Host: GitHub
- URL: https://github.com/sckott/pdfimager
- Owner: sckott
- License: other
- Created: 2020-08-20T21:43:39.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-09-12T13:27:26.000Z (about 2 months ago)
- Last Synced: 2024-10-09T15:45:11.672Z (29 days ago)
- Topics: pdf, pdfimages, poppler, r, rstats
- Language: R
- Homepage: https://sckott.github.io/pdfimager/
- Size: 2.57 MB
- Stars: 29
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
pdfimager
=========

```{r echo=FALSE}
knitr::opts_chunk$set(
warning = FALSE,
message = FALSE,
collapse = TRUE,
comment = "#>"
)
```

[![R-check](https://github.com/sckott/pdfimager/workflows/R-check/badge.svg)](https://github.com/sckott/pdfimager/actions/)

`pdfimager` - Extract images from pdfs

Docs: <https://sckott.github.io/pdfimager/>

This package uses the `sys` R package to "shell out" to `pdfimages`. Apparently `pdfimages` is not part of the poppler cpp library, so it is not available in the pdftools R package.

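As a rough illustration of what "shelling out" means here, the sketch below calls `pdfimages -list` through `sys::exec_internal()` and returns the raw output lines. `list_pdf_images()` is a hypothetical helper written for this example, not a function exported by pdfimager, and it assumes that both the `sys` package and a `pdfimages` binary are available.

```r
# Hypothetical helper (not part of pdfimager): run `pdfimages -list <pdf>`
# and return its output as a character vector, one element per line.
list_pdf_images <- function(pdf, cmd = unname(Sys.which("pdfimages"))) {
  # Sys.which() returns "" when the binary is not on the PATH
  if (!nzchar(cmd)) {
    stop("pdfimages not found; install poppler or set its path", call. = FALSE)
  }
  # error = FALSE: inspect the exit status ourselves instead of throwing
  res <- sys::exec_internal(cmd, c("-list", pdf), error = FALSE)
  if (res$status != 0) {
    stop(rawToChar(res$stderr), call. = FALSE)
  }
  # stdout is a raw vector; convert it and split on line breaks
  strsplit(rawToChar(res$stdout), "\r?\n")[[1]]
}

# Usage, assuming a local file named paper.pdf exists:
# list_pdf_images("paper.pdf")
```

The functions in this package wrap this kind of call, so the sketch is only meant to show the general mechanism.
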
## Install pdfimages
`pdfimages` is installed when you install poppler.

Installation instructions can be found at
## Install pdfimager
```{r eval=FALSE}
# install.packages("pak")
pak::pak("sckott/pdfimager")
```

```{r}
library("pdfimager")
```

## Set the path

Some users may need to manually set the path to `pdfimages`.
You can do so with a function in this package like
```r
pdimg_set_path()
```

or set the path for pdfimages before starting R with an env var like:

```
PDFIMAGER_PATH=C:/some/path/to/poppler/24/bin/pdfimages.exe R
```
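
A quick way to confirm that R picked up the variable is to read it back with `Sys.getenv()`. The sketch below is only an illustration: `resolve_pdfimages()` is a hypothetical helper, not a function from this package, and the fallback to `Sys.which()` is an assumption about how one might locate the binary, not a description of how pdfimager resolves it.

```r
# Hypothetical helper (not part of pdfimager): prefer PDFIMAGER_PATH,
# otherwise fall back to whatever `pdfimages` is found on the PATH.
resolve_pdfimages <- function() {
  path <- Sys.getenv("PDFIMAGER_PATH", unset = "")
  if (nzchar(path)) {
    return(path)
  }
  # Sys.which() returns "" when nothing is found
  unname(Sys.which("pdfimages"))
}

resolve_pdfimages()
```

An empty string from this helper would suggest that neither the variable nor the PATH points at `pdfimages`.
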
Or set within R like:

```r
Sys.setenv(PDFIMAGER_PATH="C:/some/path/to/poppler/24/bin/pdfimages.exe")
```

## help info

```{r}
pdimg_help()
```

## pdf image metadata

```{r}
x <- system.file("examples/BachmanEtal2020.pdf", package="pdfimager")
pdimg_meta(x)
```

## pdf images

```{r}
x <- system.file("examples/BachmanEtal2020.pdf", package="pdfimager")
pdimg_images(x)
```

## filter images

`pdimg_filter()` does a variety of things to filter images by their metadata; some of them are configurable.

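
To give a feel for what filtering by metadata can look like, here is a minimal sketch on a made-up data frame. The column names loosely follow what `pdfimages -list` prints, and both `filter_small()` and its `min_px` threshold are hypothetical; the rules that `pdimg_filter()` actually applies may differ.

```r
# Toy metadata, loosely modelled on the columns printed by `pdfimages -list`
meta <- data.frame(
  page   = c(1, 1, 2, 3),
  type   = c("image", "smask", "image", "image"),
  width  = c(1200, 1200, 16, 800),
  height = c(900, 900, 16, 600)
)

# Hypothetical rule: keep real images that are at least `min_px` pixels
# wide and tall, dropping soft masks and tiny decorations such as icons.
filter_small <- function(df, min_px = 100) {
  keep <- df$type == "image" & df$width >= min_px & df$height >= min_px
  df[keep, , drop = FALSE]
}

out <- filter_small(meta)
NROW(meta)  # rows before filtering
NROW(out)   # rows after filtering
```
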
```{r}
x1 <- system.file("examples/Tierney2017JOSS.pdf", package="pdfimager")
x2 <- system.file("examples/vanGemert2018.pdf", package="pdfimager")
res <- pdimg_images(c(x1, x2))
vapply(res, NROW, 1)
out <- pdimg_filter(res)
vapply(out, NROW, 1)
```

## Meta

* Please [report any issues or bugs](https://github.com/sckott/pdfimager/issues)
* License: MIT
* Please note that the pdfimager project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.