https://github.com/expersso/pdftables
Programmatic Conversion of PDF Tables With R
https://github.com/expersso/pdftables
Last synced: 5 months ago
JSON representation
Programmatic Conversion of PDF Tables With R
- Host: GitHub
- URL: https://github.com/expersso/pdftables
- Owner: expersso
- Created: 2016-02-10T09:15:27.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-08-15T09:13:56.000Z (over 8 years ago)
- Last Synced: 2024-11-02T13:03:09.909Z (6 months ago)
- Language: R
- Homepage:
- Size: 119 KB
- Stars: 40
- Watchers: 3
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
- jimsghstars - expersso/pdftables - Programmatic Conversion of PDF Tables With R (R)
README
---
output:
md_document:
variant: markdown_github
---# pdftables
[](http://cran.r-project.org/web/packages/pdftables)
[](https://travis-ci.org/expersso/pdftables)
[](http://cran.r-project.org/web/packages/pdftables)```{r options, echo=FALSE}
knitr::opts_chunk$set(cache = FALSE, warning = FALSE, error = FALSE,
fig.path = "", eval=FALSE)
```### Introduction
The `pdftables` package allows the user to convert PDF tables to formats more
amenable to analysis (csv, XML, or XLSX) by wrapping the
[PDFTables API](https://pdftables.com).The package can be installed from either CRAN or Github (development version):
```{r install}
# From CRAN
install.packages("pdftables")# From Github
library(devtools)
install_github("expersso/pdftables")library(pdftables)
```To use the package the user first needs to sign up to the
[PDFTables API](https://pdftables.com/join) to get an API token (they offer a
free package that allows up to 50 pages).### Usage Example
In the following example we first write the `iris` dataset to a `.csv` file. We
then open that file and print it as a `.pdf` file. Using the `convert_pdf`
function we then upload that PDF to the [PDFTables API](https://pdftables.com)
which parses and returns the converted file as `test2.csv`.(Note: All functions in the package require that you provide your api key in the
`api_key` argument. By default this looks for an environment variable called
`pdftable_api`, but you can also provide it directly.)```{r}
write.csv(head(iris, 20), file = "test.csv", row.names = FALSE)# Open test.csv and print as PDF to "test.pdf"
convert_pdf("test.pdf", "test2.csv")
# Converted test.pdf to test2.csv
```
If the `output_file` argument is omitted, the name of the output file will be
the same as the input file, but with the right file extension.The package (and API) supports converting PDFs to `.csv`, `.xml`, and `.xlsx`.
Note that the conversion sometimes fails to recover the underlying data exactly,
so you may have to open the retrieved file and make some manual corrections.The `get_remaining` function shows you how many pages you have remaining.
```{r}
get_remaining()
```